This is a hands-on introduction to Sparkling Water, the H2O platform that combines our fast, scalable machine learning algorithms with the capabilities of Spark.
In this introduction, we show you how to start and connect to the Sparkling Water cluster, load data, perform data wrangling tasks at big-data scale, build and evaluate predictive models (including GLM, GBM, and XGBoost algorithms), and export these steps into a Spark pipeline for production. We finish with a quick introduction to the Automatic Machine Learning (AutoML) functionality provided by H2O.
By the end of this training, attendees will be able to:
- Start and connect to the Sparkling Water cluster
- Load data into Sparkling Water
- Inspect data using H2O Flow
- Perform basic data munging tasks with either H2O or Spark commands
- Fit one or more of GLM, GBM, and XGBoost models
- Create a Spark production pipeline
- Build multiple competing models using AutoML
This course assumes:
- Some familiarity with statistical or machine learning models
- A basic understanding of Python