Data Preparation for Machine Learning: Why Feature Engineering Remains a Human-Driven Activity

This session introduces analytics practitioners, data scientists, and newcomers to predictive analytics to the importance of properly preparing data before model building. The instructor will present the critical role of feature engineering, explaining both what it is and how to do it effectively. Emphasis will be given to the tasks that the modeler must oversee because they cannot be performed without the context of a specific modeling project. In these tasks, the modeler carefully “crafts” the data to improve the ability of modeling algorithms to find patterns of interest.

Data preparation is often associated with cleaning and formatting the data. While important, these tasks will not be our focus. Rather, the focus is on how the human modeler creates a dataset that is uniquely suited to the business problem.

You will learn:

  • Construction methods for various data transformations
  • The merits and limitations of automated data preparation technologies
  • Which data prep tasks are best performed by data scientists, and which by IT
  • Common types of constructed variables and why they are useful
  • How to effectively utilize subject matter experts during data preparation
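
To give a flavor of the "constructed variables" mentioned in the list above, here is a minimal Python sketch. It is only an illustration and not material from the session: the customer records, the field names, and the helper `add_constructed_variables` are all hypothetical. It shows two common kinds of constructed variable, a ratio of two raw fields and an interval computed from a raw date.

```python
from datetime import date

# Hypothetical raw customer records; every field name here is illustrative.
customers = [
    {"id": 1, "income": 50000.0, "debt": 10000.0, "last_purchase": date(2024, 1, 10)},
    {"id": 2, "income": 80000.0, "debt": 60000.0, "last_purchase": date(2023, 6, 1)},
]


def add_constructed_variables(record, as_of):
    """Return a copy of `record` with two constructed variables added."""
    enriched = dict(record)
    # Ratio: relates two raw fields that say little in isolation.
    enriched["debt_to_income"] = record["debt"] / record["income"]
    # Interval: turns a raw date into a recency measure in whole days.
    enriched["days_since_purchase"] = (as_of - record["last_purchase"]).days
    return enriched


# The reference date is a modeling decision that depends on the project.
as_of = date(2024, 3, 1)
features = [add_constructed_variables(c, as_of) for c in customers]
```

Which ratios and intervals are worth constructing, and which reference date makes sense, depends on the business problem, which is why these choices stay with the modeler.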