Understand the problem
- What is the problem you’re trying to solve?
- What is the goal? Is it a business objective?
- If so, it likely needs to be reframed as, or broken down into, one or more machine learning problems.
- Is it a generative, compression, regression, or classification problem?
Data preparation
- What kinds of data sources are you working with (e.g., relational, columnar, graph, or document stores)?
- Is the data unstructured or structured?
- Do we need to collect more data?
- Would acquiring additional third-party data sources benefit the project?
Feature engineering
- How will we handle missing data? For example, impute with the mean, median, or mode, or train a model to predict the missing values.
- What are the risks of imputing values?
- How will we standardize or normalize variables?
- How will we handle skewed variables (e.g., with a log transform)?
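A minimal, standard-library sketch of the simple techniques listed above, assuming each variable is a plain Python list of numbers with `None` marking a missing value. The function names (`impute`, `standardize`, `min_max_normalize`, `log_transform`) are hypothetical; libraries such as pandas and scikit-learn ship their own equivalents. Model-based imputation is not shown, and `standardize` assumes the population standard deviation.

```python
import math
import statistics


def impute(values, strategy="median"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant variable: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(values):
    """log(1 + x) compresses a long right tail; defined for x > -1, so zeros are fine."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed variable with one missing entry and one large outlier.
incomes = [32_000, None, 45_000, 51_000, 1_200_000]
filled = impute(incomes, "median")
print(filled)  # [32000, 48000.0, 45000, 51000, 1200000]
scaled = standardize(log_transform(filled))
print(scaled)
```

Here the median fill (48000.0) is barely affected by the 1,200,000 outlier, whereas the mean fill would be 332000, which illustrates one concrete risk of mean imputation on skewed data. In practice, fill values and scaling statistics should be computed on the training split only and then reused on validation and test data; computing them on the full dataset leaks information from the evaluation data.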
Model selection and evaluation
- Start with a simple model as a baseline
- Where does the model sit on the bias-variance trade-off: is it underfitting (high bias) or overfitting (high variance)?
- How will we evaluate the model (which metrics, and on what held-out data)?
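A minimal sketch of the baseline-first workflow, assuming a single numeric feature and mean squared error (MSE) as the metric: a baseline that always predicts the training mean and a one-feature least-squares fit are trained on the same split and compared on training and held-out data. All names here (`split_train_test`, `MeanBaseline`, `SimpleLinearRegression`, `evaluate`) are hypothetical, written from scratch with the standard library, and the data is synthetic.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def split_train_test(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(xs) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in test], [ys[i] for i in test])


class MeanBaseline:
    """Predicts the training-set mean for every input, ignoring x."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class SimpleLinearRegression:
    """One-feature ordinary least squares: y ~ slope * x + intercept."""

    def fit(self, xs, ys):
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        if sxx == 0:
            raise ValueError("x has zero variance; slope is undefined")
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs):
        return [self.slope_ * x + self.intercept_ for x in xs]


def evaluate(model, xs_train, ys_train, xs_test, ys_test):
    """Fit on the training split; return (train MSE, test MSE)."""
    model.fit(xs_train, ys_train)
    return (mse(ys_train, model.predict(xs_train)),
            mse(ys_test, model.predict(xs_test)))


# Synthetic data: y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]
xs_tr, ys_tr, xs_te, ys_te = split_train_test(xs, ys)

base_train, base_test = evaluate(MeanBaseline(), xs_tr, ys_tr, xs_te, ys_te)
lin_train, lin_test = evaluate(SimpleLinearRegression(), xs_tr, ys_tr, xs_te, ys_te)
print(f"baseline MSE: train={base_train:.2f} test={base_test:.2f}")
print(f"linear   MSE: train={lin_train:.2f} test={lin_test:.2f}")
```

Comparing training and test error gives a rough read on the bias-variance trade-off: if both stay high, the model is likely underfitting (high bias); if training error is much lower than test error, it is likely overfitting (high variance). Any candidate model should also clearly beat the baseline on held-out data to justify its extra complexity. On small datasets a single split is noisy, and k-fold cross-validation, which averages over several splits, gives a steadier estimate.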