Lance Darragh

How to Optimize Your Einstein Discovery Model: Modified Stepwise Feature Selection in Einstein

Updated: Sep 27, 2022

Table of Contents:

  1. Stepwise Feature Selection Defined
  2. How to Do This in Einstein


Stepwise Feature Selection Defined


Stepwise feature selection is a method for determining which features matter most to our model, so we can build the model with the fewest features necessary. This method helps to:

  1. Improve the accuracy of the calculated coefficients, and therefore the accuracy of predictions.

  2. Reduce overfitting, which can lead to poor results once the model is deployed.

  3. Account for Einstein's internal processing and automatic model parameter tuning, both of which affect the final model.

The method described here is not intended for primary feature selection. But once we've reduced our features to fewer than fifty, we can use the following procedure to optimize our model within Einstein. It also works best if multicollinearity has already been removed from among the features; one common way to check for that outside Einstein is sketched below.
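As a minimal sketch, here is one way to flag collinear features with variance inflation factors (VIFs) using statsmodels. The DataFrame df and its outcome column are hypothetical stand-ins for your own dataset, not anything Einstein provides:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# 'df' is a hypothetical DataFrame of numeric features plus an 'outcome'
# column; both are stand-ins for your own dataset.
X = sm.add_constant(df.drop(columns="outcome"))
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
).drop("const")

# A VIF above roughly 5-10 suggests the feature is collinear with the rest.
print(vif.sort_values(ascending=False))
```

Features with high VIFs are good candidates to drop or combine before starting the procedure below.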


There are two main types of stepwise feature selection: forward and backward. Forward feature selection starts with no features and adds one at a time. If the model metric improves (for example, the R2 in Einstein), we keep the feature; if not, we leave it out. Backward feature selection is similar, but it starts with all features and removes them one at a time. We suggest ordering features by statistical significance with the outcome variable, although correlation can also be used (see the references at the end of this post for more on statistical significance). In forward selection, add the variable with the highest statistical significance first. In backward selection, remove the one with the lowest statistical significance first. A sketch of the forward version follows.
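Einstein runs its own modeling pipeline, but to make the loop concrete, here is a minimal sketch of greedy forward selection in Python with scikit-learn, assuming a hypothetical pandas DataFrame df of numeric features plus an outcome column (both are assumptions, not Einstein's API):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def forward_selection(df: pd.DataFrame, outcome: str = "outcome"):
    """Greedy forward selection: repeatedly add the feature that raises R2 most."""
    remaining = [c for c in df.columns if c != outcome]
    X_train, X_test, y_train, y_test = train_test_split(
        df[remaining], df[outcome], random_state=0
    )
    selected, best_r2 = [], float("-inf")
    while remaining:
        # Score each candidate feature added to the current selection.
        scores = {
            f: LinearRegression()
            .fit(X_train[selected + [f]], y_train)
            .score(X_test[selected + [f]], y_test)  # score() returns R2
            for f in remaining
        }
        feature, r2 = max(scores.items(), key=lambda kv: kv[1])
        if r2 <= best_r2:
            break  # no remaining feature improves the metric; stop
        selected.append(feature)
        remaining.remove(feature)
        best_r2 = r2
    return selected, best_r2
```

This sketch picks each step's feature by the metric itself; ordering candidates by statistical significance, as we suggest above, is a cheaper heuristic for the same loop.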


The problem with backward selection is that, in general, the more features the model has, the higher the R2. So we may reach a stopping point and still have more features than we need for a comparable R2. Forward selection has the opposite weakness: when a feature is evaluated, the interaction terms it may form with features added later aren't considered. Furthermore, Einstein doesn't give us statistical significance values. So what do we do?


How to Do This in Einstein


Outside resources can be used to calculate statistical significance. But if that's not available to you, you can create a model with all the features and look at the "What Happened" and "Why It Happened" tabs. The insights are ordered by decreasing statistical significance based on a t-test, though they will most often include interaction terms. If you see two features listed in one insight, those two features interact to give a statistically significant term, so consider adding or removing them one after the other. If you see one feature listed in several insights, each with different interaction terms, note that feature as more important to the model.
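If you do have outside tools available, here is a minimal sketch of ranking features by coefficient p-values with statsmodels, again assuming the hypothetical df with an outcome column. This is only a rough stand-in for Einstein's t-test ordering, and it ignores interaction terms:

```python
import statsmodels.api as sm

# Hypothetical DataFrame 'df' with numeric features and an 'outcome' column.
X = sm.add_constant(df.drop(columns="outcome"))
fit = sm.OLS(df["outcome"], X).fit()

# Coefficient p-values from t-tests, smallest (most significant) first --
# a rough analogue of the ordering Einstein uses in its insights.
ranking = fit.pvalues.drop("const").sort_values()
print(ranking)
```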


Reading the insights this way can give you a good starting point, but you may have more features than Einstein shows insights for. If that happens, we can use correlation for the rest, shown in the "What Could Happen" tab and the "Edit Story" page. When adding features, we start with the highest correlation; when removing features, we start with the lowest. You could also use correlation for the whole process, but statistical significance is more likely to surface features that have a causal relationship with the outcome variable, which is more meaningful. Keep in mind that statistical significance and correlation are only guides; what we're optimizing is our model metric, R2.
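Correlation is also easy to compute outside Einstein, continuing with the same hypothetical df:

```python
# Absolute Pearson correlation of each feature with the outcome, strongest
# first; 'df' is the same hypothetical DataFrame as above.
corr_ranking = df.drop(columns="outcome").corrwith(df["outcome"]).abs()
print(corr_ranking.sort_values(ascending=False))
```

OK, that addresses statistical significance. What about having too many features in our model, or not including interaction terms?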


We came up with a modification: a backward removal pass followed by a forward add-back pass. It lets us reduce features, get a better idea of each feature's impact on the R2 (including more of its interaction terms), and stay within the Salesforce org's story creation limits (100 per day, 500 per month, with the option to purchase more).


First, start with all features and take note of the R2. Using statistical significance or correlation as a guide, remove one feature and note the R2: whether it increased or decreased, and by how much (an estimate of the change is sufficient). Whether the R2 increased or decreased, keep that feature out of the model and remove the next one. Note the R2 again and repeat until you're left with two input variables (Einstein won't let you create a story with any fewer) or however many must stay in for business reasons, like actionable variables. A sketch of this bookkeeping follows.
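Here is a minimal sketch of the removal pass, assuming the hypothetical df and ranking from the earlier examples. The helper story_r2 is an invented stand-in for rebuilding an Einstein story and reading off its R2:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def story_r2(df, features, outcome="outcome"):
    """Hypothetical stand-in for rebuilding an Einstein story: return a
    cross-validated R2 for a model trained on the given feature set."""
    return cross_val_score(
        LinearRegression(), df[features], df[outcome], scoring="r2", cv=5
    ).mean()

features = [c for c in df.columns if c != "outcome"]
r2 = story_r2(df, features)
removal_log = []  # (feature, change in R2 when it was removed)

# 'ranking' (from the significance sketch above) lists features most
# significant first, so we remove from the end: least significant first.
for f in reversed(ranking.index.tolist()):
    if len(features) <= 2:  # Einstein requires at least two input variables
        break
    features.remove(f)
    new_r2 = story_r2(df, features)
    removal_log.append((f, new_r2 - r2))  # negative delta = removal hurt R2
    r2 = new_r2
```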

Next, go through your list of changes in R2 and add back the feature whose removal caused the largest decrease. If the R2 increases, keep it in; if it decreases, take it back out. Repeat this until you've tried every feature whose removal decreased the R2. Then find the highest R2 that occurred anywhere in the process; it may be one of the first few values from the backward pass. The current R2 may be comparable to that peak, yet with fewer features. If not, add back the features whose removal increased the R2 the least until you reach a satisfactory R2 (the metric may improve thanks to different interaction terms). The sketch below continues the bookkeeping from the previous example.
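Here is a sketch of the add-back pass, continuing from the removal pass above (features, r2, removal_log, and story_r2 are the same hypothetical names):

```python
# Add-back pass: restore features whose removal hurt R2, worst drop first,
# keeping each one only if it improves the current R2.
harmful = sorted((delta, f) for f, delta in removal_log if delta < 0)
for delta, f in harmful:  # most negative delta (biggest drop) first
    trial = features + [f]
    trial_r2 = story_r2(df, trial)
    if trial_r2 > r2:
        features, r2 = trial, trial_r2

print(features, r2)
```

We hope this method helps you deploy the best possible model.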


References:


Stephanie Glen. "Stepwise Regression." StatisticsHowTo.com: Elementary Statistics for the rest of us! https://www.statisticshowto.com/stepwise-regression/

Ravindra Savaram. "Stepwise Regression." MindMajix.com. https://mindmajix.com/stepwise-regression

Wikipedia contributors. (2020, April 7). "Stepwise regression." In Wikipedia, The Free Encyclopedia. Retrieved February 22, 2021, from https://en.wikipedia.org/w/index.php?title=Stepwise_regression&oldid=949614867

Wikipedia contributors. (2021, January 2). "Feature selection." In Wikipedia, The Free Encyclopedia. Retrieved February 22, 2021, from https://en.wikipedia.org/w/index.php?title=Feature_selection&oldid=997859344
