EazyPredict ML module

codevardhan | Feb 2, 2023

EazyPredict - Running and comparing multiple ML models at once

Welcome to the world of ‘EazyPredict’, a Python module that aims to make trying out multiple prediction algorithms as simple and efficient as possible. It was heavily influenced by the ‘LazyPredict’ module, and I developed it to address a few shortcomings I identified in LazyPredict.

Why EazyPredict?

Some of its key features are as follows -

  • The ‘EazyPredict’ module utilizes a limited number of prediction algorithms (10) in order to minimize memory usage and prevent potential issues on platforms such as Kaggle.

  • Users can pass in a custom list of prediction algorithms (a short sketch follows this list) in order to perform personalized comparisons with estimators of their choosing.

  • The models can be saved to an output folder at the user’s discretion and are returned as a dictionary, allowing for easy addition of custom hyperparameters.

  • The top N models can be selected to create an ensemble using a voting classifier.
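
For example, to restrict the comparison to a few estimators of your choice, you can hand the constructor a custom list of model names. Here is a minimal sketch; the import path, the `classifiers` keyword argument, and the use of class names as strings are assumptions on my part, so check the project README for the exact signature.

from eazypredict.EazyClassifier import EazyClassifier  # path may differ; see the README

custom_list = [
    "RandomForestClassifier",
    "GaussianNB",
    "RidgeClassifier",
]

# NOTE: the keyword name below is an assumption, not confirmed API
clf = EazyClassifier(classifiers=custom_list)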

Using it for classification

Let’s try it on this introductory problem on Kaggle.

As written on Kaggle -

“This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works. The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.”

First, we need to load the dataset:

import pandas as pd

df = pd.read_csv("data/train.csv")
df.head()

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

Before using EazyPredict, we need to preprocess the dataset. This involves the following steps -

  • Removing null values
  • Encoding categorical data
  • Scaling the dataset
  • Splitting the training and testing data

from sklearn.preprocessing import OrdinalEncoder, RobustScaler
from sklearn.model_selection import train_test_split

# Removes null values
df["Age"].fillna(method="bfill", inplace=True)
df["Cabin"].fillna("No Room", inplace=True)
df["Embarked"].fillna("S", inplace=True)

# Encodes categorical data
ord_enc = OrdinalEncoder()

df["Sex_code"] = ord_enc.fit_transform(df[["Sex"]])
df["Cabin_code"] = ord_enc.fit_transform(df[["Cabin"]])
df["Embarked_code"] = ord_enc.fit_transform(df[["Embarked"]])

# Selects features for X and labels for y
X_feat = [
    "Pclass",
    "Age",
    "SibSp",
    "Parch",
    "Fare",
    "Sex_code",
    "Cabin_code",
    "Embarked_code",
]
y_feat = ["Survived"]

X = df[X_feat]
y = df[y_feat]

# Scaling the features
scaler = RobustScaler()
X_norm = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Splitting into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y, test_size=0.33, random_state=42
)
X_norm.head()

Pclass Age SibSp Parch Fare Sex_code Cabin_code Embarked_code
0 0.0 -0.388889 1.0 0.0 -0.312011 0.0 0.0 0.0
1 -2.0 0.500000 1.0 0.0 2.461242 -1.0 -65.0 -2.0
2 0.0 -0.166667 0.0 0.0 -0.282777 -1.0 0.0 0.0
3 -2.0 0.333333 1.0 0.0 1.673732 -1.0 -91.0 0.0
4 0.0 0.333333 0.0 0.0 -0.277363 0.0 0.0 0.0

Now we can use the EazyPredict module to quickly get predictions from the top classification algorithms.

from eazypredict.EazyClassifier import EazyClassifier  # import path may differ; check the README

clf = EazyClassifier()
model_list, prediction_list, model_results = clf.fit(X_train, X_test, 
                                                     y_train, y_test)

model_results
100%|██████████| 10/10 [00:00<00:00, 10.09it/s]

Accuracy f1 score ROC AUC score
GaussianNB 0.803390 0.803637 0.797619
MLPClassifier 0.803390 0.800228 0.784524
RandomForestClassifier 0.800000 0.798956 0.788214
LGBMClassifier 0.800000 0.798244 0.785595
RidgeClassifier 0.796610 0.794629 0.781429
XGBClassifier 0.779661 0.779203 0.769762
DecisionTreeClassifier 0.779661 0.778869 0.768452
KNeighborsClassifier 0.769492 0.766785 0.752024
SVC 0.688136 0.662186 0.640238
SGDClassifier 0.681356 0.669167 0.647619
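
Since model_results prints like a pandas DataFrame indexed by model name, you can also pick out the strongest model programmatically rather than by eye. A small sketch under that assumption:

# Assumption: model_results is a DataFrame indexed by model name,
# and model_list is a dict keyed by the same names
best_name = model_results.sort_values("Accuracy", ascending=False).index[0]
best_model = model_list[best_name]
print(best_name, best_model)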

After this, you can select any model and tune its hyperparameters.

import numpy as np
from sklearn.model_selection import GridSearchCV

gaussian_clf = model_list["GaussianNB"]

# Searches a log-spaced grid of smoothing values for GaussianNB
params_NB = {"var_smoothing": np.logspace(0, -9, num=100)}
gs_NB = GridSearchCV(
    estimator=gaussian_clf, param_grid=params_NB, verbose=1, scoring="accuracy"
)

gs_NB.fit(X_train, y_train.values.ravel())

gs_NB.best_params_
Fitting 5 folds for each of 100 candidates, totalling 500 fits

{'var_smoothing': 8.111308307896873e-06}
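
GridSearchCV refits the best estimator on the full training set by default, so you can use gs_NB.best_estimator_ directly, or rebuild a classifier from the best parameters. A minimal follow-up sketch:

from sklearn.naive_bayes import GaussianNB

# Rebuild and refit with the tuned smoothing value
tuned_nb = GaussianNB(**gs_NB.best_params_)
tuned_nb.fit(X_train, y_train.values.ravel())

# Evaluate on the held-out split
print(tuned_nb.score(X_test, y_test.values.ravel()))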

Using it for regression

It can be used for regression in pretty much the same way as above. You just need to import the EazyRegressor estimator.
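
Here is a minimal sketch of the regression workflow, assuming EazyRegressor mirrors the classifier API; the import path below is an assumption, so verify it against the README.

from eazypredict.EazyRegressor import EazyRegressor  # assumed path

# X_train/X_test/y_train/y_test would come from a regression dataset
reg = EazyRegressor()
model_list, prediction_list, model_results = reg.fit(X_train, X_test,
                                                     y_train, y_test)
model_results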

More details can be found here.

Creating an ensemble model

This is the library’s most powerful feature: a voting ensemble of the top models can produce a strong result with minimal hyperparameter-tuning effort.

All you need to do is pass the fitted models and their results from the previous “fit” step into fitVotingEnsemble.

clf = EazyClassifier()

model_list, prediction_list, model_results = clf.fit(X_train, X_test, y_train, y_test)

ensemble_reg, ensemble_results = clf.fitVotingEnsemble(model_list, model_results)
ensemble_results

100%|██████████| 10/10 [00:01<00:00, 6.68it/s]

Models Accuracy F1 score ROC AUC score
0 GaussianNB LGBMClassifier RidgeClassifier MLPC... 0.816949 0.758929 0.799881

Conclusion

In conclusion, ‘EazyPredict’ is an efficient and user-friendly Python module that makes trying out multiple prediction algorithms a breeze. Its memory-efficient design and customizable options make it a valuable tool for any data scientist or machine learning enthusiast. I hope you enjoy using ‘EazyPredict’ as much as I enjoyed creating it.

Check out the entire project on GitHub or PyPI.
