Fine Tuning of an Ensemble of Classiﬁcation Models Using Random Grid Search
Parameters of a model are learned during training, while hyperparameters are set beforehand to control how the model learns. Grid search is a technique for finding the optimal hyperparameters for a model. Ensemble learning trains multiple models and combines the diverse classifiers into a single strong learner. It improves robustness over a single learner and helps when data are either very large or inadequate (Yao et al., 2018). The technique has been applied in several papers (Kruthika et al., 2019a; Zhang and Sejdić, 2019). The goal of this implementation is to achieve better performance by running a grid search and using an ensemble of classifiers. This is a continuation from here.
Run Grid Search to Find Most Acceptable Hyperparameters
Hyperparameter values must be set before the learning process because they control how learning proceeds and cannot be estimated from the data. Combinations of values are evaluated on a validation data set to find the optimal hyperparameters. Random grid search samples random combinations from a grid of hyperparameters and scores them on the validation data set rather than the test data set, which helps the performance estimate generalize. Running randomized search (RandomizedSearchCV) several times with cross-validation helps to find the most acceptable parameters for each model. The resulting hyperparameters are then fixed and used in the ensemble of classifiers.
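The search described above can be sketched as follows. This is a minimal illustration, not the report's actual setup: the parameter distributions and the synthetic data set are assumptions, and only one of the five classifiers is shown.

```python
# Hedged sketch: tuning a RandomForestClassifier with RandomizedSearchCV.
# The parameter ranges and the synthetic data are illustrative only.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=42)

param_distributions = {
    "n_estimators": randint(50, 300),     # number of trees
    "max_depth": randint(2, 10),          # maximum tree depth
    "min_samples_split": randint(2, 10),  # minimum samples to split a node
}

# n_iter random combinations are scored with 5-fold cross-validation,
# so the held-out test set (not shown) stays untouched.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10, cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Repeating the search with different `random_state` values, as the text suggests, gives a sense of how stable the chosen hyperparameters are.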
Generate Feature Importance of the Classiﬁers
The figure shows the feature importance of the different classifiers. For a given model, it indicates which features matter most for explaining the target. The MRI entorhinal volume is the most important feature, followed by MMSE, for all classifiers except AdaBoost, for which RAVLT immediate is the second most important feature. Note, however, that impurity-based importances can be biased toward continuous features and high-cardinality categorical features.
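A sketch of how such importances can be extracted and ranked, assuming impurity-based `feature_importances_` from tree ensembles; the feature names and data are placeholders inspired by the text, not the original cohort.

```python
# Hedged sketch: ranking impurity-based feature importances per classifier.
# Feature names are assumed placeholders; data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
feature_names = ["Entorhinal", "MMSE", "RAVLT_immediate", "Ventricles"]

for clf in (RandomForestClassifier(random_state=0),
            AdaBoostClassifier(random_state=0)):
    clf.fit(X, y)
    # Sort features from most to least important for this model.
    order = np.argsort(clf.feature_importances_)[::-1]
    ranked = [feature_names[i] for i in order]
    print(type(clf).__name__, ranked)
```

The importances of each fitted model sum to one, so rankings are comparable across classifiers even though the absolute values are not.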
Implementation, Evaluation and Result of Ensemble of Classiﬁers
Ensemble learning generally improves the performance of the models (Nanni et al., 2016). Random Forest, Extra Trees, AdaBoost, Gradient Boosting, and XGBoost classifiers with optimized hyperparameters are combined using a voting classifier. The voting classifier combines the models using soft voting:

ŷ = argmax_i Σ_j w_j p_ij

where p_ij is the predicted probability of class i from the jth classifier and w_j is the weight assigned to the jth classifier. Soft voting predicts the class labels from the predicted probabilities and works best with well-calibrated classifiers. It is implemented using the scikit-learn library via VotingClassifier(), with model weights 2, 3, 3, 1 and 3.
The figure is a normalized confusion matrix for the ensemble of classifiers.
The diagonal elements give the proportion of correctly predicted samples for each class: 0.47 for normal (NL), 0.60 for MCI and 0.99 for dementia. The off-diagonal elements give the proportions confused with the other classes. With the decision threshold of the ensemble fixed at 0.5, the model is therefore better at predicting dementia and MCI than normal.
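Normalizing by row, as in the figure, turns each diagonal entry into the recall for that class. A minimal sketch with made-up labels (not the report's actual predictions):

```python
# Hedged sketch: row-normalized confusion matrix, so each row sums to 1
# and the diagonal holds per-class recall. Labels below are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]  # 0=NL, 1=MCI, 2=dementia (assumed coding)
y_pred = [0, 1, 1, 1, 1, 1, 0, 2, 2, 2]

# normalize="true" divides each row by the number of true samples in that class.
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(np.round(cm, 2))
```

Here the diagonal reads 0.33, 0.75 and 1.0, mirroring how the report's 0.47 / 0.60 / 0.99 values should be interpreted.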
The model predicted normal with an AUROC score of 0.72 against dementia and MCI, MCI with an AUROC score of 0.60 against normal and dementia, and dementia with an AUROC score of 0.89 against normal and MCI.
The ROC curve measures the performance of the model without fixing a threshold: it plots a point for every possible threshold and is therefore helpful for choosing a threshold suited to the use case. The figure shows that, when the threshold is not fixed, the ensemble of classifiers is better at predicting dementia than normal or MCI. Hence, the model is better than the XGBoost implementation from the previous section.
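The one-vs-rest AUROC scores quoted above can be computed as sketched below; the classifier and synthetic data are assumptions standing in for the ensemble and the original cohort.

```python
# Hedged sketch: per-class one-vs-rest AUROC from predicted probabilities.
# A single RandomForest on synthetic data stands in for the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

clf = RandomForestClassifier(random_state=2).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# Each class is scored against the other two combined (one-vs-rest),
# matching the "normal vs dementia and MCI" style of the reported scores.
for k in range(3):
    auc = roc_auc_score((y_te == k).astype(int), proba[:, k])
    print(f"class {k} vs rest: AUC = {auc:.2f}")
```

`roc_auc_score(..., multi_class="ovr")` on the full probability matrix gives the macro average of these three per-class scores in one call.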
The report continues here.