Feature selection is the process of selecting a subset of the most relevant features from a given set of features for a supervised machine learning problem. There are many techniques for feature selection. In this post we will use 4 information theory based feature selection algorithms. This post is not about feature engineering, which is the construction of new features from a given set of features. The implementation is available in the daexp module of my Python package matumizi. The GitHub repo is whakapai.

Information Theory Based Techniques

The daexp module in matumizi contains close to 100 Exploratory Data Analysis (EDA) functions, including many feature selection methods. Our focus is on the following 4 information theory based feature selection algorithms.

  • Max Relevance Min Redundancy (MRMR)
  • Joint Mutual Information (JMI)
  • Conditional Mutual Information Maximization (CMIM)
  • Interaction Capping (ICAP)

These information theoretic algorithms assign a score to each feature based on its relevance to the target variable and how redundant the feature is with respect to the features selected so far. The algorithms select features with maximum relevance to the target and minimum redundancy with the other features. Features are selected one at a time until the desired number of features has been selected. Ideally, to select n features from m features, all possible combinations of n features need to be considered, which is computationally expensive. Instead, these algorithms are sequential and greedy in nature. For MRMR, the algorithm is as follows; for the other 3, the only difference is how the redundancy with the other features is computed. A Python sketch follows the pseudocode.

Input: feature data for all features, target data, desired no of features
Output: selected features

For each of the desired number of features
  For each feature not selected so far
    Calculate relevance, i.e. mutual information with the target variable (A)
    Calculate mutual information with each of the features already selected.
    Take the average of values from the previous step (B)
    Assign score S = A - B
  Select the feature with the highest score and add to the selected list of features
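
Below is a minimal Python sketch of this greedy MRMR loop. It is not the matumizi implementation; it assumes all features are numerical and the target is categorical, and it uses scikit-learn's mutual information estimators for the relevance (A) and redundancy (B) terms.

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, nfeatures):
    # X: (nsamples, ncols) numerical feature matrix, y: categorical target
    ncols = X.shape[1]
    # relevance: mutual information of each feature with the target (quantity A)
    relevance = mutual_info_classif(X, y)
    selected, remaining = [], list(range(ncols))
    while len(selected) < nfeatures and remaining:
        best, bestScore = None, -np.inf
        for f in remaining:
            if selected:
                # redundancy: average mutual information with already selected features (quantity B)
                redundancy = np.mean([mutual_info_regression(X[:, [s]], X[:, f])[0] for s in selected])
            else:
                redundancy = 0.0
            # score S = A - B
            score = relevance[f] - redundancy
            if score > bestScore:
                best, bestScore = f, score
        selected.append(best)
        remaining.remove(best)
    return selected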

The other 3 algorithms differ only in how the quantity B is calculated (the standard score formulations are sketched after the next list; please refer to the citation for details). The algorithms work for any combination of numerical and categorical features and target, as follows:

  • Numerical feature, categorical target
  • Categorical feature, categorical target
  • Numerical feature, numerical target
  • Categorical feature, numerical target
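
For reference, the per feature scores used by the 4 criteria can be written as below. These are the standard formulations from the information theoretic feature selection literature; the matumizi implementation may differ in detail. Here I(.;.) is mutual information, I(.;.|.) is conditional mutual information, y is the target and S is the set of features selected so far.

MRMR:  score(f) = I(f; y) - (1/|S|) * sum over s in S of I(f; s)
JMI:   score(f) = I(f; y) - (1/|S|) * sum over s in S of [ I(f; s) - I(f; s | y) ]
CMIM:  score(f) = I(f; y) - max over s in S of [ I(f; s) - I(f; s | y) ]
ICAP:  score(f) = I(f; y) - sum over s in S of max(0, I(f; s) - I(f; s | y))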

Feature Selection in Mortgage Loan Data

The data set is artificially created. It has a mixture of 14 numerical and categorical features. The fields are as follows:

  • Loan ID (not a feature)
  • Marital status
  • No of children
  • Education level
  • Whether self employed
  • Income
  • Years of experience
  • No of years in current job
  • Debt amount
  • Loan amount
  • Loan term
  • Credit score
  • Bank account balance
  • Retirement account balance
  • No of prior mortgage loans
  • Approved or not (target)

To use the data explorer module, the following steps are necessary, as you can see in the example driver code; a minimal sketch follows the list below.

  • Create instance of DataExplorer
  • Register data sets for all features and target
  • Call the feature selection function
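
Here is a minimal sketch of those 3 steps. The file name, column names and the method names used below (addFileNumericData, addFileCatData, getMaxRelMinRedFeatures) are assumptions made for illustration only; please check the tutorial doc and the driver code in the whakapai repo for the actual API and signatures.

from matumizi.daexp import DataExplorer

# step 1: create an instance of DataExplorer
expl = DataExplorer()

# step 2: register data sets for all features and the target
# (hypothetical file and column names; the actual registration methods may differ)
expl.addFileNumericData("loan.csv", "income", "debt", "crscore")
expl.addFileCatData("loan.csv", "education", "selfemp", "approved")

# step 3: call the feature selection function, asking for 3 features (MRMR in this case)
features = ["income", "debt", "crscore", "education", "selfemp"]
res = expl.getMaxRelMinRedFeatures(features, "approved", 3)
print(res)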

Here is some sample output. I called the MRMR and JMI feature selection functions, asking for 3 features to be selected.

{'selFeatures': [('income', 0.3337975824757886), ('education', -0.028132114867564062), ('selfemp', 0.003771770973891475)]}
{'selFeatures': [('income', 0.3337975824757886), ('crscore', 0.014392295006549372), ('education', -0.011154126726672847)]}

The 2 algorithms agree on 2 of the 3 selected features. The output contains the selected feature names along with their scores. Please refer to the tutorial doc for details.

Wrapping Up

The module daexp contains close to 100 EDA functions, mostly based on existing Python libraries. We have covered only some of the feature selection algorithms in this post.