Feature selection is the process of selecting a subset of the most relevant features from a given set of features for a supervised machine learning problem. There are many techniques for feature selection. In this post we will use 4 information theory based feature selection algorithms. This post is not about feature engineering, which is the construction of new features from a given set of features. The implementation is available in the daexp module of my Python package matumizi. The GitHub repo is whakapai.

Information Theory Based Techniques

The daexp module in matumizi contains close to 100 Exploratory Data Analysis (EDA) functions, including many feature selection methods. Our focus is on the following 4 information theory based feature selection algorithms.

  • Max Relevance Min Redundancy (MRMR)
  • Joint Mutual Information (JMI)
  • Conditional Mutual Information Maximization (CMIM)
  • Interaction Capping (ICAP)

These information theoretic algorithms assign a score to each feature based on its relevance to the target variable and how redundant the feature is with respect to the features selected so far. The algorithms select features with maximum relevance to the target and minimum redundancy with the other features. Features are selected one at a time until the desired number of features has been selected. Ideally, to select n features from m features, all possible combinations of n features need to be considered, which is computationally expensive. Instead, these algorithms are sequential and greedy in nature. For MRMR, the algorithm is as follows; for the other 3, the only difference is how the redundancy with the other features is computed. A Python sketch follows the pseudocode.

Input: feature data for all features, target data, desired no of features
Output: selected features

For each of the desired number of features
  For each feature not selected so far
    Calculate relevance, i.e. mutual information with the target variable (A)
    Calculate mutual information with each of the features already selected.
    Take the average of values from the previous step (B)
    Assign score S = A - B
  Select the feature with the highest score and add to the selected list of features
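
Below is a minimal Python sketch of this greedy MRMR loop. It is not the matumizi implementation; it assumes all features are numerical and the target is categorical, and it uses scikit-learn's mutual information estimators for the relevance (A) and redundancy (B) terms.

import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, nfeatures):
    # X: (nsamples, ncols) numerical feature matrix, y: categorical target
    ncols = X.shape[1]
    # relevance: mutual information of each feature with the target (quantity A)
    relevance = mutual_info_classif(X, y)
    selected, remaining = [], list(range(ncols))
    while len(selected) < nfeatures and remaining:
        best, bestScore = None, -np.inf
        for f in remaining:
            if selected:
                # redundancy: average mutual information with already selected features (quantity B)
                redundancy = np.mean([mutual_info_regression(X[:, [s]], X[:, f])[0] for s in selected])
            else:
                redundancy = 0.0
            # score S = A - B
            score = relevance[f] - redundancy
            if score > bestScore:
                best, bestScore = f, score
        selected.append(best)
        remaining.remove(best)
    return selected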

The other 3 algorithms differ only in how the quantity B is calculated (the standard score formulations are sketched after the next list; please refer to the citation for details). The algorithms work for any combination of numerical and categorical features and target, as follows:

  • Numerical feature, categorical target
  • Categorical feature, categorical target
  • Numerical feature, numerical target
  • Categorical feature, numerical target
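
For reference, the per feature scores used by the 4 criteria can be written as below. These are the standard formulations from the information theoretic feature selection literature; the matumizi implementation may differ in detail. Here I(.;.) is mutual information, I(.;.|.) is conditional mutual information, y is the target and S is the set of features selected so far.

MRMR:  score(f) = I(f; y) - (1/|S|) * sum over s in S of I(f; s)
JMI:   score(f) = I(f; y) - (1/|S|) * sum over s in S of [ I(f; s) - I(f; s | y) ]
CMIM:  score(f) = I(f; y) - max over s in S of [ I(f; s) - I(f; s | y) ]
ICAP:  score(f) = I(f; y) - sum over s in S of max(0, I(f; s) - I(f; s | y))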

Feature Selection in Mortgage Loan Data

The data set is artificially created. It has a mixture of 14 numerical and categorical features. The fields are as follows:

  • Loan ID (not a feature)
  • Marital status
  • No of children
  • Education level
  • Whether self employed
  • Income
  • Years of experience
  • No of years in current job
  • Debt amount
  • Loan amount
  • Loan term
  • Credit score
  • Bank account balance
  • Retirement account balance
  • No of prior mortgage loans
  • Approved or not (target)

To use the data explorer module, the following steps are necessary, as you can see in the example driver code; a minimal sketch follows the list below.

  • Create instance of DataExplorer
  • Register data sets for all features and target
  • Call the feature selection function
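
Here is a minimal sketch of those 3 steps. The file name, column names and the method names used below (addFileNumericData, addFileCatData, getMaxRelMinRedFeatures) are assumptions made for illustration only; please check the tutorial doc and the driver code in the whakapai repo for the actual API and signatures.

from matumizi.daexp import DataExplorer

# step 1: create an instance of DataExplorer
expl = DataExplorer()

# step 2: register data sets for all features and the target
# (hypothetical file and column names; the actual registration methods may differ)
expl.addFileNumericData("loan.csv", "income", "debt", "crscore")
expl.addFileCatData("loan.csv", "education", "selfemp", "approved")

# step 3: call the feature selection function, asking for 3 features (MRMR in this case)
features = ["income", "debt", "crscore", "education", "selfemp"]
res = expl.getMaxRelMinRedFeatures(features, "approved", 3)
print(res)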

Here is some sample output. I called the MRMR and JMI feature selection functions, asking for 3 features to be selected.

{'selFeatures': [('income', 0.3337975824757886), ('education', -0.028132114867564062), ('selfemp', 0.003771770973891475)]}
{'selFeatures': [('income', 0.3337975824757886), ('crscore', 0.014392295006549372), ('education', -0.011154126726672847)]}

The 2 algorithms agree on 2 of the 3 selected features. The output contains the selected feature names along with their scores. Please refer to the tutorial doc for details.

Wrapping Up

The module daexp contains close to 100 EDA functions, mostly based on existing Python libraries. We have covered only some of the feature selection algorithms in this post.