By Dave DeFusco
When it comes to building effective machine learning models, one of the most critical steps is feature engineering. This process involves selecting and transforming raw data into meaningful "features" that the model can use to make predictions. Done well, feature engineering can significantly enhance a model's accuracy and reliability.
In a recent study, "Mutual Information Reduction Techniques and its Applications in Feature Engineering," researchers in the Katz School's Graduate Department of Computer Science and Engineering explore which features matter most and how to combine them most effectively. They presented their findings at the 2025 IEEE International Conference on Consumer Electronics in January.
The traditional approach relies heavily on mutual information (MI), a statistical measure of how much knowing one variable reveals about another. For example, in a machine learning model predicting loan defaults, mutual information could identify features like credit score or income as highly relevant to the prediction.
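To make that concrete, here is a minimal Python sketch that scores candidate features against a default outcome using scikit-learn's `mutual_info_classif`. The toy data and column names are illustrative only; they are not from the study.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Toy loan data; columns and values are made up for illustration.
df = pd.DataFrame({
    "credit_score": [620, 710, 580, 690, 740, 560, 700, 650],
    "income":       [42000, 85000, 30000, 66000, 98000, 28000, 72000, 51000],
    "loan_default": [1, 0, 1, 0, 0, 1, 0, 1],
})
X, y = df[["credit_score", "income"]], df["loan_default"]

# Higher MI means the feature reveals more about the default outcome.
scores = mutual_info_classif(X, y, random_state=0)
for name, score in zip(X.columns, scores):
    print(f"{name}: {score:.3f}")
```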
While MI is powerful, most methods focus solely on maximizing the MI between features and the target outcome. This ensures the model gets the most relevant data, but it overlooks an important problem: redundancy among features. For instance, if two features, like monthly income and annual income, are highly correlated, they add repetitive information that can clutter the model and slow it down.
"We introduce a new way of thinking: instead of only looking for features with the highest MI scores, our study also focuses on reducing mutual information between the features themselves," said David Li, senior author of the paper and program director of the M.S. in Data Analytics and Visualization. "This is important because by minimizing redundancy, we create a set of features, each adding unique, valuable information to the model. This approach ensures the model becomes more efficient, accurate and better at handling complex tasks like classification."
The method starts with an MI matrix, which shows how much information each feature shares with others. By applying mutual information reduction techniques, the process identifies and removes overlapping information. This results in a refined dataset where each feature stands out for its unique contribution.
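The article does not spell out the paper's exact reduction rule, but one plausible version of the mechanism looks like the sketch below: estimate pairwise MI on binned features, then keep a feature only if it shares little information with the features already kept. The `bins` and `threshold` parameters are our assumptions for illustration, not values from the paper.

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

def mi_matrix(X: pd.DataFrame, bins: int = 10) -> pd.DataFrame:
    """Pairwise MI between features, estimated on equal-width bins."""
    binned = X.apply(lambda col: pd.cut(col, bins=bins, labels=False))
    m = pd.DataFrame(0.0, index=X.columns, columns=X.columns)
    for i in X.columns:
        for j in X.columns:
            m.loc[i, j] = mutual_info_score(binned[i], binned[j])
    return m

def prune_redundant(X: pd.DataFrame, threshold: float, bins: int = 10) -> list:
    """Keep a feature only if its MI with every already-kept feature stays below threshold."""
    m = mi_matrix(X, bins=bins)
    kept = []
    for col in X.columns:
        if all(m.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept
```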
The researchers also incorporated Weight of Evidence (WOE), a transformation technique that boosts a feature's predictive power, particularly in binary classification tasks like "yes/no" or "approve/reject" decisions. WOE captures subtle nuances in the data, ensuring that even after redundancy is reduced, the features remain highly informative.
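WOE itself has a standard form: for each bin of a feature, it is the log of the ratio between the bin's share of non-events and its share of events. A minimal sketch, assuming quantile bins and a small smoothing constant to keep the logarithm finite (both choices ours, not necessarily the paper's):

```python
import numpy as np
import pandas as pd

def woe_transform(feature: pd.Series, target: pd.Series, bins: int = 5) -> pd.Series:
    """Replace each value with the WOE of its bin; target is binary (1 = default)."""
    binned = pd.qcut(feature, q=bins, duplicates="drop")
    counts = pd.crosstab(binned, target)
    # 0.5 smoothing keeps log() finite for bins with no events or non-events.
    pct_good = (counts[0] + 0.5) / (counts[0].sum() + 0.5)
    pct_bad = (counts[1] + 0.5) / (counts[1].sum() + 0.5)
    woe = np.log(pct_good / pct_bad)
    return binned.map(woe).astype(float)
```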
To test their method, the researchers applied it to a loan default dataset. Using a "brute-force" method to fine-tune parameters, they successfully minimized mutual information between features. The result? A leaner, smarter model with significantly reduced redundancy.
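The article does not say which parameters were brute-forced, so the sketch below assumes a small grid over the bin count and redundancy threshold from the earlier snippet, scoring each combination by the average off-diagonal MI of the surviving feature set:

```python
import itertools
import numpy as np

def mean_pairwise_mi(X, bins):
    """Average MI between distinct pairs of kept features; lower means less redundancy."""
    m = mi_matrix(X, bins=bins).to_numpy()
    off_diag = m[~np.eye(len(m), dtype=bool)]
    return off_diag.mean() if off_diag.size else 0.0

# Exhaustively try every (bins, threshold) pair; in practice one would also
# track predictive power (e.g., MI with the target) so the search cannot
# "win" by simply discarding almost every feature.
best = None
for bins, threshold in itertools.product([5, 10, 20], [0.05, 0.1, 0.2]):
    kept = prune_redundant(X, threshold=threshold, bins=bins)
    score = mean_pairwise_mi(X[kept], bins=bins)
    if best is None or score < best[0]:
        best = (score, bins, threshold, kept)
print("lowest mean pairwise MI:", best)
```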
They then layered on the WOE transformation, which further enhanced the model's performance, especially for logistic regression models commonly used in risk management. This dual approach not only improved accuracy but also offered better insights into the factors driving loan defaults.
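Putting the pieces together, a sketch of that dual approach might prune redundant features, WOE-encode the survivors, and fit a logistic regression, reusing the toy data and hypothetical helpers above. (In a real setting the WOE bins should be fit on training data only, to avoid leakage.)

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Prune redundant features, then encode each survivor with its WOE values.
kept = prune_redundant(X, threshold=0.1)
X_woe = X[kept].apply(lambda col: woe_transform(col, y))

X_train, X_test, y_train, y_test = train_test_split(
    X_woe, y, stratify=y, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```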
This breakthrough offers a smarter way to build machine learning models that aren't bogged down by irrelevant or repetitive data. The implications are vast:
- Efficiency: Less redundant data means faster models.
- Accuracy: A cleaner feature set improves predictive performance.
- Interpretability: Models built on unique, non-redundant features are easier to understand, making them invaluable in fields like finance, healthcare and customer analytics.
"The study opens the door to new possibilities," said Ruixin Chen, lead author of the study and a student in the M.S. in Artificial Intelligence. "Future research could explore automated ways to optimize mutual information reduction, apply the technique to more complex datasets or expand its use in unsupervised learning tasks."