Solve the Class-Imbalance Problem
2017-08-21 MacLearn machine-learning class-imbalanced
When the class distribution is imbalanced, classifiers tend to achieve high classification accuracy on the majority class but poor accuracy on the minority classes. The class imbalance problem has been studied by many researchers, and the proposed methods can be grouped under three headings:
- Pre-sampling methods
  balance the training set, either by oversampling the minority class or by undersampling the majority class.
- Cost-sensitive methods
  assign a higher misclassification cost to the minority class than to the majority class.
- Other methods
  integrate sampling with the training process.
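The first two headings can be sketched in a few lines of plain Python. This is a minimal illustration, not a method taken from the cited papers: `random_oversample` and `inverse_frequency_costs` are hypothetical helper names, the toy data is made up, and inverse class frequency is only one common heuristic for choosing misclassification costs.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def inverse_frequency_costs(labels):
    """Per-class misclassification cost, inversely proportional
    to class frequency (an assumed heuristic, not the only choice)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy data (illustrative): 8 majority samples (class 0), 2 minority (class 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

# Pre-sampling: balance the training set by oversampling the minority class.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

# Cost-sensitive: the minority class gets the higher misclassification cost.
costs = inverse_frequency_costs(y)

print(balanced_counts)
print(costs)
```

Undersampling the majority class would follow the same pattern, drawing a subset of the larger class instead of duplicating the smaller one.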
A mind map summarizing the methods referred to in [2].
P.S.
imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used on datasets showing strong between-class imbalance [3].
References
[1] Nitesh Chawla, Kevin Bowyer, Lawrence Hall, W. Philip Kegelmeyer. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR.
[2] Minlong Lin, Ke Tang, Xin Yao. (2013). Dynamic Sampling Approach to Training Neural Networks for Multiclass Imbalance Classification. IEEE TNNLS.
[3] Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning.