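Imbalanced classes put “accuracy” out of business. This is a surprisingly common problem in machine learning (specifically in classification), occurring in datasets with a disproportionate ratio of observations in each class. When one class dominates, a model that always predicts the majority class can score high accuracy while never detecting the minority class.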
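To see why accuracy misleads, here is a minimal sketch with made-up labels. The 92/8 split is a hypothetical example, not data from the source. A baseline that always predicts the majority class reaches high accuracy but finds none of the minority-class observations.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 92 majority-class (0) and 8 minority-class (1) observations
y_true = np.array([0] * 92 + [1] * 8)

# A "model" that ignores its input and always predicts the majority class
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
minority_recall = recall_score(y_true, y_pred)  # share of class-1 cases it finds

print(f"accuracy: {accuracy:.2f}")                # accuracy: 0.92
print(f"minority recall: {minority_recall:.2f}")  # minority recall: 0.00
```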
Up-sample the minority class
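- Use resample from sklearn.utils with replace=True: sample the minority class with replacement (duplicating observations) until it matches the size of the majority class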
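A minimal up-sampling sketch follows. It assumes the data sits in a pandas DataFrame with a "label" column. The toy DataFrame and its column names are hypothetical. The process has three steps: split the data by class, resample the minority class with replacement up to the majority count, then concatenate.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy dataset: 9 rows of class 0 (majority), 3 rows of class 1 (minority)
df = pd.DataFrame({
    "feature": range(12),
    "label": [0] * 9 + [1] * 3,
})

# 1. Separate the observations of each class
df_majority = df[df["label"] == 0]
df_minority = df[df["label"] == 1]

# 2. Sample the minority class WITH replacement until it matches the majority size
df_minority_upsampled = resample(
    df_minority,
    replace=True,                # rows may be drawn more than once
    n_samples=len(df_majority),  # match the majority-class count
    random_state=123,            # reproducible draw
)

# 3. Combine the original majority class with the up-sampled minority class
df_upsampled = pd.concat([df_majority, df_minority_upsampled])

print(df_upsampled["label"].value_counts())
```

Resample only the training split. Up-sampling before a train/test split copies minority rows into the test set and inflates the scores.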
Down-sample the majority class
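- Use resample from sklearn.utils with replace=False: sample the majority class without replacement (dropping observations) until it matches the size of the minority class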
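Here is a minimal down-sampling sketch. It uses plain NumPy arrays and passes X and y to resample together, so each row stays paired with its label. The toy arrays are hypothetical.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical toy data: 10 majority-class (0) rows and 4 minority-class (1) rows
X = np.arange(28).reshape(14, 2)
y = np.array([0] * 10 + [1] * 4)

X_majority, y_majority = X[y == 0], y[y == 0]
X_minority, y_minority = X[y == 1], y[y == 1]

# Sample the majority class WITHOUT replacement down to the minority-class size;
# resampling X and y in one call keeps features and labels aligned
X_majority_down, y_majority_down = resample(
    X_majority, y_majority,
    replace=False,
    n_samples=len(y_minority),
    random_state=123,
)

X_balanced = np.vstack([X_majority_down, X_minority])
y_balanced = np.concatenate([y_majority_down, y_minority])

print(np.bincount(y_balanced))  # [4 4]
```

The trade-off is that down-sampling discards real majority-class data. It fits best when the majority class is large enough to spare rows.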
Change your performance metric
- Area Under ROC Curve (AUROC)
```python
from sklearn.metrics import roc_auc_score
```
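AUROC measures how well a model's scores rank positive observations above negative ones, so a model cannot earn it by always guessing the majority class. This sketch compares a constant majority-class baseline against hypothetical model scores on hypothetical labels. In practice the scores would come from something like predict_proba(X)[:, 1].

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 95 negatives (0) followed by 5 positives (1)
y_true = np.array([0] * 95 + [1] * 5)

# Baseline: always predict the majority class, with the same score for every row
baseline_pred = np.zeros_like(y_true)
baseline_scores = np.zeros(len(y_true))

# Hypothetical model scores for class 1:
# negatives spread over [0, 0.6], positives mostly near the top
model_scores = np.concatenate([np.linspace(0.0, 0.6, 95),
                               [0.5, 0.7, 0.8, 0.9, 0.95]])

baseline_acc = accuracy_score(y_true, baseline_pred)
baseline_auc = roc_auc_score(y_true, baseline_scores)
model_auc = roc_auc_score(y_true, model_scores)

print(baseline_acc)  # 0.95: looks good, yet the baseline is useless
print(baseline_auc)  # 0.5: no better than random ranking
print(model_auc)     # about 0.966
```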
Penalize algorithms (cost-sensitive training)
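```python
from sklearn.svm import SVC
clf = SVC(kernel='linear', class_weight='balanced')  # class_weight makes minority-class mistakes cost more
```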
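Cost-sensitive training with class_weight='balanced' weights each class by n_samples / (n_classes * count_of_that_class). Errors on the rare class therefore cost more during training. This sketch uses a hypothetical synthetic dataset from make_classification. It prints the weights from compute_class_weight next to the same formula computed by hand, then fits an unweighted and a weighted linear SVM for comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical synthetic data: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Weights that class_weight='balanced' applies: n_samples / (n_classes * count)
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
manual = len(y) / (len(classes) * np.bincount(y))
print({int(c): round(float(w), 3) for c, w in zip(classes, weights)})

# Compare an unweighted SVM with a cost-sensitive one
plain = SVC(kernel="linear").fit(X, y)
weighted = SVC(kernel="linear", class_weight="balanced").fit(X, y)

# Count how many class-1 predictions each model makes on the training data
print("plain:", int(np.sum(plain.predict(X) == 1)))
print("weighted:", int(np.sum(weighted.predict(X) == 1)))
```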
Use tree-based algorithms
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
```
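Tree ensembles often cope with imbalance reasonably well, because their hierarchical splits can carve out regions where the minority class dominates. This is a minimal sketch on hypothetical synthetic data. It uses a stratified train/test split and evaluates with AUROC instead of accuracy. The dataset and hyperparameters are illustrative assumptions, not tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: roughly 95% class 0, 5% class 1
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)

# Stratify so the rare class appears in both splits in the same proportion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

clf = RandomForestClassifier(n_estimators=200, random_state=1)
clf.fit(X_train, y_train)

# Check which classes the forest actually predicts on held-out data
print(np.unique(clf.predict(X_test)))

# Evaluate with AUROC, using the predicted probability of class 1
prob_pos = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, prob_pos)
print(auc)
```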
References
- How to Handle Imbalanced Classes in Machine Learning: https://elitedatascience.com/imbalanced-classes