PhD Opportunity

Automated Inconsistency Detection and Measurement for Imbalanced Data Classification

Background

Class imbalance is a crucial problem in classification. Most standard classification algorithms assume that training examples are evenly distributed among the classes. However, class imbalance arises in many practical applications, such as fault diagnosis, defect detection [1], credit risk assessment [2], and bioinformatics [3], where one class has significantly fewer training instances than the others. It typically occurs when examples of a class are inherently rare or hard to collect. Studies show that imbalanced class distributions cause standard classification algorithms to perform poorly in many applications. For example, on a data set with 99 majority-class examples for every minority-class example, a classifier that always predicts the majority class reaches 99% accuracy while misclassifying every minority example. Many approaches to classifying imbalanced data sets have been proposed [1-4].

Research Program

In this project, we seek to build a new classification system that processes imbalanced data accurately and efficiently by: 1) developing new algorithms to automatically detect and measure possible conflicts (inconsistencies) in the data set; 2) developing and evaluating new classification algorithms that incorporate the detected conflict measurements with other strategies and measures suited to imbalanced data sets; 3) implementing a prototype system to demonstrate and validate the proposed classification system.

Anticipated Outcomes

This research will build on the supervisors' existing work. Successful completion of the project is expected to yield a new, practical classification system with a wide range of applications, particularly in the health sciences (for example, biomedical data classification, disease risk analysis, and activity recognition) as well as in fault diagnosis and defect detection.

References

[1] N. Japkowicz and S. Stephen (2002), The class imbalance problem: A systematic study, Intelligent Data Analysis, vol. 6, no. 5.
[2] G. H. Nguyen, A. Bouzerdoum, and S. Phung (2009), Learning pattern classification tasks with imbalanced data sets, in P. Yin (Ed.), Pattern Recognition, pp. 193-208, Vukovar, Croatia: In-Teh.
[3] S. Oh, M. S. Lee, and B.-T. Zhang (2011), Ensemble learning with active example selection for imbalanced biomedical data classification, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 316-325.
[4] N. V. Chawla, N. Japkowicz, and A. Kolcz (2004), Editorial: Special issue on learning from imbalanced data sets, ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 1-6.

Personnel Involved

First Supervisor: Liu, J Dr
Second Supervisor: Wang, HY Dr
Third Supervisor: Zheng, H Dr
Collaborator: Prof Luis Martinez-Lopez (Spain)

Collaboration: This project does not involve collaboration with another establishment

Synopsis:

Class imbalance, where one class has significantly fewer training instances than the others, occurs in many practical applications and causes standard classification algorithms to perform poorly. This project seeks to build a new classification system that processes imbalanced data accurately and efficiently by: 1) developing new algorithms to automatically detect and measure possible conflicts (inconsistencies) in the data set; 2) developing and evaluating new classification algorithms that incorporate the detected conflict measurements with other strategies and measures suited to imbalanced data sets; 3) implementing a prototype system to demonstrate and validate the proposed classification system. Successful completion of the project is expected to yield a new, practical classification system with a wide range of applications, particularly in the health sciences (for example, biomedical data classification, disease risk analysis, and activity recognition) as well as in fault diagnosis and defect detection.
