Feature Selection Approach for Solving Imbalanced Data Problem in Single Nucleotide Polymorphism Discovery

MABBI – Rossy Nurhasanah, Lailan Sahrina Hasibuan, and Wisnu Ananta Kusuma from University of Northern Sumatra conducted a study entitled "Feature selection approach for solving imbalanced data problem in single nucleotide polymorphism discovery". A Single Nucleotide Polymorphism (SNP) is a type of molecular marker: a variation at a single base position in the DNA sequence that can underlie phenotypic differences between individuals of the same species. In recent years, SNPs have been widely used in many fields, for instance in designing precision medicine for humans and in developing superior cultivars in plant breeding.
The main challenge in SNP discovery is the imbalanced distribution of data between classes: true SNPs are far fewer than false SNPs. Although the benefits of feature selection for classification have been widely reported, its use in addressing the class imbalance problem remains an interesting research topic. In this study, the authors selected the features that contribute most to SNP identification using the Feature Assessment by Sliding Thresholds (FAST) method. FAST evaluates how much each feature contributes to identifying SNPs based on its Area Under the ROC Curve (AUC) value. Identifying SNPs with the 4 best features improved classifier performance in terms of G-Mean (the geometric mean of sensitivity and specificity) compared with using 24 features. In addition, feature selection can reduce computation time and the resources needed. (Tri/MABBI)
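To make the two techniques named above more concrete, here is a minimal Python sketch. It is not the authors' code and makes several assumptions. Each feature is scored by the AUC obtained when a decision threshold slides over its values. The original FAST method evaluates a fixed number of thresholds derived from binned feature values, while this sketch simply tries every distinct value. The top-k features are then kept, and G-Mean is computed from the confusion counts. The function names (`auc_sliding_threshold`, `fast_rank`, `g_mean`), the toy data, and the choice to score a feature by max(AUC, 1 - AUC) are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def auc_sliding_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature: slide a threshold over every distinct value,
    predict "true SNP" (label 1) when value >= threshold, and integrate
    the resulting ROC curve with the trapezoidal rule."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one true and one false SNP")
    # Highest values first, so lowering the threshold adds samples.
    pairs = sorted(zip(values, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move past all samples tied at this value in one step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def fast_rank(columns: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k best features (columns[j] holds feature j)."""
    scored = []
    for j, column in enumerate(columns):
        auc = auc_sliding_threshold(column, labels)
        # Assumption: AUC below 0.5 is still informative (inverse relation).
        scored.append((max(auc, 1 - auc), j))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:k]]


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (true SNP recall) and specificity."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 2 true SNPs versus 8 false SNPs (imbalanced).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
good = [0.9, 0.8, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2]     # separates classes
noise = [0.5, 0.1, 0.5, 0.1, 0.5, 0.1, 0.5, 0.1, 0.5, 0.1]    # uninformative
inverse = [0.2, 0.1, 0.9, 0.8, 0.7, 0.9, 0.8, 0.7, 0.9, 0.1]  # inversely related

print(fast_rank([good, noise, inverse], labels, k=2))  # [0, 2]

# Predicting "false SNP" everywhere is 80% accurate but has G-Mean 0.
print(g_mean(labels, [0] * 10))  # 0.0
print(g_mean(labels, [1 if v > 0.5 else 0 for v in good]))  # 1.0
```

The last two lines illustrate why G-Mean suits imbalanced data better than accuracy: a classifier that never predicts the minority class scores well on accuracy, yet its zero sensitivity drives the geometric mean to zero.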


Read more: https://iopscience.iop.org/article/10.1088/1742-6596/1566/1/012035/pdf

