MABBI – Rossy Nurhasanah, Agus Buono, and Wisnu Ananta Kusuma, from University of Sumatera Utara and IPB University, have conducted research entitled "Combining Signal to Noise Ratio and Undersampling in Single Nucleotide Polymorphisms Identification."
Imbalanced data distribution is a challenge in identifying Single Nucleotide Polymorphisms (SNPs): false SNP data far outnumber actual SNP data, which leads to inappropriate classification results. To overcome this issue, we propose using the Signal to Noise Ratio as a feature selection approach combined with an undersampling technique. We recommend the five best features out of the 24 available ones: maximum quality of minor alleles, average quality of minor alleles, minor allele frequencies, probability of error, and balance of alleles. Our proposed model, which applies the five selected features followed by the undersampling process, achieves the highest average sensitivity and F-Measure, 0.96 and 0.92 respectively, while also improving computation speed by up to 28 times. Our strategy is especially suitable for classification tasks with an imbalanced data distribution, particularly for large data sizes. (Tri/MABBI)
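
To make the two steps described above more concrete, here is a minimal Python sketch of feature ranking by signal to noise ratio followed by undersampling. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the exact SNR formula or the undersampling variant, so the sketch assumes the commonly used definition |mean_pos - mean_neg| / (sd_pos + sd_neg) and plain random undersampling. The function names (`snr_scores`, `select_top_k`, `random_undersample`) and the synthetic data are hypothetical.

```python
import numpy as np

def snr_scores(X, y):
    """Per-feature SNR, assumed here as |mean_pos - mean_neg| / (sd_pos + sd_neg)."""
    pos, neg = X[y == 1], X[y == 0]
    gap = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    spread = pos.std(axis=0) + neg.std(axis=0)
    return gap / np.maximum(spread, 1e-12)  # guard against zero spread

def select_top_k(X, y, k=5):
    """Column indices of the k features with the highest SNR, best first."""
    return np.argsort(snr_scores(X, y))[::-1][:k]

def random_undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes are the same size."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    if len(pos_idx) <= len(neg_idx):
        minority, majority = pos_idx, neg_idx
    else:
        minority, majority = neg_idx, pos_idx
    kept = rng.choice(majority, size=len(minority), replace=False)
    keep = np.sort(np.concatenate([minority, kept]))
    return X[keep], y[keep]

# Synthetic stand-in for SNP candidates: 950 false (0), 50 true (1), 24 features.
rng = np.random.default_rng(0)
n_neg, n_pos, n_feat = 950, 50, 24
y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
X = rng.normal(size=(n_neg + n_pos, n_feat))
X[y == 1, :5] += 3.0  # only the first five columns carry signal (made-up data)

top = select_top_k(X, y, k=5)                      # step 1: keep five features
X_bal, y_bal = random_undersample(X[:, top], y, rng)  # step 2: balance classes
```

Selecting features before undersampling means the ranking uses all available samples, and the classifier then trains on a much smaller, balanced table, which is consistent with the reported gain in computation speed.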