Please use this identifier to cite or link to this item: https://uvadoc.uva.es/handle/10324/63340
Title
Breast cancer prediction using fine needle aspiration features and upsampling with supervised machine learning
Author
Document Year
2023
Publisher
MDPI
Description
Scientific Output
Source Document
Cancers, 2023, Vol. 15, No. 3, 681
Abstract
Simple Summary: Breast cancer is prevalent in women and is the second leading cause of cancer death among them. Conventional breast cancer detection methods require several laboratory tests and medical experts. Automated breast cancer detection is thus very important for timely treatment. This study explores the influence of various feature selection techniques on the performance of machine learning methods for breast cancer detection. Experimental results show that using appropriate features tends to yield highly accurate predictions.

Breast cancer is one of the most common invasive cancers in women, and it remains a worldwide medical problem because the number of cases has increased significantly over the past decade. Breast cancer is the second leading cause of death from cancer in women. Early detection can save lives, but the traditional approach to detecting breast cancer requires various laboratory tests involving medical experts. To reduce human error and speed up detection, an automatic system is required that performs the diagnosis accurately and in a timely manner. Despite research efforts on automated systems for cancer detection, a wide gap exists between the desired accuracy and the accuracy that current approaches provide. To address this gap, this research proposes an approach for breast cancer prediction that selects the best fine needle aspiration features. To enhance prediction accuracy, several feature selection techniques are applied and their efficacy is analyzed: principal component analysis (PCA), singular value decomposition (SVD), and chi-square (Chi2). Extensive experiments are performed with different features and different feature set sizes to identify the optimal feature set. Additionally, the influence of imbalanced versus balanced data is investigated using the SMOTE approach. Six classifiers (random forest, support vector machine, gradient boosting machine, logistic regression, multilayer perceptron, and K-nearest neighbors (KNN)) are tuned to achieve increased classification accuracy. Results indicate that KNN outperforms all other classifiers on the dataset used, achieving a 100% accuracy score with 20 features using SVD and with the 15 most important features using PCA.
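The abstract mentions balancing the data with SMOTE, but this record gives no implementation details. The block below is a minimal sketch of the SMOTE idea in plain Python, assuming numeric feature vectors stored as lists: each synthetic sample is interpolated between a minority-class sample and one of its nearest minority-class neighbors. The function names `smote` and `balance`, the parameter defaults, and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_new, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy two-feature data (hypothetical): three minority rows, five majority rows.
X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
     [5.0, 5.0], [5.1, 4.9], [4.9, 5.2], [5.3, 5.1], [4.8, 4.7]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

In a typical workflow, oversampling of this kind would be applied only to the training split so that synthetic rows do not leak into evaluation; whether the study did so is not stated in this record. A maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would usually be preferable to this sketch.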
Subjects (normalized)
Breast - Cancer - Diagnosis
Cancer research
Breast - Diseases
Breast - Cancer
Cytology
Breast - Cytology
Principal components analysis
Multivariate analysis
Singular value decomposition
Machine learning
Artificial intelligence
Oncology
UNESCO Subjects
3207.13 Oncology
1203.04 Artificial Intelligence
ISSN
2072-6694
Peer Reviewed
Yes
Publisher's Version
Rights Holder
© 2023 The authors
Language
eng
Version Type
info:eu-repo/semantics/publishedVersion
Rights
openAccess
The item's license is described as Attribution 4.0 International