An Evaluation of Artificial Neural Networks and Random Forests for Heart Disease Prediction
Heart disease is a serious problem in many countries worldwide. In Malaysia, it has been a major killer since 1980. Many health conditions are closely related to heart disease. However, much of the large volume of data that medical centers collect each year is not mined for the connections between these conditions that could aid in the prognosis of heart disease. Therefore, the purpose of this study is to propose a machine-learning-based predictive model for the prognosis of heart disease, to help individuals with symptoms seek early advice and treatment. Following the Knowledge Discovery in Databases (KDD) methodology, which comprises data selection, data pre-processing, data transformation, data mining, and interpretation or evaluation of the acquired knowledge, this study uses a dataset taken from the UCI Machine Learning Repository. Two classifiers were used: an Artificial Neural Network (ANN) and a Random Forest (RF). They were selected because of their suitability in the medical field, particularly for prognosis and diagnosis, and because relevant previous works reported high and reliable accuracy with them. Two approaches were used to determine the maximum accuracy each algorithm achieves: dataset splitting and K-Fold Cross-Validation. On the test set, which was subdivided into several subsets, ANN and RF produced stable accuracies of 67.9% and 64.6%, respectively. The accuracy of ANN was also more stable across both the training and testing sets. In conclusion, ANN was selected as the algorithm best suited to the heart disease prediction model, because its test-set accuracy is slightly better than that of Random Forest.
Keywords: artificial neural network, accuracy, data mining, heart disease prediction, random forest.
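The evaluation protocol described in the abstract (a train/test split plus K-Fold Cross-Validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The 70/30 split ratio, the five folds, the random seed, and the toy data are all assumptions. The nearest-centroid classifier is a hypothetical stand-in for the ANN and Random Forest models used in the study, which could be passed in through the same `fit_predict` callable (for example, by wrapping scikit-learn's `MLPClassifier` or `RandomForestClassifier`).

```python
import random
from statistics import mean


def holdout_split(X, y, test_ratio=0.3, seed=0):
    """Shuffle once and cut the data into a training subset and a testing subset."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = round(len(X) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X, y, fit_predict, k=5, seed=0):
    """Hold out each fold once as the test set and train on the other k-1 folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return scores


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in X_test]


if __name__ == "__main__":
    # Toy data (an assumption, not the UCI heart disease records): two separated clusters.
    X = [[i, i] for i in range(10)] + [[i + 20, i + 20] for i in range(10)]
    y = [0] * 10 + [1] * 10

    X_tr, y_tr, X_te, y_te = holdout_split(X, y, test_ratio=0.3)
    print("hold-out accuracy:", accuracy(y_te, nearest_centroid(X_tr, y_tr, X_te)))

    scores = cross_val_accuracy(X, y, nearest_centroid, k=5)
    print("per-fold accuracy:", scores, "mean:", mean(scores))
```

Reporting both the hold-out accuracy and the per-fold accuracies gives one way to judge the "stability" the abstract refers to. Under this reading, a model whose per-fold scores vary little across folds is the more stable one.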