Comparison of Seventeen Missing Value Imputation Techniques

Wafaa Mustafa Hameed, Nzar A. Ali

Abstract

Large volumes of data are collected and stored every day, and these data can be mined for interesting patterns. However, collected data are usually incomplete, and extracting information from incomplete data may give misleading results. Data are therefore pre-processed to remove such abnormalities. When the proportion of missing values is low, the affected instances can be ignored, but when it is high, ignoring them will not give the desired results. Many missing entries in a dataset are a major problem for researchers because they can cause numerous issues in quantitative research. So, before applying any data mining technique to extract valuable information from a dataset, some pre-processing can be done to avoid such fallacies and thereby improve data quality. Many techniques for handling missing values have been proposed since 1980. The simplest is to ignore the records containing missing values. Another is imputation, which replaces the missing entries with estimates obtained by calculation. This increases the quality of the data and improves prediction results. This paper reviews techniques for handling missing data, such as median imputation (MDI), hot (cold) deck imputation, regression imputation, expectation maximization (EM), support vector machine imputation (SVMI), multivariate imputation by chained equations (MICE), the SICE technique, reinforcement programming, nonparametric iterative imputation algorithms (NIIA), and multilayer perceptrons. It also explores some good options for estimating missing values that other researchers in this field can use, and aims to help them identify which methods are commonly used now.
The overview may also provide insight into each method and its advantages and limitations to consider in future research in this field. It can serve as a baseline for answering which techniques have been used and which is the most popular.
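As a rough illustration of the two basic strategies named in the abstract, ignoring incomplete records and imputing an estimate, the following minimal Python sketch contrasts listwise deletion with median imputation. The function names (`drop_incomplete`, `impute_median`), the use of `None` as the missing-value marker, and the toy data are assumptions made for this example only and are not taken from the paper.

```python
from statistics import median


def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing (None) entries."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_median(rows):
    """Replace each None with the median of the observed values in its column."""
    columns = list(zip(*rows))
    fills = []
    for col in columns:
        observed = [v for v in col if v is not None]
        # A column with no observed values cannot be imputed; leave it as None.
        fills.append(median(observed) if observed else None)
    return [
        [fills[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: four records, two numeric attributes.
data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]

complete_only = drop_incomplete(data)  # keeps 2 of the 4 records
imputed = impute_median(data)          # keeps all 4 records
```

On this toy input, deletion discards half of the records, while median imputation retains every record at the cost of inserting estimated values. That is the trade-off the abstract describes for datasets with many missing entries.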


Keywords: imputation, mean, mode, data.

 

https://doi.org/10.55463/issn.1674-2974.49.7.4




