Text Mining Infrastructure in Erlang

Abbas Jkhayyir Kadhim

Abstract

Fast and efficient extraction of knowledge from data is increasingly important for supporting decision making. This paper examines the runtime performance of a classification algorithm during knowledge extraction from a text file. Text mining of this kind is commonly used in conjunction with related fields such as natural language processing (NLP) and information retrieval. The text mining algorithms operate on two files loaded through the Erlang shell: the first is the text file to be analyzed, and the second contains the words of an English dictionary. Inter-process communication is used to produce a result file in which each word of the text is classified as a correct or incorrect English word. We also present analysis techniques based on text count, text classification, and string kernels. The article draws two conclusions from the information retrieved from the text file, including the percentage of correct and incorrect words. Information retrieval is carried out as a Knowledge Discovery in Databases (KDD) process, using frequent pattern analysis under the classification algorithm to give a full analysis of the text file. Furthermore, runtime tests indicate that the code, written in the Erlang functional programming language, executes with high speed and efficiency.

Keywords: machine learning, text mining, supervised learning, classification algorithm, Erlang functional programming language.


