Deep Learning of a Pre-trained Language Model's Joke Classifier Using GPT-2

Nur Arifin Akbar, Irma Darmayanti, Suliman Mohamed Fati, Amgad Muneer


Humor generation and classification are among the most challenging problems in computational natural language understanding. Even humans often fail at being funny and at recognizing humor. This study attempts to create a joke generator using a large pre-trained language model (GPT-2). Further, the authors develop a joke classifier by fine-tuning a pre-trained BERT model to classify the generated jokes and attempt to understand what distinguishes joke sentences from non-joke sentences. Qualitative analysis reveals that the classifier exhibits specific internal attention patterns when classifying joke sentences, and that these patterns are absent when it classifies normal sentences. The experimental results show that the BERT model outperforms the CNN and RNN + attention baselines in terms of accuracy, precision, recall, and F1-score. The BERT model achieves an accuracy of 0.983, a precision of 0.953, a recall of 0.978, and an F1-score of 0.964.
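The four metrics reported above can be illustrated with a minimal Python sketch. It assumes a binary labeling in which 1 marks a joke and 0 a non-joke; the function names `classification_metrics` and `f1_from` and the toy labels are hypothetical and are not taken from the paper, whose evaluation code and averaging scheme are not described here.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary task.

    Assumption: `positive` is the label of the joke class.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the study).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # Harmonic mean of the precision and recall reported in the abstract.
    print(round(f1_from(0.953, 0.978), 3))
```

Applied to the reported precision (0.953) and recall (0.978), the harmonic mean is about 0.965, close to the reported F1-score of 0.964; the small gap may come from rounding or from a different averaging scheme, which the abstract does not specify.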


Keywords:  Deep Learning, Pre-trained, Joke Classifier, Generative Pre-trained Transformer 2 (GPT-2), Bidirectional Encoder Representations from Transformers (BERT).



