An Enhanced Information Retrieval-Based Bug Localization System with Code Coverage, Stack Traces, and Spectrum Information
Abstract
Several strategies, such as the Vector Space Model (VSM), the revised Vector Space Model (rVSM), and the integration of additional elements such as stack traces and previously fixed bug reports, have been used to improve Information Retrieval (IR) based bug localization. Most existing IR-based approaches use source code files without filtering, which enlarges the search space of the technique and slows down the bug localization process. This study developed an enhanced IR-based bug localization model to address this problem. Specifically, an enhanced rVSM (e-rVSM) is developed based on the hybridization of code coverage, stack traces, and spectrum information. Combining stack trace and spectrum information as additional features can improve the accuracy of the IR-based technique. Code coverage analysis is first conducted to remove irrelevant source files and reduce the search space of the IR technique. The filtered source files are then preprocessed via tokenization and stemming to select relevant features and remove unwanted words. Next, the e-rVSM computes the similarity between the preprocessed bug reports and the source code files. Finally, each source file is assigned a score, and the suspected buggy files are ranked in descending order of score. The performance of the proposed e-rVSM is tested on two open-source projects (Zxing and SWT), and its effectiveness is assessed using Top N rank (where N = 5, 10), Mean Reciprocal Rank (MRR), and Mean Average Precision (MAP). The experimental results indicate that e-rVSM is effective for bug localization. In particular, e-rVSM recorded Top 5 (80.2%; 65%) and Top 10 (89.1%; 75%) rank values on the SWT and Zxing datasets, respectively. Also, the proposed e-rVSM had MRR values of 80% and 54% on the SWT dataset and MAP values of 61.22% and 47.23% on the Zxing dataset.
Keywords: information retrieval, bug localization, vector space model.
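The pipeline outlined in the abstract (coverage filtering, preprocessing, similarity scoring, and ranking) can be illustrated with a minimal Python sketch. This is not the e-rVSM formula, which the abstract does not give: plain TF-IDF cosine similarity stands in for the rVSM score, the suffix-stripping `stem` function stands in for a real stemmer, and the additive weights `w_trace` and `w_spectrum`, the function names, and the toy files are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}


def stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer (assumption).
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    # Split on non-letters and camelCase boundaries, lowercase,
    # drop stop words, then stem.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [stem(p.lower()) for p in parts if p.lower() not in STOP_WORDS]


def weight(tokens, idf):
    # TF-IDF weights, restricted to terms that occur in the filtered corpus.
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_files(report, sources, covered, trace_files, spectrum,
               w_trace=0.2, w_spectrum=0.2):
    # Step 1: code coverage filter; files that were never executed are dropped.
    docs = {n: preprocess(src) for n, src in sources.items() if n in covered}
    # Step 2: inverse document frequency over the filtered files only.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    query = weight(preprocess(report), idf)
    # Step 3: textual similarity plus hypothetical additive boosts for files
    # named in the stack trace and for spectrum-based suspiciousness.
    scores = {}
    for name, tokens in docs.items():
        score = cosine(query, weight(tokens, idf))
        score += w_trace * (1.0 if name in trace_files else 0.0)
        score += w_spectrum * spectrum.get(name, 0.0)
        scores[name] = score
    # Step 4: rank in descending order of score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy inputs, invented for illustration only.
sources = {
    "QRDecoder.java": "class QRDecoder { decode bitmap reader checksum }",
    "Logger.java": "class Logger { log message decode }",
    "Unused.java": "class Unused { decode bitmap checksum }",
}
ranking = rank_files(
    report="Decoding fails with checksum error in QRDecoder",
    sources=sources,
    covered={"QRDecoder.java", "Logger.java"},
    trace_files={"QRDecoder.java"},
    spectrum={"QRDecoder.java": 0.9, "Logger.java": 0.1},
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy run, `Unused.java` never reaches the similarity step because it is not in the covered set, which is the search-space reduction the abstract describes. How the real model weights the three information sources is not stated in the abstract, so the additive combination here should be read only as one plausible arrangement.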
https://doi.org/10.55463/issn.1674-2974.49.4.12
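The three evaluation measures named in the abstract (Top N rank, MRR, and MAP) follow standard information-retrieval definitions, sketched below in Python. The function names and the toy rankings are assumptions for illustration; the numbers are not taken from the study. Average precision is normalized here by the number of buggy files per report, which is one common convention.

```python
def first_hit_rank(ranked, buggy):
    # 1-based position of the first truly buggy file, or None if absent.
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            return position
    return None


def top_n(results, n):
    # Fraction of bug reports with at least one buggy file in the top n.
    hits = sum(
        1 for ranked, buggy in results if any(f in buggy for f in ranked[:n])
    )
    return hits / len(results)


def mean_reciprocal_rank(results):
    # Mean of 1 / (rank of the first buggy file); a miss contributes 0.
    total = 0.0
    for ranked, buggy in results:
        rank = first_hit_rank(ranked, buggy)
        if rank is not None:
            total += 1.0 / rank
    return total / len(results)


def average_precision(ranked, buggy):
    # Mean of precision@k taken at each position k that holds a buggy file.
    hits = 0
    precision_sum = 0.0
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy) if buggy else 0.0


def mean_average_precision(results):
    return sum(average_precision(r, b) for r, b in results) / len(results)


# Each entry pairs a ranked file list with the set of truly buggy files.
# Toy data, invented for illustration only.
results = [
    (["A.java", "B.java", "C.java"], {"A.java"}),
    (["B.java", "A.java", "C.java"], {"A.java", "C.java"}),
    (["B.java", "C.java", "A.java"], {"A.java"}),
]

print("Top 1:", top_n(results, 1))
print("Top 2:", top_n(results, 2))
print("MRR:  ", mean_reciprocal_rank(results))
print("MAP:  ", mean_average_precision(results))
```

Top N rewards any hit within the cutoff, MRR looks only at the first buggy file, and MAP accounts for every buggy file in the ranking, so the three measures can diverge when a bug touches several files.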