An Enhanced Information Retrieval-Based Bug Localization System with Code Coverage, Stack Traces, and Spectrum Information

Shakirat Aderonke Salihu, Oluwakemi Christiana Abikoye


Several strategies, such as the Vector Space Model (VSM), the revised Vector Space Model (rVSM), and the integration of additional elements such as stack traces and previously fixed bug reports, have been used to improve Information Retrieval (IR)-based bug localization. Most existing IR-based approaches use source code files without filtering, which enlarges the search space of the technique and slows down the bug localization process. This study developed an enhanced IR-based bug localization model to address this problem. Specifically, an enhanced rVSM (e-rVSM) is developed based on the hybridization of code coverage, stack traces, and spectrum information. Combining stack trace and spectrum information as additional features can improve the accuracy of the IR-based technique. Code coverage analysis is first conducted to remove irrelevant source files and reduce the search space of the IR technique. The filtered source files are then preprocessed via tokenization and stemming to select relevant features and remove unwanted words. Next, the e-rVSM computes the similarity between the preprocessed bug reports and the source code files. Finally, each source code file is scored, and the suspected buggy files are ranked in descending order of score. The performance of the proposed e-rVSM is tested on two open-source projects (Zxing and SWT), and its effectiveness is assessed using Top N rank (where N = 5, 10), Mean Reciprocal Rank (MRR), and Mean Average Precision (MAP). The experimental results indicate that e-rVSM is effective for bug localization. In particular, e-rVSM recorded Top 5 (80.2%; 65%) and Top 10 (89.1%; 75%) rank values on the SWT and Zxing datasets, respectively. Also, the proposed e-rVSM had MRR values of 80% and 54% on the SWT dataset and MAP values of 61.22% and 47.23% on the Zxing dataset.

Keywords: information retrieval, bug localization, vector space model.
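The three evaluation measures named in the abstract (Top N rank, MRR, and MAP) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the data layout (one ranked file list plus one set of truly buggy files per bug report), and the toy file names are assumptions made only for illustration. Average precision is normalized here by the number of truly buggy files, which is one common convention.

```python
from typing import List, Set, Tuple

# One entry per bug report: (ranked candidate files, set of truly buggy files).
# This layout is an assumption for illustration, not taken from the paper.
Result = Tuple[List[str], Set[str]]


def first_hit_rank(ranked: List[str], buggy: Set[str]) -> int:
    """Return the 1-based rank of the first truly buggy file, or 0 if none is ranked."""
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            return position
    return 0


def top_n(results: List[Result], n: int) -> float:
    """Fraction of bug reports with at least one buggy file in the first n positions."""
    hits = sum(1 for ranked, buggy in results
               if 0 < first_hit_rank(ranked, buggy) <= n)
    return hits / len(results)


def mean_reciprocal_rank(results: List[Result]) -> float:
    """Average of 1/rank of the first buggy file (0 when no buggy file is ranked)."""
    total = 0.0
    for ranked, buggy in results:
        rank = first_hit_rank(ranked, buggy)
        if rank:
            total += 1.0 / rank
    return total / len(results)


def average_precision(ranked: List[str], buggy: Set[str]) -> float:
    """Mean of precision@k over the positions k that hold a buggy file."""
    if not buggy:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, name in enumerate(ranked, start=1):
        if name in buggy:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(buggy)


def mean_average_precision(results: List[Result]) -> float:
    """Average of the per-report average precision values."""
    return sum(average_precision(r, b) for r, b in results) / len(results)


# Toy data with hypothetical file names.
results: List[Result] = [
    (["A.java", "B.java", "C.java"], {"A.java"}),
    (["B.java", "A.java", "C.java"], {"A.java", "C.java"}),
]

print("Top 1:", top_n(results, 1))
print("Top 5:", top_n(results, 5))
print("MRR:", mean_reciprocal_rank(results))
print("MAP:", mean_average_precision(results))
```

In the toy data, the first report has its buggy file at rank 1 and the second has its first buggy file at rank 2, so half of the reports count toward Top 1 and the reciprocal ranks average to 0.75.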






