Source code retrieval on StackOverflow using LDA

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

16 Citations (Scopus)

Abstract

Internet code search is a popular research area. StackOverflow allows developers to ask and answer questions about code. The previous approach to searching code on StackOverflow uses the tf-idf method, which recommends source code based on the number of occurrences of words. This method has the disadvantage that variable and method identifiers are treated as ordinary words, even though an identifier is often a combination of two or more words. For example, given an identifier named 'randomString', a search for the keyword 'random' will probably not recommend 'randomString', because the two are treated as different words. Concept location can tackle this problem. Concept location has been widely used to find the correlation between code and specific concepts or features. Previous research on concept location focused only on source code comments and on the relations among objects within the source code. This research proposes a mechanism for finding code on StackOverflow that uses Latent Dirichlet Allocation (LDA), with concept location applied in the preprocessing stage. Questions, answers, and code snippets about Java programming are downloaded from StackOverflow to a local repository. Corpora are generated by extracting the questions, answers, and code snippets. Concept locations are inferred from the source code using the LDA algorithm. Developers query concepts, and the system then recommends source code based on the relevant concepts. The experimental results show that the system is able to recommend source code with an average precision of 48% and an average recall of 58%.
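The 'randomString' example can be illustrated with a minimal sketch. The abstract does not say how identifiers are handled during preprocessing, so the camelCase splitting below is an assumption made for illustration, not the authors' method; `split_identifier` and `matches` are hypothetical helper names.

```python
import re

# Hypothetical helper, not taken from the paper: find the word parts of an
# identifier at camelCase, digit, and underscore boundaries.
# Alternatives, in order: an acronym run (e.g. "HTTP"), a capitalised or
# lower-case word (e.g. "String", "random"), a run of digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Return the lower-cased word tokens that make up an identifier."""
    return [part.lower() for part in _WORD.findall(identifier)]


def matches(keyword, identifier):
    """True if the keyword equals one of the identifier's word tokens."""
    return keyword.lower() in split_identifier(identifier)


# Treated as whole words, 'random' and 'randomString' never match.
print("random" == "randomString")         # False
# After splitting, the keyword matches one of the identifier's tokens.
print(split_identifier("randomString"))   # ['random', 'string']
print(matches("random", "randomString"))  # True
```

Tokens produced this way could then be counted like ordinary words by a term-based or topic-based model, which is one plausible way to close the gap the abstract describes.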

Original language: English
Title of host publication: 2015 3rd International Conference on Information and Communication Technology, ICoICT 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 295-299
Number of pages: 5
ISBN (Electronic): 9781479977529
Publication status: Published - 31 Aug 2015
Event: 3rd International Conference on Information and Communication Technology, ICoICT 2015 - Bali, Indonesia
Duration: 27 May 2015 to 29 May 2015

Publication series

Name: 2015 3rd International Conference on Information and Communication Technology, ICoICT 2015

Conference

Conference: 3rd International Conference on Information and Communication Technology, ICoICT 2015
Country/Territory: Indonesia
City: Bali
Period: 27/05/15 to 29/05/15

Keywords

  • Concept Location
  • Latent Dirichlet Allocation
  • Source Code Searching
