Abstract
Plagiarism is increasingly alarming, especially in the field of education. Many written works contain content plagiarized from other people's works. Similar sentences, which serve as an indicator of plagiarism, can be detected with Winnowing, an n-gram-based hashing algorithm. Winnowing generates a document fingerprint by converting the text of a document into a collection of hash values. Similar fingerprints between documents indicate similar texts and therefore possible plagiarism. Plagiarism usually occurs among documents on similar topics, so documents on similar topics should be clustered before detection. K-means-H++ is a clustering algorithm that takes the number of clusters as input, and the Hartigan index is used to recommend that number. After the documents were clustered, a document fingerprint was first compared with the cluster fingerprints rather than with every individual document. A second comparison was then made only against the documents belonging to the closest cluster selected in the first comparison.
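The fingerprinting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the n-gram length `k`, the window size `w`, the CRC32 hash, the rightmost-minimum tie-breaking rule, and the Jaccard measure in `similarity` are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import re
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-gram of the normalized text (lowercase, alphanumerics only)."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    # CRC32 is an assumed stand-in for whatever hash the paper uses.
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]


def winnow(text, k=5, w=4):
    """Return a fingerprint: the set of (hash, position) pairs picked by winnowing."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    # If there are fewer hashes than one full window, use a single short window.
    w = min(w, len(hashes))
    fingerprint = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, keep the rightmost occurrence of the minimum (assumed rule).
        offset = w - 1 - window[::-1].index(smallest)
        fingerprint.add((smallest, start + offset))
    return fingerprint


def similarity(fp_a, fp_b):
    """Jaccard similarity of two fingerprints, comparing hash values only."""
    a = {h for h, _ in fp_a}
    b = {h for h, _ in fp_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two documents that share a normalized passage of at least `w + k - 1` characters end up with at least one common hash, so `similarity` returns a value above zero for them. In the two-stage scheme the abstract outlines, the same comparison would first be run against one fingerprint per cluster and then against the members of the closest cluster only.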
| Original language | English |
|---|---|
| Pages (from-to) | 341-347 |
| Number of pages | 7 |
| Journal | Asian Journal of Information Technology |
| Volume | 10 |
| Issue number | 8 |
| Publication status | Published - 2011 |
Keywords
- Document fingerprint
- Hartigan index
- Indicator
- K-means++
- Plagiarism detection
- Winnowing
Title
The use of Hartigan index for initializing K-means++ in detecting similar texts of clustered documents as a plagiarism indicator