Abstract

Generating minority-class spam texts is expected to alleviate the class-imbalance problem in spam detection for product reviews. However, the generated texts may differ semantically from the original ones, so including such texts in the spam dataset used for training amounts to adding noise. Evaluating the generated texts normally requires manually prepared ground-truth data. This work evaluates the generated texts with several approaches that check their context and sequence similarity to the original texts, with the aim of improving spam detection performance. These approaches are expected to eliminate the manual tasks. We propose an evaluation model that combines pre-trained word embeddings with a Siamese LSTM to evaluate text generation for imbalanced reviews. The combination of pre-trained word embeddings and a trained Siamese LSTM model is used to capture the semantic aspects of the text.
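The evaluation idea in the abstract (score each generated review against an original one and keep only the semantically close ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the embedding table holds made-up toy values, a mean-pooling encoder stands in for the shared Siamese LSTM branch, and the similarity function (the exponential of the negative L1 distance), the threshold, and all function names are hypothetical choices.

```python
import math

# Toy 3-dimensional "pre-trained" embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "good": [0.8, 0.2, 0.0],
    "product": [0.1, 0.9, 0.1],
    "item": [0.2, 0.8, 0.1],
    "refund": [0.0, 0.1, 0.9],
    "never": [0.1, 0.0, 0.8],
}
DIM = 3


def encode(text):
    """Mean-pool the word vectors of known words.

    Assumption: this replaces the shared LSTM encoder of a Siamese network;
    both inputs go through the same function, as they would in a Siamese setup.
    """
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def similarity(text_a, text_b):
    """Return a score in (0, 1]: exp of the negative L1 distance between encodings."""
    a, b = encode(text_a), encode(text_b)
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


def filter_generated(original, generated, threshold=0.8):
    """Keep generated texts whose similarity to the original meets the threshold."""
    return [g for g in generated if similarity(original, g) >= threshold]


if __name__ == "__main__":
    kept = filter_generated("great product", ["good item", "never refund"])
    print(kept)
```

In this sketch the filter replaces the manual ground-truth check: generated texts that score below the threshold are treated as noise and left out of the training set. A trained Siamese LSTM would also take word order into account, which mean pooling ignores.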

Original language: English
Title of host publication: 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665442640
Publication status: Published - 2021
Event: 13th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021 - Depok, Indonesia
Duration: 23 Oct 2021 to 26 Oct 2021

Publication series

Name: 2021 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2021

Keywords

  • imbalance review
  • pre-trained model
  • spam detection
  • text generation
  • word embedding

Title: Word-Embedding Model for Evaluating Text Generation of Imbalanced Spam Reviews