TY - GEN
T1 - Learning Models for Software Feature Extraction from Disaster Tweets
T2 - 4th International Conference on Electrical Engineering and Computer Science, ICECOS 2024
AU - Lovitasari Yonia, Dwika
AU - Irham, Ainal
AU - Oranova Siahaan, Daniel
AU - Mahfud, Ilyas
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - This study assesses the effectiveness of different combinations of word embeddings and machine learning models in classifying disaster-related tweets to improve emergency response efforts. The primary objective is to identify the most reliable approach for categorizing tweets as either disaster-related or unrelated. The research employs TF-IDF for feature vectorization and Word2Vec for word embedding, combined with machine learning models such as Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Decision Trees, and Random Forest. After extensive preprocessing, the results indicate that Word2Vec combined with Logistic Regression achieved the highest performance, with 81% accuracy, precision, and recall, and an F1 score of 80%. Other combinations, including Word2Vec with Support Vector Machines and Random Forest, also demonstrated strong results. In contrast, TF-IDF combined with K-Nearest Neighbors resulted in the lowest accuracy at 68%. These findings highlight the critical importance of selecting appropriate word embedding techniques and machine learning models for effective text classification. Future research should explore more advanced representations such as BERT and other Transformer-based models, while also incorporating temporal and semantic analysis to further enhance classification accuracy and robustness.
AB - This study assesses the effectiveness of different combinations of word embeddings and machine learning models in classifying disaster-related tweets to improve emergency response efforts. The primary objective is to identify the most reliable approach for categorizing tweets as either disaster-related or unrelated. The research employs TF-IDF for feature vectorization and Word2Vec for word embedding, combined with machine learning models such as Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Decision Trees, and Random Forest. After extensive preprocessing, the results indicate that Word2Vec combined with Logistic Regression achieved the highest performance, with 81% accuracy, precision, and recall, and an F1 score of 80%. Other combinations, including Word2Vec with Support Vector Machines and Random Forest, also demonstrated strong results. In contrast, TF-IDF combined with K-Nearest Neighbors resulted in the lowest accuracy at 68%. These findings highlight the critical importance of selecting appropriate word embedding techniques and machine learning models for effective text classification. Future research should explore more advanced representations such as BERT and other Transformer-based models, while also incorporating temporal and semantic analysis to further enhance classification accuracy and robustness.
KW - comparative analysis
KW - disaster tweets
KW - feature extraction
KW - machine learning
KW - word embedding
UR - https://www.scopus.com/pages/publications/85215300944
U2 - 10.1109/ICECOS63900.2024.10791279
DO - 10.1109/ICECOS63900.2024.10791279
M3 - Conference contribution
AN - SCOPUS:85215300944
T3 - ICECOS 2024 - 4th International Conference on Electrical Engineering and Computer Science, Proceeding
SP - 83
EP - 88
BT - ICECOS 2024 - 4th International Conference on Electrical Engineering and Computer Science, Proceeding
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 September 2024 through 26 September 2024
ER -