TY - GEN
T1 - Enhancing the Security of Word Embedding in Machine Learning as a Service against Reverse Engineering Attacks using Homomorphic Encryption
AU - Muliantara, Agus
AU - Purwitasari, Diana
AU - Pratomo, Baskoro Adi
AU - Studiawan, Hudan
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/7/1
Y1 - 2025/7/1
N2 - Word embedding is important in Natural Language Processing (NLP). It offers contextual representations of a corpus that are used by tasks such as sentiment analysis or text classification. Even though these representations are numerical, they are still vulnerable to reconstruction attacks, such as INVBERT, which can recover the original text from the numerical embeddings, posing privacy risks. This research analyzed the use of Homomorphic Encryption (HE) to secure embeddings by keeping them encrypted during computation, preserving confidentiality without decryption. Financial text data, categorized into positive, neutral, and negative sentiments, was used to generate word embeddings with 50-dimensional pre-trained GloVe vectors. Standardized input lengths were created using padding sizes of 15, 25, and 50, and an Artificial Neural Network (ANN) was applied for sentiment classification. The study analyzed the impact of HE on memory usage, execution time, and prediction accuracy. The results show that HE effectively prevents reconstruction attacks, securing sensitive data by scrambling the word embedding data to make it unreadable. However, this protection comes with increased memory usage and execution time, especially with larger padding sizes. The consistency of predictions between plaintext and ciphertext outputs was 66% (118 of 180), which indicates the need for parameter adjustment. More multiplications in the ANN cause problems with the maximum value of polynomial scale calculations. Nevertheless, HE shows potential for secure NLP applications that balance privacy and computational efficiency. Furthermore, optimization and hybrid methodologies may improve the effectiveness of HE in protecting confidential information in NLP tasks.
AB - Word embedding is important in Natural Language Processing (NLP). It offers contextual representations of a corpus that are used by tasks such as sentiment analysis or text classification. Even though these representations are numerical, they are still vulnerable to reconstruction attacks, such as INVBERT, which can recover the original text from the numerical embeddings, posing privacy risks. This research analyzed the use of Homomorphic Encryption (HE) to secure embeddings by keeping them encrypted during computation, preserving confidentiality without decryption. Financial text data, categorized into positive, neutral, and negative sentiments, was used to generate word embeddings with 50-dimensional pre-trained GloVe vectors. Standardized input lengths were created using padding sizes of 15, 25, and 50, and an Artificial Neural Network (ANN) was applied for sentiment classification. The study analyzed the impact of HE on memory usage, execution time, and prediction accuracy. The results show that HE effectively prevents reconstruction attacks, securing sensitive data by scrambling the word embedding data to make it unreadable. However, this protection comes with increased memory usage and execution time, especially with larger padding sizes. The consistency of predictions between plaintext and ciphertext outputs was 66% (118 of 180), which indicates the need for parameter adjustment. More multiplications in the ANN cause problems with the maximum value of polynomial scale calculations. Nevertheless, HE shows potential for secure NLP applications that balance privacy and computational efficiency. Furthermore, optimization and hybrid methodologies may improve the effectiveness of HE in protecting confidential information in NLP tasks.
KW - Data Privacy
KW - Homomorphic Encryption
KW - INVBERT
KW - Machine Learning
KW - Word Embeddings
UR - https://www.scopus.com/pages/publications/105012245973
U2 - 10.1145/3729706.3729712
DO - 10.1145/3729706.3729712
M3 - Conference contribution
AN - SCOPUS:105012245973
T3 - Proceedings of 2025 4th International Conference on Cyber Security, Artificial Intelligence and the Digital Economy, CSAIDE 2025
SP - 31
EP - 36
BT - Proceedings of 2025 4th International Conference on Cyber Security, Artificial Intelligence and the Digital Economy, CSAIDE 2025
PB - Association for Computing Machinery, Inc
T2 - 2025 4th International Conference on Cyber Security, Artificial Intelligence and the Digital Economy, CSAIDE 2025
Y2 - 7 March 2025 through 9 March 2025
ER -