Towards Better HS Code Prediction: A Comparative Study of Machine Learning and NLP Approaches

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Harmonized System (HS) code is an important instrument for classifying goods in international trade, as it ensures that the proper tariffs are paid and that customs regulations are complied with. However, predicting HS codes is a challenging task: commodity descriptions are unstructured text that must be mapped to hierarchical commodity categories, and common trade terms often differ from HS nomenclature. This study addresses these challenges by evaluating various machine learning models, including traditional, deep learning, and NLP-based approaches, on datasets characterized by short, noisy descriptions. We aim to investigate whether these models maintain their performance on real-world, imperfect data and to understand the underlying factors that contribute to model inaccuracies. The analysis demonstrates that NLP models, particularly fastText, consistently outperformed the others, delivering the highest accuracy on 8-digit HS code classification. Despite this result, the study also revealed significant misclassification issues caused by ambiguous terminology, by formatting errors, and by the common practice of importers copying SKU numbers from invoices or packing lists into import declarations without expanding them into rich commodity descriptions.
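To make the hierarchical structure mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' evaluation code. It assumes 8-digit codes in which the first 2, 4, and 6 digits are the HS chapter, heading, and subheading, and the last two digits are a national extension. The function names `split_hs_code` and `accuracy_by_level` and the sample codes are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Dict, List

# Prefix length for each level of the hierarchy.
# Assumption: the first 6 digits follow the international HS structure and
# digits 7-8 are a national tariff-line extension.
LEVELS: Dict[str, int] = {
    "chapter": 2,
    "heading": 4,
    "subheading": 6,
    "tariff_line": 8,
}


def split_hs_code(code: str) -> Dict[str, str]:
    """Return the prefix of an 8-digit code at each hierarchy level."""
    # Drop separators such as dots or spaces, keeping digits only.
    digits = "".join(ch for ch in code if ch.isdigit())
    if len(digits) != 8:
        raise ValueError(f"expected 8 digits, got {len(digits)}: {code!r}")
    return {name: digits[:n] for name, n in LEVELS.items()}


def accuracy_by_level(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Fraction of predictions whose prefix matches the label at each level."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    hits = {name: 0 for name in LEVELS}
    for true_code, pred_code in zip(y_true, y_pred):
        true_parts = split_hs_code(true_code)
        pred_parts = split_hs_code(pred_code)
        for name in LEVELS:
            if true_parts[name] == pred_parts[name]:
                hits[name] += 1
    return {name: hits[name] / len(y_true) for name in LEVELS}


if __name__ == "__main__":
    # Illustrative labels and predictions only; not data from the study.
    labels = ["8471.30.10", "8517.13.00"]
    predictions = ["8471.30.90", "8517.13.00"]
    print(accuracy_by_level(labels, predictions))
```

Reporting accuracy per level in this way can show whether an 8-digit error is a near miss within the correct subheading or a failure at the chapter level, which is one way to examine the misclassification causes the abstract describes.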

Original language: English
Title of host publication: 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331522780
Publication status: Published - 2025
Event: 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025 - Hybrid, Surakarta, Indonesia
Duration: 3 Jun 2025 to 4 Jun 2025

Keywords

  • HS code prediction
  • commodity classification
  • machine learning models
  • natural language processing
  • trade compliance
