Abstract

For many years, the Burrows-Wheeler Transform (BWT) has been employed in data compression. BWT-based compression, however, suffers from inflexibility because it is text-dependent. To address this problem, we combine the BWT with a Hidden Markov Model (HMM) in a single compression system. The BWT is used to produce a structure of clustered single characters, while the HMM is used to predict the genomic data from those clusters. We apply a learning algorithm (the Baum-Welch EM algorithm) to improve the compression ratio by re-estimating the model on the genomic data. The highest single and mean compression ratios obtained are 4.276 and 4.004, respectively, with a possible improvement in compression ratio of as much as 2.90% before saturation. Further development of this compression system remains of interest, in particular extending the HMM to cope with complex patterns and performing offline re-estimation to reduce time consumption.
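As a rough illustration of the clustering effect the abstract attributes to the BWT, the following is a minimal sketch of a naive rotation-sort transform and its inverse. It assumes a `$` sentinel that does not occur in the input; the function names `bwt` and `inverse_bwt` are hypothetical, and this is not the implementation used in the paper, which also adds the HMM prediction and Baum-Welch re-estimation stages not shown here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column.

    Assumption: `sentinel` sorts before every symbol in `text` and does not
    occur in it. This is quadratic in memory and only suitable for short inputs.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    sequence = "GATTACAGATTACA"  # hypothetical toy sequence, not data from the paper
    transformed = bwt(sequence)
    print(transformed)
    print(inverse_bwt(transformed) == sequence)
```

Because the transform is reversible, the compression stays lossless; the runs of identical characters in the output are what a downstream predictor can exploit.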

Original language: English
Title of host publication: CENIM 2020 - Proceeding
Subtitle of host publication: International Conference on Computer Engineering, Network, and Intelligent Multimedia 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 429-434
Number of pages: 6
ISBN (Electronic): 9781728182834
Publication status: Published - 17 Nov 2020
Event: 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia, CENIM 2020 - Virtual, Surabaya, Indonesia
Duration: 17 Nov 2020 - 18 Nov 2020

Publication series

Name: CENIM 2020 - Proceeding: International Conference on Computer Engineering, Network, and Intelligent Multimedia 2020

Conference

Conference: 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia, CENIM 2020
Country/Territory: Indonesia
City: Virtual, Surabaya
Period: 17/11/20 - 18/11/20

Keywords

  • Burrows Wheeler Transform Hidden Markov Model
  • Genomic Data Management
  • Lossless Compression

Fingerprint

Dive into the research topics of 'An Adaptive BWT-HMM-based Lossless Compression System for Genomic Data'. Together they form a unique fingerprint.
