TY - GEN
T1 - An Adaptive BWT-HMM-based Lossless Compression System for Genomic Data
AU - Sulistyawan, I. Gede Eka
AU - Arifin, Achmad
AU - Fatoni, Muhammad Hilman
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11/17
Y1 - 2020/11/17
AB - For many years, the Burrows-Wheeler Transform (BWT) has been employed in data compression. BWT-based compression, however, suffers from inflexibility because it is text-dependent. To address this problem, we combine BWT with a Hidden Markov Model (HMM) in a single compression system. BWT is used to produce a clustered single-character structure, while the HMM is used to predict the genomic data through the clusters. We apply a learning algorithm (the Baum-Welch EM algorithm) to improve the compression ratio by re-estimating the model on the genomic data. The highest single and mean compression ratios obtained are 4.276 and 4.004, respectively, with a possible further improvement in compression ratio of up to 2.90% before saturation. The system leaves room for further development, namely extending the HMM to cope with complex patterns and performing offline re-estimation to reduce time consumption.
KW - Burrows Wheeler Transform
KW - Hidden Markov Model
KW - Genomic Data Management
KW - Lossless Compression
UR - http://www.scopus.com/inward/record.url?scp=85099640400&partnerID=8YFLogxK
U2 - 10.1109/CENIM51130.2020.9297871
DO - 10.1109/CENIM51130.2020.9297871
M3 - Conference contribution
AN - SCOPUS:85099640400
T3 - CENIM 2020 - Proceeding: International Conference on Computer Engineering, Network, and Intelligent Multimedia 2020
SP - 429
EP - 434
BT - CENIM 2020 - Proceeding
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 International Conference on Computer Engineering, Network, and Intelligent Multimedia, CENIM 2020
Y2 - 17 November 2020 through 18 November 2020
ER -