TY - GEN
T1 - Probabilistic Record Matching for Entity Resolution Using Markov Logic Networks
AU - Lukluk, Muhammad
AU - Affandi, Achmad
AU - Hariadi, Mochamad
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Entity resolution (ER) is the problem of identifying objects that refer to the same real-world entity and merging them into a single representation. In the database context, ER is also known as record linkage, which determines the records that refer to the same entity; the statistical probabilistic approach to this task is called probabilistic record linkage (PRL). PRL has been applied to various ER problems, including variants that use machine learning as an improvement. However, this probabilistic approach has difficulty handling missing data, which commonly occurs in unreliable datasets. Such unreliable data can increase uncertainty and reduce the quality of the final result. This paper discusses an alternative PRL approach that uses Markov logic networks (MLNs) to infer the matching of record pairs in unreliable datasets, especially datasets with a high rate of missing data. The proposed approach was inspired by matching dependencies (MDs), a model formally introduced to address unreliable datasets. Experiments on real-world datasets taken from the State Islamic University of Maulana Malik Ibrahim Malang, Indonesia, achieved an accuracy of 0.977, approaching the 0.986 accuracy of the previous method.
KW - data integration
KW - entity resolution
KW - Markov logic networks
KW - matching dependencies
KW - probabilistic record linkage
UR - http://www.scopus.com/inward/record.url?scp=85065061000&partnerID=8YFLogxK
U2 - 10.1109/EECCIS.2018.8692979
DO - 10.1109/EECCIS.2018.8692979
M3 - Conference contribution
AN - SCOPUS:85065061000
T3 - 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar, EECCIS 2018
SP - 360
EP - 364
BT - 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar, EECCIS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar, EECCIS 2018
Y2 - 9 October 2018 through 11 October 2018
ER -