Abstract
Deriving a process model from a small event log dataset is straightforward, because currently available process-discovery applications handle such logs well. However, when faced with event logs at big-data scale, modelling pushes the existing applications to their limits. The common workaround is to sample the event log produced by the system. The problem is that this sampling must be repeated several times, since the fitness of the model discovered from each sample has to be checked: if the required fitness value is not reached, the sample size is increased and the fitness is recalculated at each iteration until the target is met. This mechanism therefore involves many steps. This paper proposes an alternative that reduces these steps by selecting an appropriate sampling technique up front. The mechanism used is a statistics-based sampling simulation on the event log datasets to determine which sampling method yields a stable process model. The simulation results show that Cluster Random Sampling with an error rate (alpha) of 1 % produces a relatively stable process model that can represent the process model discovered from the full event log population.
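A minimal sketch of the cluster-based sampling idea described above, assuming Python, an event log represented as a flat list of event dicts with a `case_id` field, and Slovin's formula n = N / (1 + N·e²) for the sample size; the abstract only states an error rate (alpha) of 1 %, so the formula, the trace-level clustering, and all names here are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import defaultdict

def cluster_random_sample(event_log, error_rate=0.01, seed=42):
    """Draw a cluster random sample from an event log.

    Each case (trace) is treated as one cluster, and whole cases are
    sampled so the control-flow inside a trace stays intact. The
    sample size follows Slovin's formula, n = N / (1 + N * e^2) --
    an assumption, since the paper only states an error rate of 1 %.
    """
    # Group events into traces (clusters) by case identifier.
    traces = defaultdict(list)
    for event in event_log:
        traces[event["case_id"]].append(event)

    case_ids = list(traces)
    population = len(case_ids)  # N: number of clusters (cases)

    # Slovin's formula for the number of clusters to sample.
    sample_size = math.ceil(population / (1 + population * error_rate ** 2))

    # Draw whole cases at random, then flatten back to an event log
    # that a discovery algorithm can consume directly.
    rng = random.Random(seed)
    sampled_ids = rng.sample(case_ids, sample_size)
    return [event for cid in sampled_ids for event in traces[cid]]

# Example: a tiny synthetic log with three cases.
log = [
    {"case_id": 1, "activity": "register"},
    {"case_id": 1, "activity": "approve"},
    {"case_id": 2, "activity": "register"},
    {"case_id": 2, "activity": "reject"},
    {"case_id": 3, "activity": "register"},
]
sample = cluster_random_sample(log, error_rate=0.01)
```

Sampling whole cases rather than individual events is the natural cluster unit for process discovery, since splitting a trace would corrupt the directly-follows relations the model is built from.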
| Original language | English |
|---|---|
| Pages (from-to) | 17-28 |
| Number of pages | 12 |
| Journal | International Journal of Simulation Modelling |
| Volume | 22 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Mar 2023 |
Keywords
- Big Data
- Event Log
- Process Discovery
- Process Mining
- Sampling