Human actions that are shared across classes are a major problem for video classification systems. For example, a long jump video shares the running action with a running video. In this paper, we present a video classification system that combines a keyframe extractor with a convolutional neural network (CNN) classifier. The keyframe extractor is built on visual attention modeling, and the top k frames with the highest saliency values are chosen for the classification process. Using only these top k keyframes may reduce the shared actions in the video and makes it easier for the classifier to classify the video using only spatial features. Keyframes extracted with a video summarization method were used for the training process, which in our system proved very efficient and sped up training. As a result, our system is effective, and its average accuracy is higher than that of a system without the keyframe extractor. Our proposed method also outperforms a system that uses the video summarization method as its keyframe extractor by around 3%.
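The top k selection step described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names `frame_saliency` and `top_k_keyframes` are hypothetical, and the saliency score is a simple contrast-based placeholder that stands in for the visual attention model, which the text above does not specify in detail.

```python
import numpy as np


def frame_saliency(frame):
    """Placeholder saliency score (an assumption, not the paper's attention model).

    Uses the mean absolute deviation of pixel intensity from the frame mean,
    so flat frames score 0 and high-contrast frames score higher.
    """
    frame = np.asarray(frame, dtype=np.float64)
    return float(np.mean(np.abs(frame - frame.mean())))


def top_k_keyframes(frames, k):
    """Return the indices of the k most salient frames, in temporal order."""
    scores = np.array([frame_saliency(f) for f in frames])
    k = min(k, len(scores))  # never request more frames than exist
    # Sort ascending, reverse to descending, keep the k best indices.
    best = np.argsort(scores, kind="stable")[::-1][:k]
    # Restore temporal order so the classifier sees frames as they occur.
    return sorted(int(i) for i in best)


# Toy "video": two flat frames, one checkerboard, one frame with a single bright pixel.
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2) * 255.0
single_pixel = np.zeros((4, 4))
single_pixel[0, 0] = 255.0
video = [np.zeros((4, 4)), checkerboard, np.ones((4, 4)), single_pixel]

keyframes = top_k_keyframes(video, k=2)
```

In this toy example the two flat frames receive a score of zero, so the selected keyframes are the checkerboard and the single-pixel frame. In a full pipeline, only the selected frames would be passed to the CNN classifier.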