Comparative Analysis of Deep fake Video and Audio Detection
DOI:
https://doi.org/10.70454/JRIST.2025.10102
Keywords:
Deep fake Detection, ResNet50, Efficient Net B0, MFCC, Random Forest Classifier
Abstract
Deepfake (DF, from "deep learning" + "fake") refers to forged video and audio generated using AI algorithms. While deepfakes can be a source of entertainment, they can also be harmful in various ways. The manipulation of audio and video for harmful purposes has been a concern for more than a decade, and the ability to detect such content with AI-based detectors is the motivation for this project. This paper presents a comparative study of existing research on deepfake detection, reporting accuracy and F1 score along with bar plots and graphs of these metrics. While several approaches and datasets are available for deepfake video, audio deepfakes have been relatively neglected. In this work, we propose joint deepfake video and audio detection using a hybrid deep learning model that ensembles ResNet50 and EfficientNet B0. The audio dataset, comprising real and synthetic voice recordings, was taken from the SceneFake repository on Kaggle. Key audio features, including Mel-frequency cepstral coefficients (MFCCs), spectral centroid, chroma, zero-crossing rate, and root mean square energy (RMSE), were extracted using the librosa library. A Random Forest classifier was trained to detect fake audio, while the DFDV dataset was used to extract facial frames from videos.
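The audio pipeline described in the abstract (per-clip features fed to a Random Forest classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. It computes only three of the named features (zero-crossing rate, root mean square energy, spectral centroid) by hand with NumPy instead of calling librosa, and it trains on synthetic toy signals (tones versus noise) rather than SceneFake recordings. The sample rate, frame sizes, helper names, and classifier settings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

SR = 16000  # assumed sample rate in Hz (not stated in the source)


def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def clip_features(y, sr=SR):
    """Return [mean ZCR, mean RMS energy, mean spectral centroid (Hz)] for one clip."""
    frames = frame_signal(y)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames).astype(int)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)), axis=1)
    # Root mean square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Spectral centroid: magnitude-weighted mean frequency of each frame.
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    return np.array([zcr.mean(), rms.mean(), centroid.mean()])


rng = np.random.default_rng(0)
t = np.arange(SR) / SR  # one second of samples


def toy_clip(label):
    """Hypothetical stand-in data: class 0 is a low tone, class 1 is broadband noise."""
    if label == 0:
        freq = rng.uniform(150, 400)
        return np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(SR)
    return 0.5 * rng.standard_normal(SR)


# Build a small two-class feature matrix and hold out the last 20 clips.
labels = np.array([0, 1] * 40)
X = np.stack([clip_features(toy_clip(label)) for label in labels])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:60], labels[:60])
accuracy = clf.score(X[60:], labels[60:])
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

With real recordings, the hand-written features would typically be replaced by the corresponding librosa calls, and MFCC and chroma statistics would be appended to each feature vector. The toy classes here are trivially separable, so the printed accuracy says nothing about performance on actual deepfake audio.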
License
Copyright (c) 2025 Harish Chandra Prasad, Dr. Arti Gautam Dinker (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits all use, distribution, and reproduction in any medium, provided the work is properly cited.