A Lightweight Object Detection Algorithm Based on Improved YOLOv8
Keywords
lightweight object detection, LAD-YOLO, large separable kernel attention, lightweight convolution
Abstract
Lightweight object detection algorithms are crucial in computer vision because they determine whether detection models can be deployed on resource-constrained devices and meet real-time requirements. To address these deployment constraints, this paper proposes LAD-YOLO, a lightweight object detection algorithm based on an improved YOLOv8. First, we optimize the point-wise convolution in depthwise separable convolution to enhance the model's learning ability, introduce depthwise separable convolution into the backbone and neck networks to reduce the model size, and construct a lightweight detection head. In addition, the Large Separable Kernel Attention (LSKA) mechanism is introduced to help the model capture multi-scale information and achieve better detection performance. Extensive experiments on the VOC dataset show that, compared with YOLOv8n, LAD-YOLO improves precision (P) and mAP0.5:0.95 by 2.5% and 1.8%, respectively, while maintaining a lower parameter count and lower computational complexity.
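To make the model-size argument concrete, the sketch below compares the weight count of a standard convolution with that of a textbook depthwise separable convolution (a depthwise k x k step followed by a 1 x 1 point-wise step). It is an illustration only: the function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen for the example, biases are ignored, and the modified point-wise convolution used in LAD-YOLO is not specified in the abstract and is not reproduced here.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a textbook depthwise separable convolution (bias ignored).

    Depthwise step: one k x k filter per input channel.
    Point-wise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1/c_out + 1/k**2

print(f"standard conv weights:       {standard}")
print(f"depthwise separable weights: {separable}")
print(f"ratio:                       {ratio:.4f}")
```

For these assumed sizes the separable layer needs 8,768 weights against 73,728 for the standard layer, roughly 12%. In general the ratio is 1/c_out + 1/k^2, so the saving grows with the number of output channels and the kernel size.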