DEVELOPMENT OF A MULTIMODAL RECOGNITION SYSTEM AND TARGET CLASSIFICATION BASED ON ARTIFICIAL INTELLIGENCE
DOI:
https://doi.org/10.30888/2663-5712.2025-34-01-113
Keywords:
multimodal recognition, target classification, artificial intelligence, RGB-T data, deep learning, edge-aware learning, semantic segmentation.
Abstract
The study addresses the problem of developing a multimodal system for target recognition and classification based on artificial intelligence. The relevance of the topic stems from the need to improve the reliability of automatic object detection under co
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.


