On the Vision-Beam Aided Tracking for Wireless 5G-Beyond Networks Using Long Short-Term Memory with Soft Attention Mechanism


Nasir Sinani
Ferkan Yilmaz


The growth of 5G technology and the continued success of deep learning in computer vision tasks for healthcare, self-driving cars, visual recognition, and many other areas have brought new challenges to the field of wireless communication. 5G-Beyond networks rely primarily on maintaining line-of-sight (LOS) links between base stations and mobile users. One of their main challenges is therefore to hand mobile users off proactively, before a blockage interrupts communication, and so avoid the latency of searching for the best beam. Accordingly, vision-aided millimeter-wave (mmWave) beam and blockage prediction has opened new research directions in proactive hand-off and resource allocation. This paper studies wireless beam tracking in mmWave bands using a deep learning approach evaluated on the Vision-Wireless (ViWi-BT) dataset [1]. We show how to predict future beam sequences from previously observed beam sequences and images, using a long short-term memory (LSTM) network as the base predictive method. We use a soft attention mechanism to select the most important features, and we propose replacing the softmax attention function with different periodic attention functions to eliminate the vanishing-gradient problem.
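The idea of swapping the softmax in soft attention for a periodic function can be illustrated with a minimal sketch. It is not the implementation used in the paper: the dot-product scoring, the function names (`soft_attention`, `periodic_softmax`), and the specific choice of applying softmax to the sine of each score are assumptions made here for illustration only, since the abstract does not state which periodic functions are evaluated.

```python
import math


def softmax(scores):
    """Standard softmax; subtracting the max keeps exp() from overflowing."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def periodic_softmax(scores):
    """Hypothetical periodic alternative: softmax applied to sin(score).

    sin() is bounded in [-1, 1], so the weights cannot collapse onto a
    single feature no matter how large the raw scores become.
    """
    return softmax([math.sin(s) for s in scores])


def soft_attention(query, features, normalize=softmax):
    """Return (context, weights) for dot-product soft attention.

    query:    list of floats, e.g. an LSTM hidden state (assumed input)
    features: list of feature vectors, each the same length as query
    """
    # One scalar score per feature vector (dot-product scoring is an assumption).
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = normalize(scores)
    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(query)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return context, weights


# Toy inputs with widely spread scores (10, 0, -10).
query = [10.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

ctx_soft, w_soft = soft_attention(query, features, normalize=softmax)
ctx_per, w_per = soft_attention(query, features, normalize=periodic_softmax)
```

With these toy inputs the softmax weights concentrate almost entirely on one feature, which is the saturated regime where softmax gradients become very small, while the periodic variant keeps all weights away from 0 and 1. A periodic function does not preserve the ordering of the raw scores, so the feature that receives the largest weight can differ between the two variants.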




How to Cite
Sinani, N., & Yilmaz, F. (2022). On the Vision-Beam Aided Tracking for Wireless 5G-Beyond Networks Using Long Short-Term Memory with Soft Attention Mechanism. The European Journal of Research and Development, 2(2), 505–520. https://doi.org/10.56038/ejrnd.v2i2.95


Alrabeiah, M., Booth, J., Hredzak, A., & Alkhateeb, A. (2020). ViWi vision-aided mmWave beam tracking: Dataset, task, and baseline solutions. arXiv preprint arXiv:2002.02445.

The official website of the vision-aided beam tracking data competition at IEEE ICC 2022: https://www.viwi-dataset.net/viwi-bt.html.

Rappaport, T. S., Xing, Y., MacCartney, G. R., Molisch, A. F., Mellios, E., & Zhang, J. (2017). Overview of millimeter wave communications for fifth-generation (5G) wireless networks—With a focus on propagation models. IEEE Transactions on Antennas and Propagation, 65(12), 6213-6230. DOI: https://doi.org/10.1109/TAP.2017.2734243

Niu, Y., Li, Y., Jin, D., Su, L., & Vasilakos, A. V. (2015). A survey of millimeter wave communications (mmWave) for 5G: Opportunities and challenges. Wireless Networks, 21(8), 2657-2676. DOI: https://doi.org/10.1007/s11276-015-0942-z

Rappaport, T. S., Sun, S., Mayzus, R., Zhao, H., Azar, Y., Wang, K., ... & Gutierrez, F. (2013). Millimeter wave mobile communications for 5G cellular: It will work! IEEE Access, 1, 335-349. DOI: https://doi.org/10.1109/ACCESS.2013.2260813

Ly, A., & Yao, Y. D. (2021). A review of deep learning in 5G research: Channel coding, massive MIMO, multiple access, resource allocation, and network security. IEEE Open Journal of the Communications Society, 2, 396-408. DOI: https://doi.org/10.1109/OJCOMS.2021.3058353

Mollel, M. S., Abubakar, A. I., Ozturk, M., Kaijage, S. F., Kisangiri, M., Hussain, S., ... & Abbasi, Q. H. (2021). A survey of machine learning applications to handover management in 5G and beyond. IEEE Access, 9, 45770-45802. DOI: https://doi.org/10.1109/ACCESS.2021.3067503

Alrabeiah, M., & Alkhateeb, A. (2020). Deep learning for mmWave beam and blockage prediction using sub-6 GHz channels. IEEE Transactions on Communications, 68(9), 5504-5518. DOI: https://doi.org/10.1109/TCOMM.2020.3003670

Xu, W., Gao, F., Jin, S., & Alkhateeb, A. (2020). 3D scene-based beam selection for mmWave communications. IEEE Wireless Communications Letters, 9(11), 1850-1854. DOI: https://doi.org/10.1109/LWC.2020.3005983

Charan, G., Alrabeiah, M., & Alkhateeb, A. (2021, June). Vision-aided dynamic blockage prediction for 6G wireless communication networks. In 2021 IEEE International Conference on Communications Workshops (ICC Workshops) (pp. 1-6). IEEE. DOI: https://doi.org/10.1109/ICCWorkshops50388.2021.9473651

Charan, G., Alrabeiah, M., & Alkhateeb, A. (2021). Vision-aided 6G wireless communications: Blockage prediction and proactive handoff. IEEE Transactions on Vehicular Technology, 70(10), 10193-10208. DOI: https://doi.org/10.1109/TVT.2021.3104219

Reus-Muns, G., Salehi, B., Roy, D., Jian, T., Wang, Z., Dy, J., ... & Chowdhury, K. (2021, December). Deep Learning on Visual and Location Data for V2I mmWave Beamforming. In 2021 17th International Conference on Mobility, Sensing and Networking (MSN) (pp. 559-566). IEEE. DOI: https://doi.org/10.1109/MSN53354.2021.00087

Roy, D., Salehi, B., Banou, S., Mohanti, S., Reus-Muns, G., Belgiovine, M., ... & Chowdhury, K. (2022). Going Beyond RF: How AI-enabled Multimodal Beamforming will Shape the NextG Standard. arXiv preprint arXiv:2203.16706.

Salehi, B., Reus-Muns, G., Roy, D., Wang, Z., Jian, T., Dy, J., ... & Chowdhury, K. (2022). Deep Learning on Multimodal Sensor Data at the Wireless Edge for Vehicular Network. arXiv preprint arXiv:2201.04712.

Tian, Y., & Wang, C. (2021, September). Vision-Aided Beam Tracking: Explore the Proper Use of Camera Images with Deep Learning. In 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall) (pp. 1-5). IEEE. DOI: https://doi.org/10.1109/VTC2021-Fall52928.2021.9625195

Hu, Z., & Han, C. (2021, October). Image and index fused sequence-to-sequence algorithm for vision-aided millimeter-wave beam tracking. In Proceedings of the 5th ACM Workshop on Millimeter-Wave and Terahertz Networks and Sensing Systems (pp. 7-12). DOI: https://doi.org/10.1145/3477081.3481678

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). DOI: https://doi.org/10.1109/CVPR.2016.90

Xie, S., Girshick, R., Dollár, P., Tu, Z., & He, K. (2017). Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500). DOI: https://doi.org/10.1109/CVPR.2017.634

Wang, B., Ma, L., Zhang, W., Jiang, W., Wang, J., & Liu, W. (2019). Controllable video captioning with POS sequence guidance based on gated fusion network. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2641-2650). DOI: https://doi.org/10.1109/ICCV.2019.00273

Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015). Describing videos by exploiting temporal structure. In Proceedings of the IEEE international conference on computer vision (pp. 4507-4515). DOI: https://doi.org/10.1109/ICCV.2015.512

Tian, Y., Pan, G., & Alouini, M.-S. (2020). Applying deep-learning-based computer vision to wireless communications: Methodologies, opportunities, and challenges. IEEE Open Journal of the Communications Society, 2, 132-143. DOI: https://doi.org/10.1109/OJCOMS.2020.3042630

Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., & Saenko, K. (2015). Sequence to sequence-video to text. In Proceedings of the IEEE international conference on computer vision (pp. 4534-4542). DOI: https://doi.org/10.1109/ICCV.2015.515

Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2625-2634). DOI: https://doi.org/10.1109/CVPR.2015.7298878

Feichtenhofer, C., Pinz, A., & Zisserman, A. (2016). Convolutional two-stream network fusion for video action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1933-1941). DOI: https://doi.org/10.1109/CVPR.2016.213

Hara, K., Kataoka, H., & Satoh, Y. (2017). Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? arXiv preprint arXiv:1711.09577. DOI: https://doi.org/10.1109/CVPR.2018.00685

Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (pp. 448-456). PMLR.

Agarap, A. F. (2018). Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375.

Wang, S., Liu, F., & Liu, B. (2021). Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism. IEEE Access, 9, 168749-168759. DOI: https://doi.org/10.1109/ACCESS.2021.3138201

Banerjee, K., Gupta, R. R., Vyas, K., & Mishra, B. (2020). Exploring alternatives to softmax function. arXiv preprint arXiv:2011.11538. DOI: https://doi.org/10.5220/0010502000002996

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. DOI: https://doi.org/10.1162/neco.1997.9.8.1735

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.