Visual Discovery in Retail: Operationalizing AI-Powered Visual Search at Boyner

Mert Alacan

Boyner

https://orcid.org/0000-0003-3893-6309

Seza Dursun

Boyner

https://orcid.org/0000-0003-1389-072X

Bahar Önel

Boyner

https://orcid.org/0009-0007-4597-6591

Tülin Işıkkent

Boyner

https://orcid.org/0009-0005-5775-0093

Sedat Çelik

Boyner

https://orcid.org/0009-0003-0335-6440

DOI: https://doi.org/10.56038/oprd.v7i1.742

Keywords: Visual Search, Multimodal AI, GroundingDINO, SigLIP, Milvus, Retail Intelligence, Semantic Search, AI in E-Commerce, Omnichannel Retail, Customer Experience


Abstract

In today's retail landscape, where millions of products and visual stimuli compete for customer attention, the integration of artificial intelligence into visual search has emerged as a crucial lever of operational efficiency. This paper presents Boyner Group's AI-powered visual discovery system, which enables customers to search with photos instead of keywords, making product discovery more intuitive and visually engaging. The architecture takes a hybrid approach, combining Large Language Models (LLMs), open-set vision models such as GroundingDINO, and a vector-based semantic similarity layer built on SigLIP embeddings and the Milvus vector database, to deliver scalable, high-accuracy image retrieval. The system, currently operational across the Boyner.com.tr ecosystem, supports enhanced filtering and storytelling capabilities, increasing customer satisfaction and conversion rates. The implementation process, system components, and operational results of this large-scale AI integration are explored, highlighting its transformative impact within omnichannel retail.
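To make the retrieval flow concrete, the sketch below shows how such a pipeline can be wired together from publicly available components: GroundingDINO localizes the product region in the query photo, SigLIP embeds the crop, and Milvus returns the nearest catalogue items. The model checkpoints, collection name, thresholds, and output fields are illustrative assumptions, not Boyner's production configuration.

# Minimal sketch of a photo-to-product retrieval pipeline in the spirit of the
# architecture described in the abstract. Checkpoints, thresholds, and the
# Milvus collection/schema below are assumptions for illustration only.
import torch
from PIL import Image
from transformers import (
    AutoProcessor,
    GroundingDinoForObjectDetection,
    SiglipModel,
    SiglipProcessor,
)
from pymilvus import MilvusClient

# --- 1. Open-set detection: crop the product region named by a text prompt ---
det_processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-tiny")
detector = GroundingDinoForObjectDetection.from_pretrained("IDEA-Research/grounding-dino-tiny")

def crop_product(image: Image.Image, prompt: str = "a clothing item.") -> Image.Image:
    inputs = det_processor(images=image, text=prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = detector(**inputs)
    results = det_processor.post_process_grounded_object_detection(
        outputs,
        inputs.input_ids,
        box_threshold=0.35,
        text_threshold=0.25,
        target_sizes=[image.size[::-1]],
    )[0]
    if len(results["boxes"]) == 0:
        return image  # fall back to the full frame if nothing is detected
    x0, y0, x1, y1 = results["boxes"][0].tolist()
    return image.crop((x0, y0, x1, y1))

# --- 2. Semantic embedding of the cropped query image with SigLIP ---
emb_processor = SiglipProcessor.from_pretrained("google/siglip-base-patch16-224")
embedder = SiglipModel.from_pretrained("google/siglip-base-patch16-224")

def embed(image: Image.Image) -> list[float]:
    inputs = emb_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = embedder.get_image_features(**inputs)
    features = torch.nn.functional.normalize(features, dim=-1)  # cosine-ready vectors
    return features[0].tolist()

# --- 3. Approximate nearest-neighbour lookup against the product catalogue ---
client = MilvusClient(uri="http://localhost:19530")  # assumed local Milvus instance

def visual_search(photo_path: str, top_k: int = 20):
    query_vec = embed(crop_product(Image.open(photo_path).convert("RGB")))
    return client.search(
        collection_name="product_embeddings",        # hypothetical collection name
        data=[query_vec],
        limit=top_k,
        output_fields=["product_id", "title"],       # hypothetical payload fields
    )

if __name__ == "__main__":
    for hit in visual_search("query_photo.jpg")[0]:
        print(hit["entity"]["product_id"], hit["distance"])

In a production setting the catalogue embeddings would be precomputed offline and refreshed as the assortment changes; the query-time path above only detects, embeds, and searches, which keeps per-request latency bounded by one detector pass, one encoder pass, and one ANN lookup.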



References

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. DOI: https://doi.org/10.1145/3065386

Kannan, P. K., & Li, H. (2017). Digital marketing: A framework, review and research agenda. International Journal of Research in Marketing, 34(1), 22–45. DOI: https://doi.org/10.1016/j.ijresmar.2016.11.006

Gu, J., Wang, Z., Kuen, J., et al. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. DOI: https://doi.org/10.1016/j.patcog.2017.10.013

Radford, A., Kim, J. W., Hallacy, C., et al. (2021). Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML).

Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Liu, S., Zeng, Z., Ren, T., et al. (2023). Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. arXiv:2303.05499. DOI: https://doi.org/10.1007/978-3-031-72970-6_3

Wang, J., Yi, X., Guo, R., et al. (2021). Milvus: A Purpose-Built Vector Data Management System. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD '21).

Gao, D., Jin, L., Chen, B., et al. (2020). FashionBERT: Text and Image Matching with Adaptive Loss for Cross-Modal Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
