Graph-Based Customer Segmentation with GraphSAGE on a Customer–Vehicle Bipartite Network

Abdullah Sezdi

Arabam.com

https://orcid.org/0009-0000-5639-7535

Metin Bilgin

Bursa Uludağ University

https://orcid.org/0000-0002-4216-0542

DOI: https://doi.org/10.56038/oprd.v7i1.670

Keywords: Bipartite graph, GraphSAGE, link prediction, customer segmentation, K-means, automotive analytics, graph neural networks


Abstract

This study models customer–vehicle interactions on an online used-car platform as a bipartite graph with customer (U) and vehicle (V) nodes. The two node sets are connected only by edges that represent realized purchase events, e = (u, v, t), so the graph focuses on a signal with high business value and relatively low noise. Inductive node representations (embeddings) are learned on this graph with GraphSAGE. Link prediction serves solely as a self-supervised proxy task during training; an MLP-based scorer is optimized with Binary Cross-Entropy (BCE) loss. Training stops early when the BCE on a temporally held-out validation set stops improving; together with temporal negative sampling, this prevents leakage of future information.
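The leakage-free data preparation described above can be illustrated with a minimal sketch. The toy events, the function names, and the exact negative-sampling rule (draw a vehicle already seen by time t that the customer has not bought by time t) are assumptions made for illustration; the abstract does not specify these details. The GraphSAGE encoder and the MLP scorer are omitted, and a placeholder probability stands in for the scorer output.

```python
import math
import random

# Hypothetical purchase events e = (u, v, t): customer, vehicle, timestamp.
events = [
    ("u1", "v1", 1), ("u2", "v2", 2), ("u1", "v3", 3),
    ("u3", "v1", 4), ("u2", "v4", 5), ("u3", "v5", 6),
]


def temporal_split(events, cut_time):
    """Train on events up to cut_time, validate on strictly later ones."""
    train = [e for e in events if e[2] <= cut_time]
    val = [e for e in events if e[2] > cut_time]
    return train, val


def sample_negative(u, t, events, rng):
    """Assumed rule: pick a vehicle seen by time t that u has not bought by t.

    Returns None when no such vehicle exists.
    """
    seen = {v for (_, v, s) in events if s <= t}
    bought = {v for (w, v, s) in events if w == u and s <= t}
    candidates = sorted(seen - bought)  # sorted for reproducibility
    return rng.choice(candidates) if candidates else None


def bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over (label, predicted probability) pairs."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


rng = random.Random(0)
train, val = temporal_split(events, cut_time=4)

# Pair the last training positive with one temporally valid negative.
u, v, t = train[-1]
neg = sample_negative(u, t, train, rng)

# 0.5 is a placeholder for the scorer's output on the positive and negative pair.
loss = bce([1, 0], [0.5, 0.5])
print(train, val, neg, round(loss, 4))
```

Because negatives are drawn only from vehicles observed up to the timestamp of the positive edge, no validation-period vehicle can enter a training pair under this assumed rule.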

The objective is to obtain high-quality customer/vehicle embeddings. The learned representations are then clustered with K-means to construct embedding-based customer segments. Segmentation quality is evaluated using the Silhouette and Calinski–Harabasz scores. The results show that GraphSAGE embeddings learned on the purchase-induced bipartite graph provide a practical and scalable foundation for recommendation/targeting and customer understanding tasks.
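The segmentation and evaluation step can be sketched from scratch as follows. The 2-D points are hypothetical stand-ins for learned customer embeddings, and the helper names are illustrative; this is not the authors' implementation, and a library such as scikit-learn would normally be used in practice.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm with random initial centers drawn from the data."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return assign(points, centers), centers


def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    n, clusters = len(points), sorted(set(labels))
    k, overall = len(clusters), mean(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        center = mean(members)
        between += len(members) * dist(center, overall) ** 2
        within += sum(dist(p, center) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 2-D "customer embeddings": two well-separated groups.
embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

labels, centers = kmeans(embeddings, k=2)
sil = silhouette(embeddings, labels)
ch = calinski_harabasz(embeddings, labels)
print(labels, sil, ch)
```

Higher values of both scores indicate more compact and better separated segments, so the two metrics can be compared across candidate values of k.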


References

S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, “BPR: Bayesian personalized ranking from implicit feedback,” in UAI, 2009.

A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in KDD, 2016. DOI: https://doi.org/10.1145/2939672.2939754

W. L. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in NeurIPS, 2017.

R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, “Graph convolutional neural networks for web-scale recommender systems,” in KDD, 2018. DOI: https://doi.org/10.1145/3219819.3219890

X. Wang, X. He, M. Wang, F. Feng, and T.-S. Chua, “Neural graph collaborative filtering,” in SIGIR, 2019. DOI: https://doi.org/10.1145/3331184.3331267

X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang, “LightGCN: Simplifying and powering graph convolution network for recommendation,” in SIGIR, 2020. DOI: https://doi.org/10.1145/3397271.3401063

Z. Wu et al., “A comprehensive survey on graph contrastive learning,” IEEE TPAMI, 2021, arXiv:2106.05264.

Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph contrastive learning with augmentations,” in NeurIPS, 2020.

P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987. DOI: https://doi.org/10.1016/0377-0427(87)90125-7

T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Communications in Statistics, vol. 3, no. 1, pp. 1–27, 1974. DOI: https://doi.org/10.1080/03610917408548446