An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment
Abstract
Aspect-based sentiment analysis provides granular insight into customer feedback by identifying discrete aspects, such as features or topics, and assigning a sentiment to each. This study evaluates three large language models (LLMs): Google Gemini 2.5 Flash-Lite, Anthropic Claude Sonnet 4, and Meta Llama 3.3 70B, the latter two accessed through AWS Bedrock, on a real-world multilingual corpus of 7,841 Turkish mobile banking app reviews from İşbank in Turkey. Using a prompt-based tagging protocol, we extract aspect–sentiment pairs from every review and compare accuracy, F1-score, inference cost, and latency. The results show that all three LLMs can perform multilingual aspect extraction and sentiment classification without task-specific fine-tuning. Claude Sonnet 4 attains the highest aspect-extraction F1 and the highest sentiment accuracy, but at a markedly higher inference cost. Gemini 2.5 Flash-Lite achieves competitive accuracy at a fraction of the price, making it well suited for high-volume analytics. Llama 3.3 70B exhibits intermediate performance with moderate cost and latency. We provide detailed performance tables and figures, along with best-practice guidance for enterprise deployment, showing how AWS Bedrock supports the strategic selection of Claude and Llama 3.3 70B for multilingual sentiment analysis of app reviews under scale, accuracy, and budget constraints.
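The prompt-based tagging protocol can be sketched roughly as follows. This is a minimal illustration, not the study's exact prompts: the instruction wording, the JSON response schema, the three-way label set, and the helper names `build_absa_prompt` and `parse_pairs` are all assumptions for the sake of the example.

```python
import json

# Assumed sentiment label set; the study's exact taxonomy may differ.
LABELS = {"positive", "negative", "neutral"}

def build_absa_prompt(review: str) -> str:
    """Compose a tagging prompt asking the LLM to return aspect-sentiment
    pairs for one (possibly Turkish) app review as a JSON array."""
    return (
        "Extract every aspect mentioned in the mobile banking review below "
        "and label each with a sentiment (positive, negative, or neutral). "
        'Respond with JSON only, e.g. [{"aspect": "login", "sentiment": "negative"}].\n\n'
        f"Review: {review}"
    )

def parse_pairs(model_output: str) -> list[tuple[str, str]]:
    """Parse the model's JSON reply into (aspect, sentiment) tuples,
    discarding entries whose sentiment label is not in the expected set."""
    pairs = json.loads(model_output)
    return [
        (p["aspect"], p["sentiment"])
        for p in pairs
        if p.get("sentiment") in LABELS
    ]
```

In a Bedrock-hosted setup, the prompt would be sent to Claude or Llama via the `bedrock-runtime` client's `converse` API (and via Google's own SDK for Gemini), with the model's reply passed to `parse_pairs`; the same protocol then applies uniformly to all three models.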
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.