An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment

Pınar Ersoy
Mustafa Erşahin

Abstract

Aspect-based sentiment analysis provides granular insights into customer feedback by identifying discrete aspects, such as features or topics, and assigning a sentiment to each. This study evaluates three large language models (LLMs): Google Gemini 2.5 Flash-Lite, Anthropic Claude Sonnet-4, and Meta LLaMA 3.3 70B, with the latter two delivered through AWS Bedrock. The evaluation uses a real-world multilingual corpus of 7,841 Turkish mobile banking app reviews from İşbank in Turkey. We employ a prompt-based tagging protocol to extract aspect–sentiment pairs from every review, and we compare the models on accuracy, F1-score, inference cost, and latency. The results show that all three LLMs can perform multilingual aspect extraction and sentiment classification without task-specific fine-tuning. Claude Sonnet-4 attains the highest F1 for aspect extraction and the highest sentiment accuracy, although it incurs a markedly higher inference cost. Gemini 2.5 Flash-Lite achieves competitive accuracy at a fraction of the price, making it well suited to high-volume analytics. Meta LLaMA 3.3 70B, accessed through AWS Bedrock, delivers intermediate performance at moderate cost and latency. We provide detailed performance tables and figures, along with best-practice guidance for enterprise deployment. Because both Claude and LLaMA 3.3 70B are available through AWS Bedrock, practitioners can select between them for multilingual sentiment analysis of app reviews according to their scale, accuracy, and budget constraints.
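
The abstract does not spell out how the aspect-extraction F1 and the sentiment accuracy are computed, so the following is only a minimal sketch under stated assumptions: each review is annotated as a mapping from aspect label to sentiment label, F1 is micro-averaged over exact aspect-label matches, and sentiment accuracy is measured only on aspects that were extracted correctly. The function name `score_absa` and the example labels are hypothetical and are not taken from the article.

```python
from typing import Dict, List

# Assumed representation: one dict per review, mapping aspect -> sentiment.
Annotation = Dict[str, str]


def score_absa(gold: List[Annotation], pred: List[Annotation]) -> Dict[str, float]:
    """Micro-averaged aspect F1 and sentiment accuracy on matched aspects."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reviews")

    tp = fp = fn = 0        # aspect-level counts across all reviews
    sentiment_hits = 0      # matched aspects whose sentiment also agrees

    for g, p in zip(gold, pred):
        matched = g.keys() & p.keys()
        tp += len(matched)
        fp += len(p.keys() - g.keys())   # predicted but not annotated
        fn += len(g.keys() - p.keys())   # annotated but not predicted
        sentiment_hits += sum(1 for aspect in matched if g[aspect] == p[aspect])

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sentiment_hits / tp if tp else 0.0

    return {
        "aspect_precision": precision,
        "aspect_recall": recall,
        "aspect_f1": f1,
        "sentiment_accuracy": accuracy,
    }


# Hypothetical annotations for two reviews (labels are illustrative only).
gold = [
    {"login": "negative", "transfer": "positive"},
    {"ui": "positive"},
]
pred = [
    {"login": "negative", "fees": "negative"},
    {"ui": "negative"},
]

scores = score_absa(gold, pred)
print(scores)
```

Scoring sentiment only on matched aspects keeps the two metrics separable: a model that misses an aspect is penalized in recall rather than in sentiment accuracy. Other conventions, such as scoring the aspect–sentiment pair jointly, are equally plausible readings of the abstract.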

How to Cite
Ersoy, P., & Erşahin, M. (2025). An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment. The European Journal of Research and Development, 5(1), 149–163. https://doi.org/10.56038/ejrnd.v5i1.659
Author Biography

Pınar Ersoy, Commencis Teknoloji

Pınar Ersoy (Senior Member, IEEE) was born in Istanbul, Türkiye. She received the B.Eng. degrees in software engineering and industrial engineering from Bahçeşehir University, Istanbul, in 2014 and 2015, respectively, and the M.Sc. degree in software engineering from Boğaziçi University, Istanbul, in 2018. From 2014 to 2016, she was an Associate Analytics Consultant at SAS Institute, focusing on NLP-driven proofs of concept for banking and telecommunications clients. From 2016 to 2017, she was a Risk Analytics Associate with Garanti BBVA, where she developed exposure-at-default models for retail lending. From 2017 to 2018, she was a Data Scientist with Trendyol Group, where she developed deep learning-based image-analysis models for e-commerce personalization. From 2018 to 2019, she was an Analytics Consultant with Mastercard Advisors, leading studies on credit-card sales-channel optimization. She then joined Dataroid, Istanbul, where she was a Senior Lead Data Scientist from 2019 to 2025, directing teams that built churn-prediction systems and integrating offline LLaMA-based LLMs into code-review workflows. Since 2025, she has been a Senior Lead Research and Development Specialist at Commencis Teknoloji in Istanbul, spearheading generative AI research, EU Horizon 2020 project proposals, and enterprise AI toolchains. She has authored more than 30 academic and technical articles on machine learning and deep learning, holds the AWS Certified AI Practitioner credential, and was elevated to BCS Fellow in 2025. Her research interests include retrieval-augmented generation and LLM optimization.

 
