Building an On-Premises Knowledge Repository with Large Language Models for Instant Information Access
Burak Dobur
Procat
https://orcid.org/0009-0001-6060-9841
Engin Bıçakcı
Procat
https://orcid.org/0009-0001-9398-4795
Asli Terim
Procat
https://orcid.org/0009-0003-1082-5866
Cemal Arık
Procat
https://orcid.org/0009-0002-4700-4597
DOI: https://doi.org/10.56038/oprd.v5i1.545
Keywords: Knowledge Library, Large Language Models, AI, Real-time Information Retrieval, Decision-making
Abstract
This project aims to design and develop a live knowledge library that uses large language models (LLMs) to improve access to real-time information across various domains. The system will be deployed on-premises and will respond to user queries instantly, streamlining information retrieval. By leveraging the natural language processing (NLP) capabilities of LLMs, the project seeks to improve decision-making and operational efficiency within organizations. It addresses the growing need for rapid information access by providing accurate answers to user inquiries and minimizing the delays inherent in traditional search methods. The system also offers a user-friendly interface with quick response times, making information retrieval more intuitive, and improves internal knowledge flow by facilitating communication and collaboration across departments. With an emphasis on scalability, the solution is designed to be adaptable to various sectors. By continuously learning from and adapting to new data, the system will provide up-to-date information, reducing reliance on manual updates and minimizing human error. Ultimately, the project aims to enhance productivity, support effective decision-making, and offer organizations a competitive advantage through AI-driven knowledge management.