Building an On-Premises Knowledge Repository with Large Language Models for Instant Information Access

Burak Dobur

Procat

https://orcid.org/0009-0001-6060-9841

Engin Bıçakcı

Procat

https://orcid.org/0009-0001-9398-4795

Asli Terim

Procat

https://orcid.org/0009-0003-1082-5866

Cemal Arık

Procat

https://orcid.org/0009-0002-4700-4597

DOI: https://doi.org/10.56038/oprd.v5i1.545

Keywords: Knowledge Library, Large Language Models, AI, Real-time Information Retrieval, Decision-making


Abstract

This project aims to design and develop a live knowledge library that uses large language models (LLMs) to improve access to real-time information across various domains. The system will be deployed on-premises and will respond instantly to user queries, streamlining information retrieval. By leveraging the natural language processing (NLP) capabilities of LLMs, the project seeks to improve decision-making and operational efficiency within organizations. It addresses the growing need for rapid information access by providing precise answers to user inquiries and minimizing the delays inherent in traditional search methods. The system is also intended to improve the user experience through a user-friendly interface with quick response times, making information retrieval more intuitive. The project further focuses on improving internal knowledge flow by facilitating communication and collaboration across departments. With an emphasis on scalability, the solution is designed to be adaptable to various sectors and therefore widely applicable. By continuously learning from and adapting to new data, the system will provide up-to-date information, reducing reliance on manual updates and minimizing human error. Ultimately, the project aims to enhance productivity, support effective decision-making, and offer organizations a competitive advantage through AI-driven knowledge management.



