Project goal: To develop a web app using an existing open-source LLM model with Jenkins usage data collected for domain-specific Jenkins knowledge to be fine-tuned
Skills to study/improve: Python, JavaScript/TypeScript, React.js, LLM, AI/ML, Jenkins, Ollama, LangChain, UI, Infra statistics, Data Analytics
This full-stack project focuses on a proof-of-concept (PoC) idea to fine-tune an existing open-source LLM model (such as Llama 2) with domain-specific Jenkins data to be compiled, wrangled, and processed by the contributor as a part of an AI-driven application, to develop a minimalistic UI for the user to interact with the LLM as a complete end-to-end product. The main source of raw data will be the publicly available ci.jenkins.io datasets. It is expected that there will be two parts of the project: the first part which is to analyze the ci.jenkins.io data using machine learning techniques, and the second part which is to "ingest" the knowledge gained via part one to fine-tune an existing LLM model to help with any infra-related troubleshooting tasks. The contributor will get to be involved in every step of the application development process, from data collection, wrangling, and processing to fine-tuning the model and developing the UI. Unlike the very similar GSoC 2024 LLM counterpart, this project is very research-focused and data-driven, and will be a lot more difficult to achieve a successful outcome. We will build on the completed work available via GitHub at https://github.com/jenkins-infra/Enhancing-LLM-with-Jenkins-Knowledge/. It is expected that some benchmarking will be conducted by the GSoC contributor selected towards the end of the project to gauge the effectiveness of the work completed, which will also serve as a basis for future similar projects.
Strategy: Test failure analysis based on the test data from ci.jenkins.io
Help the user with failure diagnosis
Is the failure due to infra?
Is the failure due to a code change?
Is the failure due to an unreliable test (“flaky test”)?
Sample repositories and use cases
Jenkins core
Jenkins acceptance test harness
Jenkins plugin BOM
Details to be clarified interactively, together with the mentors, during the Contributor Application drafting phase.
Become familiarized with the flow of the GSoC 2024 LLM project. We will reference the source code from the GSoC 2024 LLM project at its official repo on GitHub.