---
title: "Midway Report: LINQS - Autograder (LLM Detector)"
subtitle: "Halfway through the work!"
summary: "Midterm progress report on my GSoC'25 project with OSPO."
authors:
  - anvichip
tags: ["osre25"]
categories: ["Artificial Intelligence", "Machine Learning", "LLMs"]
date: 2025-07-31
lastmod: 2025-07-31
featured: true
draft: false
---

Hello everyone, I’m Anvi Kohli, and in this blog post I’ll share my journey as a GSoC contributor.
This summer I am contributing to the [LINQS Autograder project](https://ucsc-ospo.github.io/project/osre25/ucsc/autograder/) under the mentorship of Eriq Augustine and Lucas Ellenberger.
The goal of my project is to build a tool that can detect AI-generated code.
You can read my proposal here: [Proposal](https://summerofcode.withgoogle.com/programs/2025/projects/jxBUpvoM).
My first blog post is available here: [Blog Post 1](https://ucsc-ospo.github.io/report/osre25/ucsc/autograder/).

## Project Overview

The Autograder Server is an open-source project that automatically grades programming assignments in real time.
It supports evaluating assignments written in a variety of programming languages.
With the rise of tools like GitHub Copilot and ChatGPT, students increasingly rely on AI to complete their coding assignments.
This makes it harder to keep grading fair and to ensure that students are actually learning.
Our project aims to address this issue by building a system that provides a confidence score indicating how likely it is that a piece of code was written by an AI tool.
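
A detector like this typically ends in a two-class classifier whose raw outputs (logits) are turned into a probability. As a minimal sketch, assuming a hypothetical model that emits one logit per class in the order (human, AI), a softmax over the two logits yields a confidence score between 0 and 1. The `ai_confidence` helper is illustrative only and is not part of the Autograder.

```python
import math


def ai_confidence(human_logit: float, ai_logit: float) -> float:
    """Return P(AI-generated) from two class logits using a softmax.

    Hypothetical helper: assumes a binary classifier that outputs one logit
    per class in the order (human, AI). Not part of the Autograder codebase.
    """
    # Subtract the larger logit before exponentiating so exp() cannot overflow.
    m = max(human_logit, ai_logit)
    e_human = math.exp(human_logit - m)
    e_ai = math.exp(ai_logit - m)
    return e_ai / (e_human + e_ai)


if __name__ == "__main__":
    # Equal logits: the model is undecided, so the confidence is 0.5.
    print(ai_confidence(0.0, 0.0))
    # A larger AI logit pushes the confidence toward 1.
    print(ai_confidence(-1.0, 2.0))
```

Subtracting the larger logit first keeps the computation stable even when the model produces extreme logits.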

## Progress, Challenges, & Learnings

### Exploration of Existing Tools and Systems

My mentor, Eriq Augustine, advised me to begin with simpler methodologies before progressing to more complex ones.
There are several possible approaches to detecting AI-generated code, including training models from scratch, designing custom detection algorithms, and using or adapting existing open-source tools.
Training a model from scratch requires curating an enormous amount of training data first, so in the interest of time we chose to begin by exploring existing solutions and evaluating their performance.
As a first step, I conducted an in-depth exploration of open-source repositories that detect AI-generated code.
By building on existing open-source solutions, we can focus on extending the capabilities of pre-built tools and fine-tuning their models.
Further training these models on large, diverse datasets can make them more accurate, robust, and adaptable.

Exploring open-source solutions gave me an understanding of current work and ongoing efforts in detecting AI-generated code.
It also helped me identify the gaps that remain in current tools and where there is room for improvement.

### Transfer Learning

While exploring existing tools and research papers, I found that transfer learning has shown promising results in detecting AI-generated code.
For example, many studies fine-tuned pre-trained models like CodeBERT on labeled datasets containing both AI-generated and human-written code.
Building on this, I curated a collection of publicly available datasets that could be used for this purpose.
However, during this process, I noticed that relevant, high-quality open-source datasets are scarce.
They also vary widely in format, language coverage, and overall quality.
Some focus on a single programming language, while others span multiple languages.
Many also contain too few examples.
By standardizing these scattered resources into a common format, we can create a comprehensive, multi-language dataset suitable for AI code detection.
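
To illustrate what standardizing could look like, here is a minimal sketch that maps records from differently formatted sources onto one common schema (`code`, `language`, `label`) and drops empty or duplicate snippets. The schema, the two source formats, and the adapter names (`from_source_a`, `from_source_b`) are hypothetical; each real dataset would need its own mapping.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Sample:
    """One entry of the unified dataset (hypothetical schema)."""
    code: str
    language: str
    label: int  # 1 = AI-generated, 0 = human-written


Record = Dict[str, Any]
Adapter = Callable[[Record], Sample]


def from_source_a(rec: Record) -> Sample:
    """Hypothetical format A: {"source": str, "lang": str, "is_ai": bool}."""
    return Sample(rec["source"], rec["lang"].lower(), int(rec["is_ai"]))


def from_source_b(rec: Record) -> Sample:
    """Hypothetical format B: {"content": str, "language": str, "author": "human" or a model name}."""
    label = 0 if rec["author"] == "human" else 1
    return Sample(rec["content"], rec["language"].lower(), label)


def standardize(sources: Iterable[Tuple[Iterable[Record], Adapter]]) -> List[Sample]:
    """Convert each record with its source's adapter; drop empty snippets and exact duplicates."""
    seen: Set[Tuple[str, str]] = set()
    merged: List[Sample] = []
    for records, adapter in sources:
        for rec in records:
            sample = adapter(rec)
            key = (sample.language, sample.code.strip())
            if not key[1] or key in seen:
                continue  # empty after stripping, or already seen in an earlier source
            seen.add(key)
            merged.append(sample)
    return merged


if __name__ == "__main__":
    source_a = [{"source": "print('hi')", "lang": "Python", "is_ai": True}]
    source_b = [
        {"content": "print('hi')", "language": "python", "author": "human"},
        {"content": "int main() { return 0; }", "language": "C", "author": "some-llm"},
    ]
    for sample in standardize([(source_a, from_source_a), (source_b, from_source_b)]):
        print(sample)
```

Deduplicating across sources matters because the same snippet can appear in several public datasets. This sketch simply keeps the first occurrence, so conflicting labels for the same snippet would need a separate policy.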

Currently, I’m fine-tuning these models on the open-source datasets and evaluating how effectively they distinguish AI-generated code from human-written code.
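
Evaluating a classifier like this usually comes down to a handful of standard metrics. Here is a minimal sketch that computes accuracy, precision, recall, and F1 from predicted confidence scores, treating AI-generated code as the positive class. The `evaluate` name and the default threshold of 0.5 are assumptions for illustration, not the project's actual evaluation code.

```python
from typing import Dict, List


def evaluate(scores: List[float], labels: List[int], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for a binary AI-code detector.

    scores: predicted P(AI-generated) per sample; labels: 1 = AI, 0 = human.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    preds = [1 if s >= threshold else 0 for s in scores]
    # Confusion-matrix counts with AI-generated as the positive class.
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    # Guard each ratio against division by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(evaluate([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 1]))
```

In a grading setting, a false positive means flagging a student's own work as AI-generated, so precision is arguably the metric to watch most closely; raising the threshold generally trades recall for precision.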

### Contributing to the Autograder Repository

To help me get familiar with our existing codebase and gain hands-on experience with the Go programming language, my mentor assigned me an [open issue](https://github.com/edulinq/autograder-server/issues/141) in the repository.
My progress on it is in [PR#194](https://github.com/edulinq/autograder-server/pull/194).

As mentioned, the Autograder is a tool used to evaluate programming assignments.
One of the features of the Autograder Server is its ability to analyze code across a large number of submissions.
It uses source code plagiarism detection engines like [JPlag](https://helmholtz.software/software/jplag) and [Dolos](https://dolos.ugent.be/) to analyze all submissions for an assignment.
This pull request introduces the ability to pass custom arguments to these engines, allowing more control and flexibility in how similarity is calculated.
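
The pull request itself is written in Go, and its implementation is not reproduced here. Purely as a conceptual illustration (in Python, to match the other sketches), passing custom arguments to an engine can be thought of as overlaying user-supplied options on the engine's defaults before building the command. The `build_engine_command` helper, the option names, and the `--key=value` syntax are all hypothetical and are not JPlag's or Dolos's actual command-line flags.

```python
from typing import Dict, List, Optional, Set


def build_engine_command(
    executable: str,
    defaults: Dict[str, str],
    custom: Optional[Dict[str, str]] = None,
    allowed: Optional[Set[str]] = None,
) -> List[str]:
    """Overlay custom options on the defaults and render them as an argument list.

    Hypothetical sketch: '--key=value' and the option names are placeholders,
    not real JPlag or Dolos flags. If 'allowed' is given, only those options may
    be overridden, which keeps arbitrary flags out of the engine invocation.
    """
    options = dict(defaults)  # copy so the caller's defaults are not mutated
    for key, value in (custom or {}).items():
        if allowed is not None and key not in allowed:
            raise ValueError(f"option not permitted: {key}")
        options[key] = value  # a custom value replaces the default
    # Sort the keys so the same inputs always produce the same command.
    return [executable] + [f"--{key}={value}" for key, value in sorted(options.items())]


if __name__ == "__main__":
    print(build_engine_command(
        "similarity-engine",
        defaults={"min-tokens": "12", "lang": "java"},
        custom={"min-tokens": "8"},
        allowed={"min-tokens"},
    ))
```

Returning a list of arguments rather than a single shell string lets the caller run the engine (for example with `subprocess.run`) without shell-quoting issues.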

As someone with no prior experience contributing to open source or programming in Go, I was consistently encouraged and supported by my mentors, Lucas and Eriq, who gave me valuable guidance on writing cleaner and more efficient code.
This experience taught me the importance of code quality and maintainability in production-level, collaborative open-source projects.

## Learning

The past two months on this project have been a period of immense growth and learning for me.
With the constant support and guidance of my mentors, Eriq Augustine and Lucas Ellenberger, I’ve had the chance to reflect on my skills, identify areas for improvement, and actively work on them.
One of the biggest takeaways for me has been understanding the importance of writing clean, readable code, something I hadn’t fully appreciated before.
Their guidance has not only made me a better developer but has also shaped my growth more holistically.
I’ve started paying closer attention to deadlines, communicating more thoughtfully, and ensuring my work is both thorough and reliable.
All in all, GSoC’25 has proved to be a valuable experience for me.