Improve training data quality for coding LLMs
Large language models are powerful but inherit flaws from their training data. Sonar is working with AI labs and enterprise organizations to train specialized coding models optimized for cost, performance, quality, and security.
To learn more about our methodology and findings:
- Explore our fine-tuned version of GPT-OSS, optimized for high-quality Java code generation. Trained on a dataset optimized with SonarSweep, the model generates code with 41% fewer bugs and security vulnerabilities than the base model, without sacrificing functional coding performance.
- Watch our video guide, "How to Improve AI-Generated Code with SonarSweep"
- Read our research on evaluating LLMs, "Assessing the Quality and Security of AI-Generated Code: A Quantitative Analysis"
