In our series on SAST benchmarks, we explored why benchmarks matter for tracking the evolution of our SAST capabilities. If you've been following along, you've seen our commitment to transparency: we published Sonar's scores on the top 3 Java and C# SAST benchmarks, sharing the ground truth and shedding light on expected and unexpected issues.
But here's the drumroll moment: today marks the grand finale with Python, the last language on our 2023 checklist! As we did for Java and C#, we're excited to share not only how Sonar performs on these Python benchmarks but also the ground truth, that is, the list of expected and not-so-expected issues.
Our approach
We selected the Python SAST benchmarks using the same meticulous method. We looked at 109 projects on GitHub related to SAST benchmarks and, from these, selected the following 3 Python projects:
Our findings
At Sonar, we consider that a good SAST solution should have a True Positive Rate (TPR) of at least 90% and a False Discovery Rate (FDR) below 10%.
Here are Sonar's scores on these benchmarks:
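To make the two thresholds concrete, here is a minimal Python sketch that computes them from true positive (TP), false positive (FP), and false negative (FN) counts, using the standard definitions TPR = TP / (TP + FN) and FDR = FP / (TP + FP). The post does not spell out its formulas, so these definitions are an assumption. `BenchmarkCounts`, `true_positive_rate`, `false_discovery_rate`, and `meets_targets` are hypothetical names, not part of any Sonar tool.

```python
from dataclasses import dataclass

# Targets stated in the text: TPR of at least 90%, FDR below 10%.
TPR_TARGET = 0.90
FDR_TARGET = 0.10


@dataclass
class BenchmarkCounts:
    """Hypothetical issue counts from comparing findings to a ground truth."""
    true_positives: int   # expected issues the analyzer reported
    false_positives: int  # reported issues that are not expected
    false_negatives: int  # expected issues the analyzer missed


def true_positive_rate(c: BenchmarkCounts) -> float:
    """TPR = TP / (TP + FN): share of expected issues that were found."""
    expected = c.true_positives + c.false_negatives
    # Convention chosen for this sketch: no expected issues -> 0.0.
    return c.true_positives / expected if expected else 0.0


def false_discovery_rate(c: BenchmarkCounts) -> float:
    """FDR = FP / (TP + FP): share of reported issues that are false alarms."""
    reported = c.true_positives + c.false_positives
    # No reported issues means no false discoveries.
    return c.false_positives / reported if reported else 0.0


def meets_targets(c: BenchmarkCounts) -> bool:
    """True when TPR >= 90% and FDR < 10%."""
    return (true_positive_rate(c) >= TPR_TARGET
            and false_discovery_rate(c) < FDR_TARGET)
```

For example, a made-up run with 92 true positives, 5 false positives, and 8 false negatives gives a TPR of 92% and an FDR of about 5.2% (5/97), so `meets_targets` returns `True`. These numbers are illustrative only, not Sonar's results.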
The results are promising and generally close to our 90% TPR target. We remain committed to improving our SAST engine so that it consistently delivers results that are both precise and actionable.
Our computation
We said it in part one of this blog series: SAST vendors make plenty of claims but rarely provide anything to reproduce or substantiate their results. At Sonar, we want to change that. To replicate these results, use the ground truth provided in the sonar-benchmarks-scores repository. If you do, we recommend running the analysis with the latest version of SonarQube Server Developer Edition.
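Replicating a score comes down to comparing the issues an analyzer reports with the ground truth. The sketch below makes two hypothetical assumptions. First, each issue can be keyed by file, line, and rule. Second, any reported issue missing from the expected list counts as a false positive. The actual layout of the files in the sonar-benchmarks-scores repository may differ, and `Issue` and `compare_to_ground_truth` are illustrative names only.

```python
from typing import Iterable, NamedTuple, Tuple


class Issue(NamedTuple):
    """One security issue.

    Hypothetical key (file, line, rule): the real ground-truth format
    may identify issues differently.
    """
    file: str
    line: int
    rule: str


def compare_to_ground_truth(
    expected: Iterable[Issue], reported: Iterable[Issue]
) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives).

    Simplifying assumption: anything reported but not expected is a
    false positive. Duplicates are collapsed by converting to sets.
    """
    expected_set = set(expected)
    reported_set = set(reported)
    tp = len(expected_set & reported_set)  # expected and reported
    fp = len(reported_set - expected_set)  # reported but not expected
    fn = len(expected_set - reported_set)  # expected but missed
    return tp, fp, fn


if __name__ == "__main__":
    # Hypothetical files and rule names, for illustration only.
    expected = [
        Issue("app/views.py", 10, "sql-injection"),
        Issue("app/views.py", 42, "command-injection"),
        Issue("app/utils.py", 7, "xss"),
    ]
    reported = [
        Issue("app/views.py", 10, "sql-injection"),
        Issue("app/utils.py", 7, "xss"),
        Issue("app/models.py", 3, "path-injection"),
    ]
    print(compare_to_ground_truth(expected, reported))
```

Running the script prints `(2, 1, 1)`: two expected issues found, one unexpected finding, and one missed issue. TPR and FDR are computed from counts like these.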
Final word
By revealing the ground truths and showing Sonar's performance on these SAST benchmarks, we aim to foster transparency and help companies make informed decisions about their SAST solutions. We believe that openly sharing metrics such as True Positive Rate (TPR), False Discovery Rate (FDR), and the ground truths gives users a clearer understanding of the effectiveness and precision of Sonar's security analyzers.
To wrap up this blog series, here is Sonar's average TPR across the benchmarks for each of the three languages we covered in 2023:
- Java: 93%
- C#: 90%
- Python: 92%
It's been an exciting journey, and we were thrilled to share these results with you. Stay tuned as we evaluate new detection capabilities for JavaScript and TypeScript in 2024!
Alex