Newstech24.com
A new AI coding challenge just published its first results – and they aren’t pretty

By Admin, 24/07/2025

A new AI coding challenge has revealed its first winner, and set a new bar for AI-powered software engineers.

On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.
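
The timed-entry idea can be illustrated with a minimal sketch: keep only issues flagged after the submission deadline, so no submitted model could have trained on them. This is not the organizers' actual pipeline. The `Issue` record, the `contamination_free` helper, the sample data, and the year 2025 (the article gives only the day and month) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def contamination_free(issues, submission_deadline):
    """Keep only issues flagged strictly after the submission deadline."""
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


# Assumed year: the article states only that models were due by March 12th.
DEADLINE = date(2025, 3, 12)

# Made-up sample issues, not real benchmark data.
candidates = [
    Issue("example/repo-a", 101, date(2025, 2, 28)),  # before the deadline
    Issue("example/repo-b", 202, date(2025, 3, 12)),  # on the deadline
    Issue("example/repo-c", 303, date(2025, 4, 2)),   # after the deadline
]

test_set = contamination_free(candidates, DEADLINE)
print([(issue.repo, issue.number) for issue in test_set])
```

Only the issue flagged after the deadline survives the filter. A fixed benchmark has no such cutoff, which is why its problems can leak into training data.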

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
