SIGNAL GRID v0.1

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

1 source | 1 story | First seen 4/27/2026 | Score 29 | Mixed Progress
Single Source
Bigness: 29
Coverage: 13
Recency: 49
Engagement: 38
Velocity: 0
Confidence: 48
Clipability: 60
Polarization: 0
Claims: 5
Contradictions: 0
Breakthrough: 50

Sentiment Mix

Positive: 0%
Neutral: 100%
Negative: 0%

Geography

North America

Expert Signals

GodelNumbering (author, 1 mention)

Hacker News (source, 1 mention)

AI-Generated Claims

Generated from linked receipts; click sources for full context.

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview.

Supported by 1 story

Scored 65.2% vs. Google's official 47.8% and the existing top closed-source model Junie CLI's 64.3%. Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would also like to clarify a few things:

Supported by 1 story

1. Absolutely no {agents/skills}.md files were inserted at any point.

Supported by 1 story

2. The CLI agent was run in a leaderboard-compliant way (no modification of resources or timeouts).

Supported by 1 story

3. The full TerminalBench run was done using the fully open-source version of the agent; there is no difference between what is on GitHub and what was run. I was originally going to wait for it to land on the leaderboard, but it has been 8 days and the maintainers unfortunately have not responded (there is a large backlog of pull requests on their HF), so I decided to post anyway. HF PR: https://huggingface.co/datasets/harborframework/terminal-ben... It is astounding how much the harness matters, based on this and other experiments I...

Supported by 1 story

Related Events

Timeline (1 story)

Receipts (1)

Bias Snapshot

Center
Left 0% | Center 100% | Right 0%
Agg | github.com | 4/27/2026