The YOLO26 MLX Build Challenge — May 2026

jravinder · May 25, 2026, 4:58am

Project: RefereAI MLX — private venue intelligence for amateur sports
Track: Useful / Consumer / Sports
Team: Ravinder Jilkapally

RefereAI MLX turns ordinary phone footage into a local sports intelligence loop. Phones act as cameras. A Mac runs YOLO26 MLX locally. The system watches the scene, identifies the sport, renders overlays, records the session, and sends every result into a review console where feedback becomes part of the improvement loop.

Repo: https://github.com/jravinder/refereai-mlx

App: https://refereai.xyz

Demo walkthrough: Demo — RefereAI MLX

Social post: https://x.com/jravinder/status/2058877525330162073

Hardware: M2 MacBook Air

Model variant: yolo26n (MLX, .npz)

Registered: yes — registration and acceptance form completed

Amateur sports happen in messy real places: YMCA gyms, school courts, parks, open runs, weekend tournaments, and family games. RefereAI is built for that environment. It works with phone footage, live or recorded, and turns it into a reviewable venue timeline: sport, people, objects, score candidates, commentary, overlays, replay, and human feedback.

What it does

phone/browser capture over LAN or Tailscale
local YOLO26 MLX inference on Apple Silicon
unknown → known sport detection
player/object tracking
sport-aware overlays
team/court/color hints
scoreboard OCR probes
commentary and referee-call surfaces
recording and replay
frame logs and summary artifacts
looping evolution wall across analysis versions
human-in-the-loop review console
private family viewing concept with overlay toggles
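
One way the "unknown → known sport detection" item could work is to accumulate votes from per-frame detection labels and only commit to a sport once there is enough evidence. The sketch below is illustrative only: the label-to-sport table, the thresholds, and the function name are assumptions, not taken from the RefereAI repo.

```python
from collections import Counter

# Hypothetical mapping from detector class labels to a sport hint.
# Labels that do not point at a specific sport are simply absent.
SPORT_HINTS = {
    "tennis racket": "tennis",
    "baseball bat": "baseball",
    "baseball glove": "baseball",
    "skateboard": "skateboarding",
}


def update_sport(votes, frame_labels, min_votes=5, min_share=0.6):
    """Add one frame's labels to the running votes and return the current guess.

    Returns "unknown" until at least `min_votes` hints have been seen and
    one sport holds at least `min_share` of them.
    """
    for label in frame_labels:
        sport = SPORT_HINTS.get(label)
        if sport is not None:
            votes[sport] += 1
    total = sum(votes.values())
    if total < min_votes:
        return "unknown"
    sport, count = votes.most_common(1)[0]
    return sport if count / total >= min_share else "unknown"


votes = Counter()
guess = "unknown"
for _ in range(5):
    guess = update_sport(votes, ["person", "tennis racket"])
print(guess)
```

Holding the answer at "unknown" until a threshold is met avoids flipping the overlay style on a single noisy frame.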

shep · May 25, 2026, 5:20am

I built YOLO26-ANE for the webAI YOLO26 MLX Build Challenge, pushing YOLO onto Apple’s Neural Engine so Apple Silicon can see without tying up the graphics chip.

webAI released YOLO26 running locally on Mac through Apple’s graphics chip.

I tried a different path:

Can YOLO26 run on Apple’s dedicated Neural Engine instead?

Turns out: yes.

On my MacBook Pro:

webAI’s version: 155 frames per second
My Neural Engine version: 448 frames per second
Best focused run: 514 frames per second
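
For readers who want the numbers above as ratios, this short sketch converts the three reported frame rates into speedup factors and per-frame times. Only the three FPS values come from the post; the helper names are made up for illustration.

```python
# Frame rates reported in the post (frames per second).
BASELINE_FPS = 155.0   # webAI's version
ANE_FPS = 448.0        # Neural Engine version
BEST_FPS = 514.0       # best focused run


def speedup(new_fps, base_fps):
    """How many times faster new_fps is than base_fps."""
    return new_fps / base_fps


def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


print(f"typical speedup: {speedup(ANE_FPS, BASELINE_FPS):.2f}x")   # about 2.89x
print(f"best speedup:    {speedup(BEST_FPS, BASELINE_FPS):.2f}x")  # about 3.32x
print(f"baseline frame time: {frame_time_ms(BASELINE_FPS):.2f} ms")  # about 6.45 ms
print(f"ANE frame time:      {frame_time_ms(ANE_FPS):.2f} ms")       # about 2.23 ms
```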

Then I checked that it was still seeing the same things:

5 out of 5 detections matched the reference model on the same image.
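
A common way to run this kind of parity check is to match each detection against the reference by class label and box overlap (IoU). The post does not say which criterion was used, so the sketch below is an assumption: the 0.5 threshold, the tuple layout, and the function names are all illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(candidate, reference, iou_threshold=0.5):
    """Greedy one-to-one matching of (label, box) detections.

    A candidate matches a reference detection when the labels are equal
    and the IoU is at least `iou_threshold`. Each reference detection
    can be matched at most once.
    """
    unused = list(reference)
    matched = 0
    for label, box in candidate:
        best, best_iou = None, iou_threshold
        for ref in unused:
            if ref[0] == label:
                score = iou(box, ref[1])
                if score >= best_iou:
                    best, best_iou = ref, score
        if best is not None:
            unused.remove(best)
            matched += 1
    return matched


reference = [("person", (0, 0, 10, 10)), ("ball", (20, 20, 30, 30))]
candidate = [("person", (1, 0, 10, 10)), ("ball", (20, 20, 30, 31))]
print(count_matches(candidate, reference), "of", len(reference), "matched")
```

Under this scheme, a full match on an image with five reference detections would be reported as 5 of 5.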

The cool thing is that it’s not just a speed upgrade. It means local camera perception can run on the Neural Engine while the graphics chip stays free for other work: visual interfaces, local language models, agents, or whatever else the app needs.

The camera can see.
The computer can still think.

The Neural Engine handles vision. The graphics chip stays free for the rest of the app.

Project: YOLO26-ANE
Track: Enterprise
Team: Shep Bryan, solo

Repo: elsheppo/yolo26-ane on GitHub
Demo video: https://youtu.be/yLLVdxzUPRc

shep · May 25, 2026, 5:21am

Social post: https://x.com/ShepBryan/status/2058775064506405103

Zhen_Song · May 25, 2026, 6:50am

Project: SoPilot - Private SOP video checking without VLM fine-tuning bills

Track: Enterprise

Team: Zhen Song

GitHub repo: robot007/SoPilot. SoPilot is a local Mac-based SOP video checker that uses YOLO plus a novel rule engine to verify workflow steps, cutting the need for expensive VLM fine-tuning.

1 min video: https://youtu.be/ESWiYeAV_CQ

SoPilot project release note: robot007/SoPilot on GitHub

SOUP Engine release note: SoPilot/SOUP.md on the main branch of robot007/SoPilot on GitHub

A Standard Operating Procedure (SOP) is a step-by-step instruction for completing a task safely, consistently, and correctly. In healthcare, manufacturing, labs, and field service, even small SOP mistakes can lead to quality issues, safety risks, or costly rework.

I’m excited to share my hackathon project: SoPilot — a local-first SOP video checker for physical workflows.

At the core is a proposed SOUP Engine: a hybrid YOLO + rule-engine architecture that turns video detections into auditable workflow decisions, instead of relying only on black-box VLM judgment.

For the demo, I built a blood-pressure-monitor SOP checker that verifies whether the sleeve is rolled up, the cuff is placed correctly on the upper arm, and the workflow follows the required sequence before measurement.
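
A geometry rule such as "the cuff is on the upper arm" can be approximated by checking how much of the cuff's bounding box lies inside the upper-arm box. This is only a sketch of that idea: the box layout, the 0.7 threshold, and the function names are assumptions, and the SoPilot repo may implement the check differently.

```python
def overlap_fraction(inner, outer):
    """Fraction of `inner`'s area that lies inside `outer`.

    Boxes are (x1, y1, x2, y2) in pixel coordinates.
    """
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0


def cuff_on_upper_arm(cuff_box, upper_arm_box, min_fraction=0.7):
    """True when most of the cuff box sits inside the upper-arm box."""
    return overlap_fraction(cuff_box, upper_arm_box) >= min_fraction


upper_arm = (0, 0, 10, 5)
print(cuff_on_upper_arm((2, 2, 4, 4), upper_arm))   # cuff fully inside the arm box
print(cuff_on_upper_arm((9, 4, 11, 6), upper_arm))  # cuff mostly outside
```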

Key ideas:

Run YOLO26 MLX locally on Apple Silicon

Convert detections into structured evidence: timestamp, label, bounding box, confidence

Use the SOUP Engine to check step order, timing, geometry, and required actions

Keep sensitive video local by default

Use cloud VLMs only as optional advisory help for ambiguous cases

Potentially reduce development and inference cost by up to 70% compared with VLM-heavy approaches

Detailed architecture and cost analysis are included in the GitHub repo
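
To make the evidence-plus-rules idea concrete, here is a minimal sketch of a structured evidence record (timestamp, label, bounding box, confidence) and a step-order check over it. The step names, the confidence cutoff, and the `check_step_order` function are hypothetical and are not the SOUP Engine's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    timestamp: float                          # seconds from the start of the video
    label: str                                # detected step or object label
    box: tuple[float, float, float, float]    # (x1, y1, x2, y2)
    confidence: float                         # detector score in [0, 1]


# Hypothetical step labels for the blood-pressure demo.
REQUIRED_ORDER = ["sleeve_rolled_up", "cuff_on_upper_arm", "measurement_started"]


def check_step_order(evidence, required=REQUIRED_ORDER, min_confidence=0.5):
    """Return a list of human-readable violations (empty means the order is OK)."""
    first_seen = {}
    for ev in sorted(evidence, key=lambda e: e.timestamp):
        if (ev.confidence >= min_confidence and ev.label in required
                and ev.label not in first_seen):
            first_seen[ev.label] = ev.timestamp
    violations = [f"missing step: {s}" for s in required if s not in first_seen]
    for earlier, later in zip(required, required[1:]):
        if (earlier in first_seen and later in first_seen
                and first_seen[earlier] > first_seen[later]):
            violations.append(f"{later} occurred before {earlier}")
    return violations


box = (0.0, 0.0, 1.0, 1.0)
log = [
    Evidence(1.0, "sleeve_rolled_up", box, 0.9),
    Evidence(4.0, "cuff_on_upper_arm", box, 0.8),
    Evidence(9.0, "measurement_started", box, 0.95),
]
print(check_step_order(log))
```

Because every violation is tied to labeled, timestamped evidence, the result can be audited after the fact, which is the contrast with a black-box VLM judgment drawn in the post.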

Many thanks to: Issac, Calla.

klgilbert · May 25, 2026, 11:46am

I didn’t realize I had to submit here too. I just followed the “What to submit” section. The details are in the GitHub README.

Project: Autonomous Rotato
Track: Austin-flavored
Repo: klgilbert/yolo-mlx on GitHub
Demo: https://www.youtube.com/watch?v=5St4Z05xVKU

tim · May 25, 2026, 9:07pm

Update: We are implementing head direction/pointing tracking. This allows us to bypass the physical ‘walking into a zone’ trigger, using head orientation analysis to determine when a user is focusing on a specific object.
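
One simple way to turn head orientation into a "focusing on an object" signal is to measure the angle between the head direction and the line from the head to each object, then pick the object inside a narrow cone. The sketch below works in 2D floor-plan coordinates; the 15 degree cone, the coordinate convention, and the function names are assumptions rather than details from the project.

```python
import math


def angle_to_object_deg(head_pos, head_dir, obj_pos):
    """Angle in degrees between the head direction and the head-to-object line."""
    vx, vy = obj_pos[0] - head_pos[0], obj_pos[1] - head_pos[1]
    norm_v = math.hypot(vx, vy)
    norm_d = math.hypot(head_dir[0], head_dir[1])
    if norm_d == 0:
        raise ValueError("head_dir must be a non-zero vector")
    if norm_v == 0:
        return 0.0  # the object is at the head position
    cos_angle = (vx * head_dir[0] + vy * head_dir[1]) / (norm_v * norm_d)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def focused_object(head_pos, head_dir, objects, max_angle_deg=15.0):
    """Name of the object closest to the head direction, or None if none is in the cone."""
    best_name, best_angle = None, max_angle_deg
    for name, pos in objects.items():
        angle = angle_to_object_deg(head_pos, head_dir, pos)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


objects = {"shelf": (10.0, 1.0), "door": (0.0, 10.0)}
print(focused_object((0.0, 0.0), (1.0, 0.0), objects))   # looking along +x
print(focused_object((0.0, 0.0), (-1.0, 0.0), objects))  # looking away from both
```

In practice a dwell-time requirement (the same object staying in the cone for some number of frames) would likely be needed to avoid triggering on a passing glance.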

okigan · May 26, 2026, 1:00am

Forgot to include the link to my social post: https://x.com/okigan/status/2058568649053700453
