The YOLO26 MLX Build Challenge — May 2026

Project: RefereAI MLX — private venue intelligence for amateur sports
Track: Useful / Consumer / Sports
Team: Ravinder Jilkapally

RefereAI MLX turns ordinary phone footage into a local sports intelligence loop. Phones act as cameras. A Mac runs YOLO26 MLX locally. The system watches the scene, identifies the sport, renders overlays, records the session, and sends every result into a review console where feedback becomes part of the improvement loop.

Repo: https://github.com/jravinder/refereai-mlx

App: https://refereai.xyz

Demo walkthrough: Demo — RefereAI MLX

Social post: https://x.com/jravinder/status/2058877525330162073

Hardware: M2 MacBook Air

Model variant: yolo26n (MLX, .npz)

Registered: yes — registration and acceptance form completed

Amateur sports happen in messy real places: YMCA gyms, school courts, parks, open runs, weekend tournaments, and family games. RefereAI is built for that environment. It works with phone footage, live or recorded, and turns it into a reviewable venue timeline: sport, people, objects, score candidates, commentary, overlays, replay, and human feedback.

What it does

  • phone/browser capture over LAN or Tailscale

  • local YOLO26 MLX inference on Apple Silicon

  • unknown → known sport detection

  • player/object tracking

  • sport-aware overlays

  • team/court/color hints

  • scoreboard OCR probes

  • commentary and referee-call surfaces

  • recording and replay

  • frame logs and summary artifacts

  • looping evolution wall across analysis versions

  • human-in-the-loop review console

  • private family viewing concept with overlay toggles
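The "unknown → known sport detection" item can be pictured as a vote over detector labels across frames. This is only a minimal sketch: the `SPORT_HINTS` mapping, the `SportVoter` class, and both thresholds are hypothetical and are not taken from the RefereAI repo.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical mapping from detector labels to sport hints (assumption).
SPORT_HINTS: Dict[str, str] = {
    "tennis racket": "tennis",
    "baseball bat": "baseball",
    "baseball glove": "baseball",
    "frisbee": "ultimate",
}


class SportVoter:
    """Accumulates per-frame label hints until one sport clearly dominates."""

    def __init__(self, min_votes: int = 5, min_share: float = 0.6) -> None:
        self.votes: Counter = Counter()
        self.min_votes = min_votes    # votes needed before leaving "unknown"
        self.min_share = min_share    # fraction of votes the leader must hold

    def update(self, frame_labels: Iterable[str]) -> str:
        """Add one frame's labels and return the current sport guess."""
        for label in frame_labels:
            sport = SPORT_HINTS.get(label)
            if sport is not None:
                self.votes[sport] += 1
        return self.current()

    def current(self) -> str:
        total = sum(self.votes.values())
        if total < self.min_votes:
            return "unknown"
        sport, count = self.votes.most_common(1)[0]
        return sport if count / total >= self.min_share else "unknown"


voter = SportVoter(min_votes=3)
for labels in (["person", "tennis racket"], ["tennis racket"], ["tennis racket"]):
    state = voter.update(labels)
print(state)
```

Requiring both a minimum vote count and a minimum share keeps a single stray detection from flipping the state out of "unknown".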


I built YOLO26-ANE for the webAI YOLO26 MLX Build Challenge, pushing YOLO onto Apple’s Neural Engine so Apple Silicon can see without tying up the graphics chip.


webAI released YOLO26 running locally on Mac through Apple’s graphics chip.

I tried a different path:

Can YOLO26 run on Apple’s dedicated Neural Engine instead?

Turns out: yes.

On my MacBook Pro:

webAI’s version: 155 frames per second
My Neural Engine version: 448 frames per second
Best focused run: 514 frames per second
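For readers who want the numbers as ratios, here is a small arithmetic sketch using only the three throughput figures quoted above. The function names are illustrative, not part of either project.

```python
# Throughput figures quoted in the post, in frames per second.
BASELINE_FPS = 155    # webAI's version
ANE_FPS = 448         # Neural Engine version
ANE_BEST_FPS = 514    # best focused run


def speedup(fps: float, baseline: float = BASELINE_FPS) -> float:
    """Ratio of a throughput to the baseline throughput."""
    return fps / baseline


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


for name, fps in [("webAI", BASELINE_FPS), ("ANE", ANE_FPS), ("ANE best", ANE_BEST_FPS)]:
    print(f"{name}: {speedup(fps):.2f}x, {frame_time_ms(fps):.2f} ms/frame")
```

That works out to roughly 2.9x for the typical Neural Engine run and about 3.3x for the best run.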

Then I checked that it was still seeing the same things:

5 out of 5 detections matched the reference model on the same image.
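The post does not say how a "match" was decided. One common approach, sketched here as an assumption, is to pair detections that share a label and overlap by at least some IoU threshold. The greedy pairing and the 0.5 threshold are illustrative choices.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, Box]               # (label, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(reference: List[Detection], candidate: List[Detection],
                  iou_threshold: float = 0.5) -> int:
    """Greedily pair each reference detection with the best unused candidate."""
    unused = list(candidate)
    matched = 0
    for ref_label, ref_box in reference:
        best_index, best_score = None, iou_threshold
        for i, (label, box) in enumerate(unused):
            score = iou(ref_box, box)
            if label == ref_label and score >= best_score:
                best_index, best_score = i, score
        if best_index is not None:
            unused.pop(best_index)   # each candidate may match only once
            matched += 1
    return matched


reference = [("person", (0, 0, 10, 10)), ("ball", (20, 20, 30, 30))]
candidate = [("person", (1, 1, 10, 10)), ("ball", (50, 50, 60, 60))]
print(count_matches(reference, candidate), "of", len(reference), "matched")
```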

The cool thing is that it’s not just a speed upgrade. It means local camera perception can run on the Neural Engine while the graphics chip stays free for other work: visual interfaces, local language models, agents, or whatever else the app needs.

The camera can see.
The computer can still think.

The Neural Engine handles vision. The graphics chip stays free for the rest of the app.


Project: YOLO26-ANE
Track: Enterprise
Team: Shep Bryan, solo

Repo: elsheppo/yolo26-ane (GitHub)
Demo video: https://youtu.be/yLLVdxzUPRc


Social post: https://x.com/ShepBryan/status/2058775064506405103


Project: SoPilot - Private SOP video checking without VLM fine-tuning bills

Track: Enterprise

Team: Zhen Song

GitHub repo: robot007/SoPilot (GitHub). One-liner: SoPilot is a local Mac-based SOP video checker that uses YOLO plus a novel rule engine to verify workflow steps, cutting the need for expensive VLM fine-tuning.

1-minute video: https://youtu.be/ESWiYeAV_CQ

SoPilot project release note: robot007/SoPilot (GitHub)

SOUP Engine release note: SOUP.md on the main branch of robot007/SoPilot (GitHub)

A Standard Operating Procedure (SOP) is a step-by-step instruction for completing a task safely, consistently, and correctly. In healthcare, manufacturing, labs, and field service, even small SOP mistakes can lead to quality issues, safety risks, or costly rework.

I’m excited to share my hackathon project: SoPilot — a local-first SOP video checker for physical workflows.

At the core is a proposed SOUP Engine: a hybrid YOLO + rule-engine architecture that turns video detections into auditable workflow decisions, instead of relying only on black-box VLM judgment.

For the demo, I built a blood-pressure-monitor SOP checker that verifies whether the sleeve is rolled up, the cuff is placed correctly on the upper arm, and the workflow follows the required sequence before measurement.
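A geometry rule such as "cuff placed on the upper arm" could be expressed as a simple box test. The sketch below assumes the detector emits separate boxes for the cuff and the upper arm, which the post does not confirm; the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Center point of an axis-aligned box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def contains_point(box: Box, point: Point) -> bool:
    """True when the point lies inside the box, edges included."""
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]


def cuff_on_upper_arm(cuff_box: Box, upper_arm_box: Box) -> bool:
    """Assumed rule: the cuff's center must fall inside the upper-arm box."""
    return contains_point(upper_arm_box, center(cuff_box))


upper_arm = (100.0, 50.0, 200.0, 250.0)
print(cuff_on_upper_arm((120.0, 100.0, 180.0, 160.0), upper_arm))   # on the arm
print(cuff_on_upper_arm((300.0, 100.0, 360.0, 160.0), upper_arm))   # off to the side
```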

Key ideas:

✅ Run YOLO26 MLX locally on Apple Silicon

✅ Convert detections into structured evidence: timestamp, label, bounding box, confidence

✅ Use the SOUP Engine to check step order, timing, geometry, and required actions

✅ Keep sensitive video local by default

✅ Use cloud VLMs only as optional advisory help for ambiguous cases

✅ Potentially reduce development and inference cost by up to 70% compared with VLM-heavy approaches

✅ Detailed architecture and cost analysis are included in the GitHub repo
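To make the "structured evidence" and "step order" ideas concrete, here is a minimal sketch. The `Evidence` fields follow the list above; the step labels, the confidence threshold, and the `check_step_order` function are hypothetical and are not the SOUP Engine's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Evidence:
    timestamp: float                          # seconds from start of video
    label: str                                # detector label for the step
    box: Tuple[float, float, float, float]    # (x1, y1, x2, y2)
    confidence: float


def first_seen(evidence: Sequence[Evidence], label: str,
               min_confidence: float) -> Optional[float]:
    """Earliest timestamp at which a label appears with enough confidence."""
    times = [e.timestamp for e in evidence
             if e.label == label and e.confidence >= min_confidence]
    return min(times) if times else None


def check_step_order(evidence: Sequence[Evidence], required_order: Sequence[str],
                     min_confidence: float = 0.5) -> List[str]:
    """Return readable violations; an empty list means the order check passed."""
    violations: List[str] = []
    previous_time, previous_label = None, None
    for label in required_order:
        seen = first_seen(evidence, label, min_confidence)
        if seen is None:
            violations.append(f"missing step: {label}")
            continue
        if previous_time is not None and seen < previous_time:
            violations.append(f"{label} observed before {previous_label}")
        previous_time, previous_label = seen, label
    return violations


# Hypothetical labels for the blood-pressure demo.
STEPS = ["sleeve_rolled", "cuff_placed", "measurement"]
BOX = (0.0, 0.0, 1.0, 1.0)
log = [Evidence(1.0, "sleeve_rolled", BOX, 0.9),
       Evidence(3.0, "cuff_placed", BOX, 0.8),
       Evidence(5.0, "measurement", BOX, 0.95)]
print(check_step_order(log, STEPS))
```

Returning the violations as plain strings keeps each decision auditable, in the spirit of the post's contrast with black-box VLM judgment.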

Many thanks to: Issac, Calla.


I didn’t realize I had to submit here too. I just followed the “What to submit” section. Details are in the GitHub README.

Project: Autonomous Rotato
Track: Austin-flavored
Repo: klgilbert/yolo-mlx (GitHub)
Demo: https://www.youtube.com/watch?v=5St4Z05xVKU


Update: We are implementing head direction/pointing tracking. This allows us to bypass the physical ‘walking into a zone’ trigger, using head orientation analysis to determine when a user is focusing on a specific object.
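One way to picture the head-orientation trigger is as an angle test between the head's facing direction and the direction to the object. This is a 2D sketch under assumptions: the update does not describe the actual method, and the function names and the 15-degree threshold are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def angle_between_deg(a: Vec, b: Vec) -> float:
    """Unsigned angle between two 2D vectors, in degrees."""
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    cosine = (a[0] * b[0] + a[1] * b[1]) / norm
    cosine = max(-1.0, min(1.0, cosine))   # guard against rounding drift
    return math.degrees(math.acos(cosine))


def is_focusing(head_pos: Vec, head_dir: Vec, object_pos: Vec,
                max_angle_deg: float = 15.0) -> bool:
    """True when the head direction points at the object within a tolerance."""
    to_object = (object_pos[0] - head_pos[0], object_pos[1] - head_pos[1])
    return angle_between_deg(head_dir, to_object) <= max_angle_deg


# Head at the origin, facing along +x.
print(is_focusing((0.0, 0.0), (1.0, 0.0), (10.0, 1.0)))    # object nearly straight ahead
print(is_focusing((0.0, 0.0), (1.0, 0.0), (0.0, 10.0)))    # object off to the side
```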


Forgot to include the link to the social post: https://x.com/okigan/status/2058568649053700453
