The YOLO26 MLX Build Challenge — May 2026

Build with our open source YOLO26 MLX model. Seven days. On-device. Show us what you’ve got.

Co-hosted by webAI, HackAI, and AITX + Antler

The challenge

We’re giving you 7 days to build a demo using YOLO26 MLX, our open source, Apple Silicon-native object detection model. Pick a track, pick an idea, ship it.

Our co-hosts

This challenge is co-hosted with two of Austin’s strongest AI builder communities:

HackAI: kicking us off on May 18 at Capital Factory during their monthly community session. We’ll be presenting the challenge alongside other Austin AI companies, with open hack and network time after.

AITX: running our deeper technical evening on May 19 at Antler, where Mitch and Fatih from our AI team walk through how we built YOLO26 MLX. Open Q&A and roundtable after.

Whichever event you attend (or both), you’re in the same challenge with the same pool of builders.

Four tracks

Useful: YOLO26 MLX solving a real problem for everyday people.

Examples: a camera that tells you when your dog leaves the room, a real-time fridge inventory, a posture monitor for desk workers, an object finder for “where are my keys,” a plant health detector.

Enterprise: YOLO26 MLX solving a business or industrial problem.

Examples: tracking foot traffic patterns through a retail floor, defect detection on a manufacturing line, flagging PPE violations without naming individuals, fire or smoke detection on industrial cameras, auditing a warehouse for misplaced inventory.

Austin-flavored: YOLO26 MLX solving a problem specific to a place or context in Austin.

Examples: counting how full the line at Franklin Barbecue is from a webcam feed, watching your front porch for package arrivals, alerting you when the H-E-B parking lot hits capacity, identifying which roommate left the dishes in the sink, tagging the cars in your neighborhood to spot patterns.

Wild: YOLO26 MLX doing something intentionally weird, funny, or unexpected.

Examples: roasting your outfit in real time based on what you’re wearing, narrating your life like a David Attenborough documentary based on the room you’re in, generating AI haikus from objects on your desk, an alarm clock that only stops when it detects you holding a coffee, a Tamagotchi that reacts to your screen.

Build whatever you want inside a track. The examples are just to get you started.

Teams

Build solo or with a team of up to 3 people. Looking for teammates? Drop a note in the “Who’s in?” topic.

What to submit

  1. Public GitHub repo with your code. Fork our starter template to get a standard structure with a working YOLO26 MLX inference script. Forking is strongly recommended but not required.

  2. README that follows this checklist:

    • What it does, why you built it

    • How to run it (we need to be able to run your code without issues)

    • Hardware you used (e.g., M2 Pro 16GB)

    • Model variant (yolo26n, yolo26s, yolo26m, yolo26l, yolo26x)

  3. 60-second demo video: screen record or phone-record your demo working. No narration needed, no production polish required. That said, get creative if you want to. A great demo video makes your build memorable.

  4. Social post on X or LinkedIn: share your demo video on at least one platform, tagged with #YOLOMLX and tagging webAI. Cross-posting on both is encouraged and helps your build get seen by more people.

  5. Confirmation you’ve completed the registration and acceptance form so we know your submission counts.

Rules

  • One submission per individual or team. Pick your best build, ship it.

  • Pre-existing code is okay as scaffolding. Your YOLO26 MLX integration and the core build must happen during the challenge window (May 18-24).

  • AI assistants are encouraged. Cursor, Copilot, Claude, whatever helps you ship. We’re an AI company. Use AI to build.

  • Open source code only. Don’t include proprietary code from your day job or anywhere else you don’t have rights to publish.

Terms & Conditions

By submitting to this challenge, you agree to the Terms & Conditions. This covers IP ownership, prize eligibility, and a few other legal basics.

To enter, fill out the registration and acceptance form. Required to be eligible for prizes. You can do this anytime before the submission deadline.

Getting started

Before you start building, read our Getting Started with YOLO26 MLX guide. It covers setup, requirements, a working hello-world script, and common gotchas.

How you’ll be judged

  • Use of YOLO26 MLX / On-device execution (20 pts): Is YOLO26 MLX meaningfully used? Does the demo run locally on Apple Silicon? Is on-device inference central, not decorative?

  • Demo quality / Shipping completeness (20 pts): Does it actually work live? Is the user flow clear? Is it stable enough to understand the idea without hand-waving?

  • Impact / Usefulness (20 pts): Does it solve a real problem or create a clearly valuable experience? Is the target user obvious?

  • Technical execution (15 pts): Is the implementation thoughtful? Good latency, model integration, camera/input handling, architecture, edge-case handling?

  • Creativity / Originality (15 pts): Is the idea fresh, clever, surprising, or differentiated from obvious object-detection demos?

  • Presentation / Storytelling (10 pts): Did the team explain the problem, solution, demo, and why it matters clearly?

Total: 100 points

Judges:

  • Mitch DePree, ML Engineer, webAI

  • Fatih Altay, ML Engineer, webAI

  • Hossein Moghimifam, VP of AI, webAI

  • Jay Peredo, Community Lead, webAI

  • Sam Avila, ML Engineer, webAI

Prizes

  • Track winners (4): $1000 per team + featured by webAI across blog and socials. A highlight post on our blog, a thread on X, a LinkedIn post, and amplification across our community channels. Cash prize is awarded to the team and split at the team’s discretion.

  • Every submission: included in our recap blog post and amplified across webAI socials.

Timeline

  • Mon May 18, 6:00 - 8:00 PM CT: Kickoff #1 at HackAI (Capital Factory)

  • Tue May 19, 5:30 - 8:00 PM CT: Kickoff #2 at AITX tech talk (Antler)

  • Thu May 21: Mid-challenge check-in. Our AI/ML team will be actively answering questions in the Q&A topic for this challenge.

  • Sun May 24, 11:59pm PT: Submissions due

  • Wed May 27: Winners announced

How to join

To enter the challenge:

  1. Sign up for a community.webai.com account if you haven’t already (free, takes 30 seconds)

  2. Register by filling out the registration and acceptance form (required to be eligible for prizes)

  3. Build it using YOLO26 MLX, following the requirements above

  4. Submit by replying directly to this topic by Sun May 24, 11:59pm PT. Include your GitHub repo link, demo video, and social post link in your reply

Submissions posted anywhere else won’t count, so make sure your reply lands here.

The Getting Started guide covers setup and a working hello-world. The Q&A topic for this challenge is where to ask questions. The “Who’s in?” topic is where to tell us you’re participating and find teammates.

The starter template hyperlink has been updated, but you can also find it here: yolo-mlx/GUIDE_TRAINING_BENCHMARK.md at main · thewebAI/yolo-mlx · GitHub

Thanks for updating the Starter Template - I couldn’t find it yesterday! Thanks again!

Yolo Game Part 1
Repo

60 second video

Yolo Game part 2

Social post

More info

Project: SENTINEL — on-device visual triage AI
Track: Enterprise
Team: nomad-link-id + Lexi Armstrong

SENTINEL classifies every person in frame as T1 — immediate (lying down), T2 — delayed (sitting), or T3 — ambulatory (standing) using YOLO26 (yolo26n) + MLX. Built for mass-casualty scenarios where network connectivity is unavailable, compromised, or operationally forbidden: tactical edge, disaster zones, austere medical settings, secured facilities.

The entire pipeline runs on Apple Silicon. The demo video was recorded with WiFi disabled — zero network egress isn’t a marketing claim, it’s the architecture.

Repo: GitHub - nomad-link-id/sentinel-mlx: On-device visual triage AI · YOLO26 + Apple MLX · Zero network egress · webAI Build Challenge May 2026 · GitHub
Demo (55s): https://youtu.be/c2v5Mdg5fpw

Hardware: MacBook Pro 14-inch (Nov 2024), Apple M4, 32 GB RAM
Model variant: yolo26n
Performance: ~16 FPS at 720p, ~45 ms per-frame inference, 0 bytes egress

Single-file Python, single-thread synchronous loop. yolo-mlx 0.3.1, mlx 0.30.6, OpenCV 4.13. Forked the official starter to focus the 7 days on the triage classification logic, the SENTINEL dashboard UI overlay, and end-to-end stability. Architecture rationale in docs/ARCHITECTURE_DECISIONS.md.
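
As a rough sanity check, the reported frame rate and inference time imply how much of each loop iteration is spent outside the model. The sketch below is only arithmetic on the approximate figures quoted in this post; it assumes the loop is strictly sequential (as described), and the variable names are illustrative rather than taken from the repo.

```python
# Frame-budget arithmetic for a single-threaded, synchronous capture loop.
# Inputs are the approximate figures reported in the post above.
fps = 16.0            # ~16 FPS at 720p
inference_ms = 45.0   # ~45 ms per-frame inference

frame_ms = 1000.0 / fps                      # total time per loop iteration
overhead_ms = frame_ms - inference_ms        # capture, triage logic, overlay, display
inference_only_fps = 1000.0 / inference_ms   # ceiling if overhead were zero

print(f"per-frame budget: {frame_ms:.1f} ms")
print(f"non-inference overhead: {overhead_ms:.1f} ms")
print(f"inference-only ceiling: {inference_only_fps:.1f} FPS")
```

Under these assumptions, roughly 17.5 ms of each 62.5 ms frame goes to everything other than inference.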

Disclaimer: Research demonstration, not a medical device.

Thanks to Mitch, Fatih, Hossein, Jay, and Sam for the YOLO26-MLX release and the build challenge.

-- nomad-link-id & Lexi

Project: Meta YOLO buyer
Track: Austin & Enterprise
Team: Jordaaan

Repo: https://github.com/Organized-AI/YOLO-MLX-Hack
Video and project brief: Meta YOLO Buyer — YOLO26 MLX Media Buying Harness

  • Built a Cloudflare Workers backend to coordinate the full loop: analyze creative → propose remix → test variant → learn from results.

  • Used Durable Objects to manage campaign-level state and prevent conflicting updates across campaign, ad set, ad, and creative workflows.

  • Stored YOLO/MLX computer vision outputs as structured inspection signals, so the backend knows what changed: headline load, product cue, subject crop, hook, and CTA.

  • Designed the system around human-gated creative changes, with proposed updates blocked from publishing until reviewed.

  • Added a 98% confidence threshold before any future automated action is allowed, keeping automation controlled and evidence-based.

  • Built to support Agent orchestration (i.e. Hermes can inspect past results and learn how to use this from top to bottom)
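
The two safeguards in the list above (human review before publishing, and a 98% confidence bar before any automated action) could be combined in a small gate function. This is a minimal sketch, not the project's code: the `ProposedChange` structure, the `gate` function, the return strings, and the way the two rules interact are all assumptions made for illustration.

```python
from dataclasses import dataclass

AUTOMATION_CONFIDENCE = 0.98  # the 98% threshold quoted in the post


@dataclass
class ProposedChange:
    """A proposed creative update (hypothetical structure)."""
    creative_id: str
    confidence: float           # 0.0 to 1.0
    human_approved: bool = False


def gate(change: ProposedChange) -> str:
    """Decide what may happen to a proposed change (assumed policy)."""
    if change.human_approved:
        # A reviewer has signed off, so the change may be published.
        return "publish"
    if change.confidence >= AUTOMATION_CONFIDENCE:
        # High enough confidence to be considered for future automation.
        return "eligible_for_automation"
    # Everything else stays blocked until a human reviews it.
    return "blocked_pending_review"


print(gate(ProposedChange("ad-1", confidence=0.72)))
print(gate(ProposedChange("ad-1", confidence=0.72, human_approved=True)))
```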

Project: ScreenSense — Framework for training of Real-Time UI Element Detection
Track: Useful / Enterprise
Team: okigan

ScreenSense establishes a framework for training UI element detection and demonstrates it on 17 types of UI elements (buttons, checkboxes, text inputs, dropdowns, sliders, toggles, cards, dialogs, etc.), detected directly from screen pixels using YOLO26-Medium + MLX. Built for computer use agents, UI test automation, accessibility auditing, and RPA: anywhere you need to find interactive controls without DOM access.

The framework is constructed as a pipeline (synthetic data generation, training from scratch rather than fine-tuning, and live inference) that runs on a single MacBook. No cloud, no manual labeling, no platform-specific APIs.

Repo: GitHub - okigan/yolo-screensense · GitHub
Demo: https://youtu.be/5GBi5RSLAc0 (short version) https://youtu.be/rRlg9WzKS3s (longer version)

Hardware: MacBook Pro, Apple M1 Max, 64 GB RAM
Model variant: yolo26m
Performance: ~3.5 FPS at 640px, ~288 ms per-frame inference, 0 network calls

Key results: mAP50 0.903, Precision 0.834, Recall 0.947 across 17 GUI element classes. Trained from scratch in ~7.5 hours on 2,000 fully synthetic images generated via Playwright in 10 minutes. Zero manual annotation — bounding boxes extracted from DOM after headless render.
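
To illustrate the "bounding boxes extracted from DOM" step, here is a minimal sketch of turning a pixel-space element rectangle into a YOLO-style label line. It assumes the common text label format (`class cx cy w h`, normalized to the image size) and a rectangle given as top-left `x`, `y`, `width`, `height`; the function name and the sample numbers are hypothetical, and the repo may do this differently.

```python
def dom_rect_to_yolo(class_id, x, y, width, height, page_w, page_h):
    """Convert a pixel rect (top-left x, y, width, height) to a YOLO label line.

    Output is "class cx cy w h" with all coordinates normalized to [0, 1].
    """
    cx = (x + width / 2) / page_w    # normalized box center, horizontal
    cy = (y + height / 2) / page_h   # normalized box center, vertical
    w = width / page_w               # normalized box width
    h = height / page_h              # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 120x40 px button at (100, 200) on a 1280x720 screenshot, class 0.
print(dom_rect_to_yolo(0, 100, 200, 120, 40, 1280, 720))
# prints: 0 0.125000 0.305556 0.093750 0.055556
```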

Live demo supports interactive region selection and adjustable confidence threshold on any application window.

The whole framework/pipeline can be retargeted to specific UI framework(s) by adjusting the training data generation, auto-labeling, and augmentation procedures.

Thanks to the webAI team for YOLO26-MLX and the build challenge.

#YOLOMLX

Name: AccessLens — Real-Time Scene Narrator for the Visually Impaired

Track: Useful

Team: Prince

AccessLens narrates a physical environment in real-time using YOLO26 (yolo26n) + MLX. Point a MacBook camera at a room and it speaks what’s there — object identity, spatial position, proximity — with priority-scored cadence that prevents information overload. Built for blind and low-vision users who need spatial awareness without surrendering visual privacy to a cloud service.

The entire pipeline runs on Apple Silicon. Camera frames never leave the device — not to a server, not to an API, not to a log. Zero network egress is the architecture, not a configuration option.

Repo: GitHub - nanaagyei/accesslens · GitHub
Demo (52s): https://youtu.be/YFThk8kSvag

Hardware: MacBook Pro, Apple Silicon
Model variant: yolo26n
Performance: ~10 FPS capture, ~85 ms p50 inference, <130 ms detection-to-speech, 0 bytes egress

Pipeline: Camera → WebSocket (localhost) → YOLO26 MLX → spatial zone classification → priority narrator → Web Speech API. Single-machine, two-process architecture: Python/FastAPI backend for inference, Next.js frontend for capture + narration logic.

Interaction model: Space (describe scene), F (find/search for object by class), M (mute), B (blind mode — screen off, narration continues), ? (shortcuts overlay). Voice search highlights matching bounding boxes and speaks location.

Architecture decisions: Narration logic is a pure function (no DOM, no speech calls) — fully unit tested. Tracker uses frame-to-frame IoU matching with label locking to prevent flickering. WebSocket uses backpressure (drops frames if inference in-flight) rather than queuing. Confidence-scaled bounding box opacity gives visual feedback on detection certainty.
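
Frame-to-frame IoU matching with label locking can be sketched in a few lines. This is a generic greedy matcher written for illustration, not the AccessLens tracker: the `Tracker` class, the `(x1, y1, x2, y2)` box layout, and the 0.3 IoU threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class Tracker:
    """Greedy IoU tracker; a matched track keeps the label it was created with."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track_id -> (box, locked_label)
        self.next_id = 0

    def update(self, detections):
        """detections: list of (box, label) for the current frame."""
        updated = {}
        unmatched = list(range(len(detections)))
        for track_id, (box, label) in self.tracks.items():
            best, best_iou = None, self.iou_threshold
            for i in unmatched:
                score = iou(box, detections[i][0])
                if score >= best_iou:
                    best, best_iou = i, score
            if best is not None:
                unmatched.remove(best)
                # Label locking: take the new box, keep the original label.
                updated[track_id] = (detections[best][0], label)
        for i in unmatched:  # leftover detections start new tracks
            updated[self.next_id] = detections[i]
            self.next_id += 1
        self.tracks = updated  # tracks with no match this frame are dropped
        return updated
```

With this sketch, a box first seen as "cup" keeps that label even if the detector briefly calls it "bowl" on the next frame, which is the flicker the post describes avoiding.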

Hey this is team provenance.guru

github repo:

linkedin post (includes video):

Thank you, team webAI, this was fun, and we will be continuing to work on our new product at provenance.guru :smiley:

Sovereign Vision - Enterprise track submission

Team: Karthik Barma (solo)

Project tagline: The first on-device enterprise vision system whose privacy is enforced by code, not by policy. Seven cryptographically-audited constitutional rules redact every person bounding box, hash every face region, drop every track ID, and add calibrated differential-privacy noise to every aggregate, before any output exists.


Submission deliverables

What it does

Intercepts every YOLO26 MLX inference in flight and enforces a 7-rule constitutional firewall. Detections that leave the pipeline are aggregate-only, PII-redacted, and cryptographically attested. Each frame issues a self-attested compliance certificate. Certificates chain into a Merkle tree. Session roots can be anchored to a DigiCert RFC 3161 trusted timestamp so a regulator three years from now can verify exactly when the session happened.
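
For readers unfamiliar with the "certificates chain into a Merkle tree" step, here is a minimal sketch of computing a Merkle root over per-frame certificates using only the standard library. It is not Sovereign Vision's implementation: the certificate byte strings are placeholders, and duplicating the last node on odd-sized levels is one common convention chosen here as an assumption.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Return the hex Merkle root of a list of byte strings.

    Each leaf stands in for one serialized per-frame certificate.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumed convention: duplicate the last node
        # Hash each adjacent pair to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


certs = [b"frame-0-cert", b"frame-1-cert", b"frame-2-cert"]  # placeholder data
print(merkle_root(certs))
```

The single root commits to every certificate in the session, so changing any one frame's certificate changes the root that would be timestamped.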

Why I built it

Most enterprise CV deployments die in legal review. GDPR Article 4 makes a person’s spatial location personal data. Article 9 covers face data. Recital 30 covers track IDs. Every off-the-shelf CV system produces PII the instant it generates a bounding box, which is why most factories, stores, and hospitals still don’t deploy the cameras they already own. Sovereign Vision rebuilds the stack so PII is impossible at the type level.

How to run

pip install sovereign-vision
sovereign demo

Name: Daedalus-RT, Undetectably Flood your Hackathon Competitors with False Positive Detections!

Track: Weird/wacky

Team: Nikhil Kalidasu (solo)

Wouldn’t it be funny to mess with literally every other submission to the YOLO26 MLX Build Challenge? Introducing Daedalus-RT, a real-time, on-device adversarial attack that floods YOLO26-n with high-confidence false-positive detection boxes. With effectively zero added latency, compositing a single adversarial filter onto every video frame can completely destroy the model’s ability to detect real objects. Detailed attack breakdown is in the repo’s README.

Repo: GitHub - nik875/Daedalus-RT: Real-time version of 'Daedalus: Breaking Non-Maximum Suppression in Object Detection via Adversarial Examples' via universal adversarial perturbation, tuned for YOLO26 · GitHub
Demo: https://youtube.com/shorts/gmYOZACKt3Q?feature=share
LinkedIn Post: #yolomlx | Nikhil Kalidasu

Inference hardware: Apple M1 Pro, 16GB RAM
Model variant: YOLO26-n
Latency: 50+ FPS baseline, undetectable latency change under attack.

Why did I build this?

Because it’s fun! But also because I want a job :backhand_index_pointing_right: :backhand_index_pointing_left:

I would love to work with webAI on developing similar low-latency on-device AI, and hope this 1-day sprint project showcases my technical depth. I applied last week for the AI Research Scientist role. I would greatly appreciate an interview, I pinky promise I know what I’m talking about :folded_hands: !

Meet Don’t Wake Up - the alarm clock that does not trust you.

Track: Wild

Team: Quamos (Saksham Adhikari and Kusum Bhattarai Sharma)

Why settle for a normal alarm clock when you can get an alarm clock which rage baits you out of bed?
That is what we built.

Don’t Wake Up is an MLX-powered computer vision alarm clock that refuses to shut up until you prove you are actually awake. Not by tapping a button. Not by shaking your phone. Not by mumbling at Siri. You have to get out of bed, stand in front of the camera, do the 6-7 movement, and then show a real coffee mug to turn it off.

If you try to cheat by showing a picture of a mug on your phone, it calls you out: “na na buddy, that won’t work.”
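
The wake-up ritual described above reads naturally as a small state machine. The sketch below is illustrative only: the step names, the `WakeUpRitual` class, and especially the cheat check (treating "mug and phone seen in the same frame" as a picture of a mug) are assumptions, not how the repo actually works.

```python
STEPS = ["standing", "six_seven", "mug"]  # hypothetical event names


class WakeUpRitual:
    """Advances through the ritual one step at a time; alarm stays on until done."""

    def __init__(self):
        self.step = 0

    @property
    def alarm_on(self):
        return self.step < len(STEPS)

    def observe(self, events):
        """events: set of strings detected in the current frame."""
        if not self.alarm_on:
            return "alarm off"
        expected = STEPS[self.step]
        # Assumed cheat heuristic: a mug detected together with a phone.
        if expected == "mug" and "mug" in events and "phone" in events:
            return "na na buddy, that won't work."
        if expected in events:
            self.step += 1
        if not self.alarm_on:
            return "alarm off"
        return f"waiting for: {STEPS[self.step]}"


ritual = WakeUpRitual()
for frame_events in [{"standing"}, {"six_seven"}, {"mug", "phone"}, {"mug"}]:
    print(ritual.observe(frame_events))
```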

The alarm only stops when the full wake-up ritual is completed.
Here is the demo video:

GitHub Repo: GitHub - Tar-ive/6_7: A YOLO alarm system · GitHub

X post: https://x.com/saksham_adh/status/2058712860935512244

The hardest part of this project was that YOLO26 pose support was not fully available in the MLX YOLO stack yet. So we patched it and opened a PR so we could complete the project. Add YOLO26 pose inference support by Tar-ive · Pull Request #6 · thewebAI/yolo-mlx · GitHub

Thanks to the webAI team for the challenge; we really enjoyed building with the MLX-webai stack.

Category: Useful

I built **Vault** for webAI’s YOLO26-MLX build challenge — a privacy-first home inventory app that uses YOLO26 fine-tuned on household objects to catalog items for insurance documentation, entirely on-device on iPhone.

The technical bet I’m most proud of: **RoomPlan LiDAR and YOLO26 object detection run simultaneously on a single ARSession.** One camera feed powers both. RoomPlan builds the 3D mesh; YOLO inference fires at 5 fps on the same `arSession.currentFrame` stream — without stealing the ARSessionDelegate, so RoomPlan’s pipeline stays intact. Two real-time ML pipelines, one frame source, both running on the iPhone’s Apple GPU.

The training pipeline:

-> Stock yolo26-s ported by hand to MLX-Swift for iOS (~38 MB, MAE=0 parity with PyTorch)

-> Fine-tuned yolo26-m on a merged dataset of HomeObjects-3K + Roboflow household (3,622 images, 32 classes covering furniture and small appliances)

-> Trained for 50 epochs in webAI’s `yolo-mlx` on an M1 Max, batch 8, ~7 hours. **Zero PyTorch dependency at runtime.**

-> Best checkpoint at epoch 45: **mAP50 = 0.744, mAP50-95 = 0.57**

-> The resulting 84 MB `.safetensors` drops directly into the iOS app bundle

End-to-end MLX. Training stays on the user’s Mac. Inference stays on the user’s phone. Photos never leave the device — only the structured catalog (names, counts, user-typed values) goes to an LLM for the priced report.

This is the on-device AI story that webAI’s MLX stack actually enables: a complete, vertical app where the model can be customized by the developer locally and shipped to users without ever touching a cloud GPU.

Deck: :framed_picture: https://slideshow-kohl.vercel.app includes the 1 min video

Source: :link: GitHub - PromptForcePrime/vault-ext: Vault is iPhone all that allows users to privately capture and catalog their home assets for insurance purposes without seeing photos over to any cloud services. · GitHub

-- @W2OPQR

Demo: :movie_camera: https://youtube.com/shorts/9kwbLM0gVxg

CareSight is a local-first caregiver awareness prototype that watches for possible care events, such as a floor stay or a missing loved one, stores the event locally, and turns it into a human-readable review story.

Track: Enterprise

Why Enterprise: the demo runs in a home-style setup, but the product path is caregiver operations: care teams, assisted-living workflows, local audit trails, and human-reviewed handoff queues. The current build is immediately useful for family caregiver awareness, while the same bounded loop can scale toward enterprise care operations with the right deployment, compliance, and workflow layers.

Each review answers:

  • What was likely observed
  • Where it happened
  • What evidence exists
  • What follow-up may be needed

The bounded agentic stack:

  • YOLO26 MLX for local perception
  • SQLite as the local black-box event record
  • Gemma-family MLX model for caregiver-facing drafts
  • Hermes Agent for staged, no-send handoff workflows
  • Holler 0.6B 6-bit TTS with Dakota voice for approved local readbacks
  • OBS / FaceTime as optional, human-approved handoff surfaces

The key constraint: agents assist the loop, but humans remain the authority. No medical-device claim, no autonomous emergency dispatch, and raw camera context stays local by default.
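
A minimal sketch of the "SQLite as the local black-box event record" idea, using only Python's standard library. The table layout, column names, and helper functions are hypothetical and chosen to mirror the review questions listed above (what, where, evidence, follow-up); CareSight's actual schema may differ.

```python
import json
import sqlite3
import time


def open_event_log(path=":memory:"):
    """Open (or create) a local event log; nothing leaves the machine."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,          -- when the event was recorded
               kind TEXT NOT NULL,        -- what was likely observed
               location TEXT,             -- where it happened
               evidence TEXT NOT NULL,    -- JSON blob of supporting detections
               reviewed INTEGER NOT NULL DEFAULT 0  -- set by a human reviewer
           )"""
    )
    return conn


def record_event(conn, kind, location, evidence):
    cur = conn.execute(
        "INSERT INTO events (ts, kind, location, evidence) VALUES (?, ?, ?, ?)",
        (time.time(), kind, location, json.dumps(evidence)),
    )
    conn.commit()
    return cur.lastrowid


def pending_reviews(conn):
    """Events still waiting for a human to review them, oldest first."""
    rows = conn.execute(
        "SELECT id, kind, location, evidence FROM events WHERE reviewed = 0 ORDER BY id"
    ).fetchall()
    return [(i, kind, loc, json.loads(ev)) for i, kind, loc, ev in rows]


conn = open_event_log()
record_event(conn, "floor_stay", "living room", {"label": "person", "seconds_on_floor": 42})
print(pending_reviews(conn))
```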

Social Media Posts - All have Video:

GitHub

  • Main Repo: Here
  • Hackathon Docs: Here
  • Side Project During Hackathon: Here
    • Swift implementation for YOLO26 MLX for Mobile apps / usage

Hi everyone! Here is my submission for LiftLens.

GitHub repo: Here

Demo video / social post:

Track: Useful

LiftLens is an iOS fitness app prototype built to help beginners and returning gym users feel more confident starting a workout. The core idea is simple: do the workout, and LiftLens logs it.

Instead of expecting users to already know what to do, remember every set, or manually track everything afterward, LiftLens guides them through onboarding, recommends a beginner-friendly workout, uses the camera to track a squat set, counts reps, and saves a local workout summary.

The value add is reducing the intimidation and mental load of getting started at the gym. LiftLens focuses on confidence, clarity, and repeatability: helping users understand what to do, complete a set, and walk away with a record of their workout.

YOLO26n MLX is used as the on-device person-tracking layer, with Swift integration through Vel-Labs/yolo26-mlx-swift, while the app layers workout flow, rep tracking, and local summaries on top.
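
One simple way to layer rep counting on top of a person-tracking box is hysteresis on the box height: a rep is counted when the box shrinks past a "down" threshold and then recovers past an "up" threshold. This is a sketch under that assumption, not LiftLens's actual logic; the class name and the 0.75 / 0.9 ratios are made up for illustration.

```python
class SquatRepCounter:
    """Counts a rep each time box height dips below `low` and returns above `high`.

    Heights are compared as a ratio to the first frame, assumed to be standing.
    """

    def __init__(self, low=0.75, high=0.9):
        self.low, self.high = low, high
        self.standing_height = None
        self.is_down = False
        self.reps = 0

    def update(self, box_height):
        if self.standing_height is None:
            self.standing_height = box_height  # calibrate on the first frame
            return self.reps
        ratio = box_height / self.standing_height
        if not self.is_down and ratio < self.low:
            self.is_down = True           # user has reached squat depth
        elif self.is_down and ratio > self.high:
            self.is_down = False          # user is back up: count one rep
            self.reps += 1
        return self.reps


counter = SquatRepCounter()
for h in [400, 395, 320, 280, 300, 370, 398, 290, 399]:  # synthetic box heights in px
    counter.update(h)
print(counter.reps)
```

Using two thresholds rather than one keeps small jitters in the box height from being counted as extra reps.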

Hello fellow button-pressers!

Reality Firewall — a local physical-world policy engine for Apple Silicon. A webcam streams into YOLO26 MLX (`yolo26n.npz`, MLX/Metal), and a Python policy engine evaluates each frame against JSON rule packs, emitting `ALLOW` / `WARN` / `DENY` / `APPROVAL_REQUIRED` / `UNLOCKED` decisions with SHA-256 chained audit receipts.

Stack

  • Vision: YOLO26 MLX via `from yolo26mlx import YOLO`, `model.predict(frame, conf=0.25)`; ~real-time on M-series.
  • Hands: MediaPipe HandLandmarker (index fingertip, EMA-smoothed) for point-to-unlock.
  • Backend: FastAPI + WebSocket vision stream, plus an HTTP `/api/select-detection` endpoint so unlock confirms are not blocked by inference.
  • Policy engine: Rule packs (`home.json`, `enterprise.json`) with conditions like `liquid_near_laptop`, `overlap_zone`, `inventory_mismatch`, `ppe_style_proxy`, `after_hours_person`, with stable/clear hysteresis to prevent event spam.
  • Audit: Canonical JSON SHA-256 receipts chained per event; JSONL export.
  • Frontend: React + Vite UI with detection overlays, decision card, event timeline, and a linked receipt block view.
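
The audit idea in the stack above (canonical JSON, SHA-256, each receipt chained to the previous one) can be sketched with the standard library. This is an illustrative version, not the repo's code: the receipt fields, the all-zero genesis hash, and the use of sorted keys with compact separators as the "canonical" form are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first receipt


def canonical_json(obj) -> bytes:
    """Deterministic serialization: sorted keys, no extra whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def append_receipt(chain, event):
    """Append a receipt whose hash covers the event and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    receipt = dict(body, hash=hashlib.sha256(canonical_json(body)).hexdigest())
    chain.append(receipt)
    return receipt


def verify_chain(chain):
    """Recompute every hash and check each link points at its predecessor."""
    prev_hash = GENESIS
    for receipt in chain:
        body = {"event": receipt["event"], "prev_hash": receipt["prev_hash"]}
        if receipt["prev_hash"] != prev_hash:
            return False
        if receipt["hash"] != hashlib.sha256(canonical_json(body)).hexdigest():
            return False
        prev_hash = receipt["hash"]
    return True


chain = []
append_receipt(chain, {"rule": "liquid_near_laptop", "decision": "WARN"})
append_receipt(chain, {"rule": "after_hours_person", "decision": "DENY"})
print(verify_chain(chain))
```

Because each hash covers the previous one, editing an earlier event after the fact breaks verification for the whole chain.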

Reality Passphrase. Point at cup → tv/monitor → dog with your index finger; ~600ms dwell per target unlocks Enterprise Mode and swaps the active policy pack.
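
The point-to-unlock sequence can be modeled as a dwell timer over an ordered list of targets. The sketch below assumes the detector reports which class the fingertip is over on each frame; the `PassphraseUnlock` class, the `"tv"` label for tv/monitor, and the choice not to reset progress on a wrong target are illustrative assumptions.

```python
SEQUENCE = ["cup", "tv", "dog"]  # target order from the post; "tv" stands for tv/monitor
DWELL_S = 0.6                    # ~600 ms dwell per target


class PassphraseUnlock:
    def __init__(self):
        self.index = 0        # position of the next target in SEQUENCE
        self.current = None   # label currently under the fingertip
        self.since = None     # timestamp when `current` was first pointed at

    def update(self, pointed_label, now):
        """pointed_label: class under the fingertip (or None); now: time in seconds.

        Returns True once the full sequence has been completed.
        """
        if self.index >= len(SEQUENCE):
            return True
        if pointed_label != self.current:
            # Pointing moved to a different label: restart the dwell timer.
            self.current, self.since = pointed_label, now
        if self.current == SEQUENCE[self.index] and now - self.since >= DWELL_S:
            self.index += 1
            self.current, self.since = None, None  # require a fresh dwell next time
        return self.index >= len(SEQUENCE)


unlock = PassphraseUnlock()
samples = [("cup", 0.0), ("cup", 0.7), ("tv", 1.0), ("tv", 1.7), ("dog", 2.0), ("dog", 2.7)]
for label, t in samples:
    unlocked = unlock.update(label, t)
print(unlocked)
```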

Repo: GitHub - sariknanaki/yolo26-hackathon · GitHub

Demo: https://www.youtube.com/watch?v=WY9C_8AhEko

Social Post: #yolomlx | Kristian Magda

  1. GitHub - CupOfGeo/yolo-mlx-build-geo: My submission to yolo-mlx build challenge · GitHub

  2. README that follows this checklist:

    • POC for a passive cataloger of clothing and wardrobe

    • uv run the scripts in the readme

    • Amcrest 4MP ProHD and a MacBook Pro M1 2020

    • yolo26n

  3. https://www.youtube.com/watch?v=rp6ZqeuvXZQ walk through

  4. My LinkedIn post linkedin .com /posts/george-mazzeo_yolomlx-submission-share-7464535730841755648-kkuz (sorry, getting hit with too many links for a new account; remove the two spaces in front of and after the .com)
