YOLOX-S — Core AI
YOLOX (Megvii, Apache-2.0) converted to
Apple Core AI (.aimodel) — a single-stage anchor-free object detector running as
one static graph on every Apple compute unit (Mac GPU / iPhone GPU / Neural Engine). Part
of the Core AI model zoo
(model card).
This is the dense-detector counterpart to
RF-DETR-CoreAI: where the DETR family
needs no NMS, YOLOX uses the classic pipeline of score = obj · cls followed by per-class NMS.
Bundle
yolox-s_float32.aimodel: YOLOX-S, 640×640 input, 8.97M params, fp32, 36 MB. fp32 is the shipped dtype: detection has no bandwidth-bound decode loop, so fp16 is no faster on the GPU and only adds near-tie noise. The same bundle runs on macOS and iOS.
Graph contract
input "image" [1,3,640,640] f32 BGR, 0-255, letterboxed (pad 114, top-left) — YOLOX-native (no /255, no mean/std)
output "preds" [1,8400,85] f32 [cx,cy,w,h, obj, cls_0..cls_79]; box DECODED to 640-px, obj/cls SIGMOID-ed (in-graph)
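To make the input side of the contract concrete, here is a minimal pure-Python sketch of the top-left letterbox. The function names, the nested-list image representation, and the nearest-neighbour resize are assumptions made for illustration; a real pipeline would resize with an image library, and its interpolation may differ. Only the layout comes from the contract above: BGR channel order, 0-255 values, pad value 114, image placed at the top-left, planar [3,640,640] output.

```python
PAD_VALUE = 114.0  # pad value from the graph contract


def letterbox_geometry(src_w, src_h, dst=640):
    """Return (scale, new_w, new_h) for fitting src into a dst x dst square."""
    scale = min(dst / src_w, dst / src_h)
    return scale, int(src_w * scale), int(src_h * scale)


def letterbox_bgr(image, dst=640):
    """Letterbox an image into a planar [3][dst][dst] float tensor.

    image: list of rows, each row a list of (b, g, r) tuples in 0-255
    (hypothetical representation). Nearest-neighbour resize is an
    assumption of this sketch. No /255 and no mean/std are applied.
    Returns (chw, scale).
    """
    src_h, src_w = len(image), len(image[0])
    scale, new_w, new_h = letterbox_geometry(src_w, src_h, dst)
    # Start from a canvas filled with the pad value, one plane per channel.
    chw = [[[PAD_VALUE] * dst for _ in range(dst)] for _ in range(3)]
    for y in range(new_h):
        src_row = image[min(int(y / scale), src_h - 1)]
        for x in range(new_w):
            pixel = src_row[min(int(x / scale), src_w - 1)]
            for c in range(3):
                # Resized image sits at the top-left; padding stays right/bottom.
                chw[c][y][x] = float(pixel[c])
    return chw, scale
```

Keeping `scale` around is the point of returning it: because the padding only ever sits to the right and bottom, mapping boxes back to the source image needs nothing but a division by `scale`.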
Host post-process: score = obj · max_class, threshold, per-class NMS (IoU 0.45), then
un-letterbox the survivors. Anchors A = 80² + 40² + 20² = 8400 (strides 8/16/32).
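The host post-process can be sketched in plain Python as follows. This is an illustrative sketch, not the CoreAIKit implementation: the function names, the greedy NMS formulation, and the default score threshold of 0.3 are assumptions, and `scale` is assumed to be the letterbox resize ratio min(640/W, 640/H) for a W×H source image. The IoU threshold of 0.45, the row layout, and the stride list come from the text above.

```python
def num_anchors(input_size=640, strides=(8, 16, 32)):
    """One prediction per grid cell per stride: 80*80 + 40*40 + 20*20 = 8400."""
    return sum((input_size // s) ** 2 for s in strides)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(preds, scale, score_thr=0.3, iou_thr=0.45):
    """preds: rows of [cx, cy, w, h, obj, cls_0, ...], already decoded to
    640-px coordinates and sigmoid-ed. Returns a list of
    (x1, y1, x2, y2, score, class_id) in source-image pixels."""
    candidates = []
    for row in preds:
        cls = row[5:]
        class_id = max(range(len(cls)), key=cls.__getitem__)
        score = row[4] * cls[class_id]  # score = obj * max_class
        if score < score_thr:
            continue
        cx, cy, w, h = row[:4]
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        candidates.append((box, score, class_id))

    # Greedy per-class NMS: visit boxes by descending score and keep one only
    # if it does not overlap an already-kept box of the same class too much.
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for box, score, class_id in candidates:
        if all(k[2] != class_id or iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score, class_id))

    # Un-letterbox: the image sits at the top-left of the padded square,
    # so there is no offset to subtract, only the resize ratio to undo.
    return [tuple(v / scale for v in box) + (score, class_id)
            for box, score, class_id in kept]
```

Thresholding before NMS keeps the quadratic overlap check small: of the 8400 rows, typically only a handful survive the score cut.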
Parity & speed (measured)
- vs torch fp32: head cosine 1.000000, end-to-end detections IoU 1.000 on CPU and GPU.
- M4 Max GPU: 4.80 ms / 208 FPS (median). M4 Max CPU 57 ms.
- iPhone 17 Pro (Release, GPU, live camera): ~22 ms / 35–40 FPS end-to-end; first-load on-device specialization ~2.6 s (no AOT). The on-device gate reproduces the Mac fp32 oracle 6/6 (cat 0.96/0.96, remote 0.86/0.86, bed 0.71, couch 0.54).
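For readers who want to reproduce this kind of check, the two metrics can be sketched as below. The helper names are hypothetical, and treating the "head cosine" as the cosine similarity of the two flattened output tensors is an assumption about how the parity figure was computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length flat sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms
```

With the M4 Max GPU median of 4.80 ms, `fps_from_latency_ms(4.80)` gives roughly 208, matching the figure above. The iPhone number is an end-to-end measurement with a live camera, so its frame rate is not simply the reciprocal of the 22 ms inference time.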
Use (CoreAIKit)
import CoreAIKitVision
let detector = try await YOLOXDetector(model: .yoloxS) // downloads this repo
let detections = try await detector.detect(in: pixelBuffer, scoreThreshold: 0.3)
Live-camera + video reference app: DetectCamera in coreai-kit.
Convert it yourself
conversion/export_yolox.py
takes --variant s --yolox-repo <YOLOX checkout> --weights yolox_s.pth and is gated end-to-end with
--verify-image <img> --unit {cpu,gpu}. The script also maps the nano/tiny/m/l/x variants.
License
Apache-2.0 — upstream YOLOX code and COCO-pretrained weights are Apache-2.0.