APEX-X1 v2 is a pre-silicon architecture on TSMC N2 GAAFET, with math-backed projections of 170 Petaflops FP4 (1.7× NVIDIA Rubin's projected 100 Petaflops), native bfloat6 compute, and 512 GB HBM4. Zero US export restrictions. Built with APEX-EDA.
Projected performance is based on TSMC N2 process characterisation, validated RTL, and published scaling laws. Rubin R100 figures are NVIDIA's own published projections.
| Specification | APEX-X1 (Projected) | NVIDIA Rubin R100 (Projected) | NVIDIA B200 (Shipping) | AMD MI300X (Shipping) |
|---|---|---|---|---|
| Status | RTL Complete, Fab-Ready | Announced 2025 | Shipping | Shipping |
| Process Node | TSMC N2 GAAFET | TSMC N2 | TSMC N3P | TSMC N5 |
| Architecture | 16-tile CoWoS-L Chiplet | Rubin Monolithic GPU | Blackwell MCM | 8-chiplet MCM |
| FP8 Compute (projected) | ~85 Petaflops* | ~50 Petaflops* | 9 Petaflops | 1.3 Petaflops |
| FP4 Compute | ~170 Petaflops (native) | ~100 Petaflops | 18 Petaflops | — (not supported) |
| bfloat6 (APEX-only) | ~120 Petaflops ✦ | — not available | — not available | — not available |
| BF16 / FP16 | ~42 Petaflops | ~25 Petaflops | 4.5 Petaflops | 1.3 Petaflops |
| HBM Memory | 512 GB HBM4 | 288 GB HBM3e | 192 GB HBM3e | 192 GB HBM3 |
| Memory Bandwidth | 12 TB/s | ~8 TB/s | 8 TB/s | 5.3 TB/s |
| Extended Memory Pool | +2 TB via CXL 3.0 | — none | — none | — none |
| Die Interconnect | UCIe 2.0 + CXL 3.0 | NVLink 6 | NVLink 5 | Infinity Fabric 4 |
| TDP (projected) | ~1,200 W | ~1,400 W | 1,000 W | 750 W |
| Projected Perf/Watt (FP8) | ~70 TFLOPS/W ✦ | ~35 TFLOPS/W | 9 TFLOPS/W | 1.7 TFLOPS/W |
| US Export Controls | ✓ None — Globally Free | ✗ BIS Restricted | ✗ BIS Restricted | ✗ BIS Restricted |
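
The perf/watt row can be re-derived from the FP8 and TDP rows of the same table. The following is a minimal sketch of that arithmetic, using only the table's own figures; the dictionary and function names are illustrative, not part of any APEX tooling.

```python
# Peak FP8 throughput (petaflops) and TDP (watts), copied from the table above.
CHIPS = {
    "APEX-X1 (projected)": (85.0, 1200.0),
    "NVIDIA Rubin R100 (projected)": (50.0, 1400.0),
    "NVIDIA B200": (9.0, 1000.0),
    "AMD MI300X": (1.3, 750.0),
}


def tflops_per_watt(petaflops: float, watts: float) -> float:
    """Convert petaflops to teraflops (x1000), then divide by power draw."""
    return petaflops * 1000.0 / watts


if __name__ == "__main__":
    for name, (pf, tdp) in CHIPS.items():
        print(f"{name}: {tflops_per_watt(pf, tdp):.1f} TFLOPS/W")
```

Run as a script, this prints 70.8, 35.7, 9.0 and 1.7 TFLOPS/W, which matches the rounded values in the table.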
These are not incremental improvements — they are architectural primitives that Rubin, Blackwell, and MI300X physically cannot add without a full redesign.
APEX-X1 implements FP4 (E2M1 format) and a novel bfloat6 (1+3+2 bit) format directly in silicon. bfloat6 aims to preserve BF16-like dynamic range, enabling training-quality inference at 6 bits per value, about 1.33× the storage density of FP8. No other chip in the comparison offers bfloat6 natively, and FP4 alone projects 170 Petaflops, more than any chip in the table above.
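To make the two bit layouts concrete, here is a hedged sketch that decodes every code of a small floating-point format and lists its positive values. The encoding rules are assumptions, since the text only gives the field widths: an IEEE-style bias of 2^(e-1) - 1, subnormals when the exponent field is zero, and no codes reserved for infinity or NaN. The actual bfloat6 encoding may differ.

```python
import math


def decode_minifloat(code: int, exp_bits: int, man_bits: int) -> float:
    """Decode one sign/exponent/mantissa code under the assumed rules."""
    bias = (1 << (exp_bits - 1)) - 1  # assumed IEEE-style bias
    sign = -1.0 if (code >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (code >> man_bits) & ((1 << exp_bits) - 1)
    man = code & ((1 << man_bits) - 1)
    frac = man / (1 << man_bits)
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


def positive_values(exp_bits: int, man_bits: int) -> list:
    """All distinct positive values (codes with the sign bit clear)."""
    count = 1 << (exp_bits + man_bits)
    values = {decode_minifloat(c, exp_bits, man_bits) for c in range(count)}
    return sorted(values - {0.0})


def range_in_binades(exp_bits: int, man_bits: int) -> float:
    """log2 of the ratio between largest and smallest positive value."""
    values = positive_values(exp_bits, man_bits)
    return math.log2(values[-1] / values[0])


if __name__ == "__main__":
    print("FP4 E2M1:", positive_values(2, 1))
    print("1+3+2 layout max/min:", positive_values(3, 2)[-1], positive_values(3, 2)[0])
    print("1+3+2 layout binades:", round(range_in_binades(3, 2), 2))
```

Under these assumptions E2M1 yields 0.5, 1, 1.5, 2, 3, 4 and 6, and the 1+3+2 layout spans 0.0625 to 28, or roughly 8.8 binades. BF16's 8-bit exponent covers a far wider span, so reaching BF16-like range at 6 bits would presumably depend on something beyond the raw encoding, such as a shared scale factor, which is not described here.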
Standard chips use fixed 2:4 structured sparsity (NVIDIA's approach). APEX-X1's Sparse Outer-Product Engine (SOPE) supports variable block sizes (4:16, 8:32, arbitrary) — directly matching the natural sparsity patterns of MoE expert weights and attention matrices. Projected 3.2× throughput improvement on DeepSeek/Mixtral MoE architectures.
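The difference between fixed 2:4 and wider N:M patterns can be illustrated in software. This is a minimal sketch of magnitude-based N:M pruning, assuming "N:M" means at most N non-zero values in every block of M consecutive weights (the convention used for 2:4). The function name is hypothetical, and this shows only the sparsity pattern, not how SOPE processes it in hardware.

```python
def prune_n_of_m(weights: list, n: int, m: int) -> list:
    """Keep the n largest-magnitude values in each block of m; zero the rest."""
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of the block size m")
    pruned = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # Indices of the n largest magnitudes within this block.
        ranked = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)
        keep = set(ranked[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return pruned


if __name__ == "__main__":
    row = [0.1, -0.9, 0.5, 0.2, 0.05, 0.3, -0.7, 0.0]
    print(prune_n_of_m(row, 2, 4))  # 2:4 pattern
    print(prune_n_of_m(row, 2, 8))  # same density budget spread over 8
```

With 2:4 the survivors are forced into every group of four, while a wider block lets the non-zero values cluster wherever the largest weights fall. That extra freedom is what the variable block sizes are meant to exploit.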
Every other AI chip treats DRAM as separate memory. APEX-X1 integrates a CXL 3.0 fabric connecting 2TB of DDR5 DIMMs into a coherent shared address space alongside 512GB HBM4. Result: single-node inference of 671B-parameter MoE models (like DeepSeek-V3) without model parallelism overhead. Rubin has no equivalent.
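A back-of-envelope check of the single-node claim is sketched below. It counts weight storage only and assumes decimal units (1 GB = 10^9 bytes, 2 TB = 2,000 GB); KV cache, activations, and any per-block scale factors are ignored, so real headroom would be smaller. The names are illustrative.

```python
HBM_GB = 512.0          # on-package HBM4, from the text
CXL_GB = 2000.0         # 2 TB CXL-attached DDR5, decimal units (assumption)
PARAMS = 671e9          # 671B-parameter model

BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "bfloat6": 6, "FP4": 4}


def weight_footprint_gb(params: float, bits: int) -> float:
    """Weight storage in decimal gigabytes for a given precision."""
    return params * bits / 8 / 1e9


def placement(footprint_gb: float) -> str:
    """Where the weights fit: HBM alone, the combined pool, or neither."""
    if footprint_gb <= HBM_GB:
        return "HBM only"
    if footprint_gb <= HBM_GB + CXL_GB:
        return "HBM + CXL"
    return "does not fit"


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        gb = weight_footprint_gb(PARAMS, bits)
        print(f"{fmt}: {gb:.1f} GB -> {placement(gb)}")
```

On these assumptions the weights take 1,342 GB in BF16, 671 GB in FP8, about 503 GB in bfloat6, and about 336 GB in FP4. Only the 6-bit and 4-bit cases fit in HBM alone; FP8 and BF16 rely on the CXL pool.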
Rather than a single large GPU die (which faces reticle limits and yield problems at N2), APEX-X1 uses 16 compute tiles on a CoWoS-L organic interposer. Each tile is independently functional — a failed tile reduces performance gracefully rather than causing total failure. UCIe 2.0 die-to-die links provide 4 TB/s bisection bandwidth across the mesh.
Activation functions, layer normalisation, and softmax execute directly inside the HBM4 base die logic layer — eliminating round-trips to the compute die for elementwise ops. Projected 180 GB/s bandwidth saving per tile. This architectural choice is enabled by HBM4's base die logic layer, not available in HBM3e chips like Rubin.
APEX Silicon is incorporated in England and Wales. APEX-X1 is designed entirely with open-source EDA tools and licensed IP from non-US sources. Any government, university, or company in any country can purchase APEX-X1 without BIS licensing, ECCN classification review, or end-user certificates. This is a feature Rubin cannot offer.
All 16 compute tiles connect via UCIe 2.0 through a central CXL 3.0 switch. Each tile contains 8 Tensor Sparse Cores and 32GB HBM4.
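The per-tile figures add up to the headline totals, and the same arithmetic gives a rough picture of graceful degradation. The sketch below assumes throughput scales linearly with the number of working tiles and that a failed tile's HBM4 stack becomes unavailable along with it; neither assumption is stated in the text, and the function name is hypothetical.

```python
TILES = 16
TSC_PER_TILE = 8          # Tensor Sparse Cores per tile, from the text
HBM_PER_TILE_GB = 32      # HBM4 per tile, from the text
FP8_PFLOPS_TOTAL = 85.0   # projected full-chip FP8 figure from the table


def capacity(failed_tiles: int = 0) -> dict:
    """Aggregate resources with some tiles disabled (linear scaling assumed)."""
    if not 0 <= failed_tiles <= TILES:
        raise ValueError("failed_tiles must be between 0 and 16")
    working = TILES - failed_tiles
    return {
        "tiles": working,
        "tensor_cores": working * TSC_PER_TILE,
        "hbm_gb": working * HBM_PER_TILE_GB,
        "fp8_pflops": FP8_PFLOPS_TOTAL * working / TILES,
    }


if __name__ == "__main__":
    print(capacity(0))
    print(capacity(1))
```

With all tiles working this gives 128 Tensor Sparse Cores and 512 GB of HBM4, consistent with the table. With one tile disabled, the linear model gives 120 cores, 480 GB, and about 79.7 Petaflops FP8.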
APEX-X1 is at the architecture-complete stage. The path to fabrication requires foundry partnership and investment, which is exactly what the licensing programme funds.
RTL, floorplan, SDC constraints, 7 patent claims filed: ✓ Done
APEX-EDA at apexchipset.com/app (AI synthesis, placement, routing, IDE): ● Active
Single tile tapeout on N5 for silicon validation of TSC, SOPE, UCIe PHY: Planned, $8M
16-tile production chip at N2 GAAFET, full performance target: Planned, $120M
Datacentre clusters for government and enterprise customers: Target 2028
Whether you're a government seeking sovereign AI capability, a chip startup, or a strategic acquirer, we want to hear from you.