Shibo Chen

Senior Architect | Chiplet-based AI accelerators, NoC and D2D interconnects, and memory-system architecture

Santa Clara, CA

Summary

Senior Architect at Tenstorrent focused on chiplet-based AI/ML accelerator systems, scalable fabrics, and memory-system architecture.
Expertise in NoC and D2D architecture, non-coherent interconnects, AXI/CHI/PCIe fabrics, and LPDDR/DDR memory systems, applying performance modeling selectively where it informs design choices.
Known for grounding architecture decisions in clear models and tight cross-functional debug loops, with a focus on measurable throughput, latency, utilization, power, and area.

Skills

AI Accelerator Architecture: chiplet-based AI/ML accelerator systems, system tradeoff analysis, performance target closure
NoC and Interconnects: non-coherent NoC, on-package D2D, routing, QoS, topology, clocking, observability, RAS/debuggability
Standards and Fabrics: AXI, CHI, PCIe, cache and fabric integration, heterogeneous SoC connectivity
Memory Systems: LPDDR/DDR controllers, memory-channel sizing, interleaving, bandwidth analysis, bottleneck removal
Modeling and Analysis: performance models, simulators, workload characterization, latency/bandwidth/utilization studies
Execution: architecture specifications, design reviews, RTL/PD/verification/software debug, cross-team technical leadership

Experience

May 2025 - Present Tenstorrent, Santa Clara, CA

Senior Architect
Resolved NoC, D2D, and memory-system architecture tradeoffs across chiplet-based AI/ML accelerator systems, including GDDR, LPDDR, IO, and RISC-V CPU chiplets.
Drove architecture studies across routing, topology, QoS, buffering, clocking, and debug visibility to close throughput, latency, utilization, power, and area targets.
Defined architecture requirements for links, queues, memory channels, and address interleaving policies, using performance analysis where it materially improved decisions.
Debugged performance and power bottlenecks with RTL, physical design, verification, and software teams, tracing workload-level issues back to fabric and memory behavior.
Led architecture reviews and specification work spanning non-coherent NoCs, on-package chiplet interconnects, AXI/CHI/PCIe fabrics, and LPDDR/DDR controller interactions.

May 2023 - August 2023 Tenstorrent, Santa Clara, CA

Platform Architecture Intern
Studied and architected multi-core, multi-chiplet fabric changes to improve performance and power efficiency for server-class compute subsystems.
Estimated on-chip and off-chip traffic for power analysis and early system planning.
Defined cache and fabric verification test plans for heterogeneous integration scenarios.
Built workflows for multi-core and multi-chiplet traffic generation, testing, and simulation.
Configured Arteris IP for non-coherent traffic and exercised corner cases relevant to fabric integration.

May 2022 - August 2022 Tenstorrent, Santa Clara, CA

Platform Architecture Intern
Contributed to architecture studies for a server-class RISC-V multicore CPU.
Developed an in-house configurable fabric performance model for heterogeneous multicore systems.
Defined configuration semantics to represent a broad design space across cores, caches, memory, and fabric options.

Education

2019 - 2025 University of Michigan, Ann Arbor, MI

2022 - 2025 University of Michigan, Ann Arbor, MI

2016 - 2019 University of Michigan, Ann Arbor, MI

Publications

2025 Zipper: Latency-Tolerant Optimizations for High-Performance Buses

ASP-DAC 2025
Demonstrated up to 8x speedup for one accelerator with 4.3% logic overhead, and 1.5x speedup for another with 0.9% overhead.

2022 Twine: A Chisel Extension for Component-Level Heterogeneous Design

DATE 2022
Introduced a hardware-design methodology for composing heterogeneous systems at the component level.

2023 Security Verification of Low-Trust Architectures

Last updated: March 2026