Industry's first multi-threaded multiprocessor IP core for embedded applications

# **Baseline Specifications\***

| Parameter | Value |
|---|---|
| **Product** | MIPS32® 1004K<sup>™</sup> core |
| **Process** | TSMC 40G |
| **Frequency**<sup>1, 2</sup> | 2 GHz (typical); > 1.3 GHz (worst case) |
| **Performance**<sup>3</sup> | 1.6 DMIPS/MHz; 2.9 Coremark/MHz |
| **Power** | 0.17 mW/MHz per core (core + L1 caches) |
| **Total Area**<sup>1</sup> | 2.3 mm<sup>2</sup> |

- <sup>1</sup> Configuration: 2-core implementation, each core with 2 VPEs (h/w threads) and 32K/32K I/D L1 caches, 32 dual-entry JTLB, plus Coherence Manager (CM), I/O Coherence Unit (IOCU), Global Interrupt Controller (GIC), and Cluster Power Controller (CPC)
- <sup>2</sup> Optimized for speed (area- and power-optimized specs available upon request)
- <sup>3</sup> DMIPS score is running on one thread only

Achieved using off-the-shelf standard cells from TSMC and memories from Dolphin; quoted speeds include signal integrity analysis, 10% OCV and 25 ps PLL jitter margin, worst case slow corner conditions (unless specified), 0.9V nominal Vt.

**Note:** Frequency, power consumption and size depend upon configuration options, synthesis, silicon vendor, process, and cell libraries.

# **Key Applications**

#### **Digital Home:**

- Enhanced set-top boxes (STBs)
- HD digital consumer multimedia
- Residential gateways (RGWs)

#### **Enterprise Communications Infrastructure**

#### **Network Attached Storage (NAS)**

#### **Office Automation/Multi-Function Products (MFPs)**

- Medium/large office print/fax/scan

# MIPS32<sup>®</sup> 1004K<sup>™</sup>

The MIPS32® 1004K™ Coherent Processing System (CPS) is the next advance in licensable processing technology from MIPS Technologies. The 1004K CPS is a highly scalable multiprocessor platform that supports up to four cores connected via a coherent system architecture.
With the inclusion of hardware multi-threading in each core, the 1004K CPS is optimized to maximize performance in System-on-Chip (SoC) implementations and to overcome the historical performance limitations that memory constraints and access latencies impose on embedded systems. The 1004K CPS provides leading performance/power efficiency, delivering more than 15,000 Coremarks at over 1.3 GHz with less than 1 W of dynamic power for four cores in 40 nm process technology.

## MIPS32 1004K CPS Highlights

- A coherent multiprocessor system using multi-threading to extend performance beyond traditional multiprocessor solutions
- Up to four multi-threaded CPU cores, with two hardware threads per core
- Multi-threading complements multi-core and leverages SMP operating systems and programming models, with minimal added silicon cost
- Hardware I/O coherency offloads software I/O coherency overhead from the CPUs
- Configuration and scalability at core and system levels, addressing a broad range of price/performance points for optimal product implementations
- Top-end performance/power efficiency for licensable IP, delivered through:
  - Coherent multi-core implementation for highly scalable performance
  - Multi-threading for higher utilization of the pipeline in each core with minimal power/area increase
  - MIPS® architecture performance delivered in an efficient 9-stage pipeline

#### **Features**

A complete system for coherent multiprocessing, including:

- 1 to 4 1004K multi-threaded "base" cores (up to 8 hardware threads)
- Coherence Management (CM) unit: the system "glue" for managing coherent operation between cores and I/O
- I/O Coherence Unit (IOCU): hardware block for offloading I/O coherence from software implementation on CPUs
- Cluster Power Controller (CPC): multi-core power management
- Global Interrupt Controller (GIC): system and inter-processor interrupt controller
- Extended 256-bit interface to L2 cache controller, and 256-bit interface from L2 cache controller to rest of system
(available separately)
- EJTAG/PDtrace<sup>™</sup> block for advanced debug/trace of complete coherent system

#### 1004K Base Core

- 9-stage pipeline delivering more than 2.9 Coremark/MHz per core
- Based on 34K<sup>™</sup> series cores, with cache coherency support added
- Supports single- or dual-threaded operation per core
- Uses Virtual Processing Elements (VPEs) for hardware multi-threading
- Integer (1004Kc<sup>™</sup>) and floating point (1004Kf<sup>™</sup>) versions
- Support for Revision 1 of MIPS32 DSP ASE
- Coherency port has duplicate data cache tags for background coherency checks
- Design-time configurability for inclusion and sizing of instruction and data TLBs, caches, scratchpad RAM and other options

#### Floating Point Unit (FPU)

- 1004Kf core has IEEE 754-compliant FPU, compliant to MIPS® 64-bit FPU architecture
- Supports single- and double-precision data types
- Separate in-order, dual-issue pipeline decoupled from integer pipeline

#### **Coherency Management (CM) Unit**

- Manages coherency using the MESI protocol
- Operates at same clock (1:1) as CPUs for maximum performance
- 256-bit extended interface for maximum throughput to (optional) L2 cache controller
- Supports performance enhancements via L1 cache-to-cache transfers, speculative reads to external memory, and globalized cache operations
- Global Configuration Registers (GCRs) for configuring/controlling CM scheme

#### **Cluster Power Controller**

- Provides highly scalable performance/power management via shutdown and bring up of one or more cores in the coherent processing system
- Works in conjunction with each core implemented in a separate power domain

#### I/O Coherence Unit (IOCU)

- Bridges non-coherent I/O peripheral transfer and makes transactions coherent
- Supports per-transaction attributes for snooping L1 caches, L1+L2 caches, or non-coherent transactions, plus I/O prioritization

#### Global Interrupt Controller (GIC)

- Supports system-level interrupts; inter-processor
interrupts
- Routes interrupts to a particular core or VPE
- Configurable number of system interrupts (up to 256)

#### **Development Tools**

- MIPS® Navigator ICS IDE, software toolkit, MIPSsim™, EJTAG and PDtrace probes
- CodeSourcery SG++ toolchains for MIPS

Uses multi-threading to deliver maximum performance from each core
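
The headline four-core figures (more than 15,000 Coremarks in under 1 W of dynamic power) can be cross-checked against the baseline specifications. The following is a back-of-the-envelope sketch, not a measurement: it assumes that throughput and dynamic power scale linearly with core count and clock frequency, and that the per-core figures hold at the 1.3 GHz worst-case clock. The function names are illustrative, and the power figure covers the cores and their L1 caches only, not the CM, IOCU, GIC, CPC or an L2 cache.

```python
# Figures quoted in the baseline specifications
COREMARK_PER_MHZ = 2.9    # Coremark/MHz per core
POWER_MW_PER_MHZ = 0.17   # mW/MHz per core (core + L1 caches)
WORST_CASE_MHZ = 1300     # "> 1.3 GHz (worst case)"
CORES = 4                 # maximum core count in a 1004K CPS


def total_coremark(cores, mhz, coremark_per_mhz):
    """Aggregate Coremark score, assuming linear scaling across cores."""
    return cores * mhz * coremark_per_mhz


def dynamic_power_watts(cores, mhz, mw_per_mhz):
    """Aggregate dynamic power in watts, assuming linear scaling across cores."""
    return cores * mhz * mw_per_mhz / 1000.0


score = total_coremark(CORES, WORST_CASE_MHZ, COREMARK_PER_MHZ)
power = dynamic_power_watts(CORES, WORST_CASE_MHZ, POWER_MW_PER_MHZ)

print(f"Estimated Coremark score: {score:.0f}")
print(f"Estimated dynamic power:  {power:.3f} W")
```

Under these assumptions the estimate comes to roughly 15,080 Coremarks at about 0.88 W, which is consistent with the quoted claim. Real results depend on configuration, synthesis, process and cell libraries, as the note under the baseline specifications states.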