Reference Design
Active Project
d.wf @ soclabs

nanosoc re-usable MCU platform

Rationale

nanosoc has been designed to provide a simple microcontroller component suitable to 'host' and support the development and evaluation of research components or subsystems. The design allows a seamless transition from FPGA to physical silicon implementation via a pre-verified programmable control system. This allows software and diagnostic functionality to be reused for the configuration, control and diagnostic analysis of research hardware such as custom accelerators or signal processing blocks.

The design is based upon the Arm reference design in the Cortex-M System Design Kit (CMSDK), allowing reuse of the Arm Academic Access (AAA) pre-verified IP, documentation and software, but is architected to support simple 'bolting-on' of memory-mapped experimental hardware with an appropriate testbench development environment.

nanosoc is designed to be used as a base reference SoC for development, implementation, verification and research evaluation, and comes with validation testbenches, but may be adapted and extended as required.

Technical overview

nanosoc is a Cortex-M0 based microcontroller with pad-ring support for silicon implementation. It has internal address space and control and diagnostic support for integrating custom subsystems or research components:

  • CPU - small Arm Cortex-M0 processor with Serial-Wire Debug integrated support
  • Boot Monitor - Synthesized ROM bootstrap for MCU
  • Code-SRAM bank (configurable size bank of memory primarily for downloaded test programs)
  • Data-SRAM bank (configurable size bank of memory primarily for test program data, stack and heap)
  • System peripherals (serial communications, General Purpose IO - GPIO, system counter timers and clocks)
  • Memory-mapped expansion space
  • Optional support for 1 or 2 Direct Memory Access (DMA) controllers
  • Two banks of DMA-accessible SRAM buffer space for concurrent expansion space usage
  • ASCII Debug Protocol agent, ADP, with clock independent host interface

Figure: Basic block diagram of nanosoc 'chip' and 'pad-ring' functionality, supporting the hosting of research components or subsystems (experimental IP)

Architecture

Interconnect fabric

The simple single AMBA AHB bus of the Arm CMSDK reference design is upgraded to a multi-layer AHB-Lite matrix to support up to four concurrent accesses to the primary memory and input/output components.

Figure: nanosoc AHB bus matrix supporting concurrent address map access, with arbitration only when multiple initiators compete for a shared segment of address space

More details of how this bus matrix is generated using the Arm Academic Access tools are given at https://soclabs.org/project/building-system-optimised-amba-interconnect.
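
The behaviour of the bus matrix, where accesses proceed concurrently unless initiators compete for the same segment, can be illustrated with a toy model. This is only a sketch and not the generated interconnect: the initiator names, the use of the top four address bits as the 'segment', and the fixed-priority policy are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical initiator names in an assumed priority order (earlier wins).
PRIORITY = ["ADP", "DMA0", "DMA1", "CPU"]


def segment(addr):
    """Assumed segment granularity: the top 4 bits of a 32-bit address."""
    return (addr >> 28) & 0xF


def schedule(requests):
    """Split one cycle of requests into granted and stalled initiators.

    `requests` maps initiator name -> target address. Initiators that
    target different segments are all granted; contention for a single
    segment is resolved by the assumed fixed priority.
    """
    by_segment = defaultdict(list)
    for initiator, addr in requests.items():
        by_segment[segment(addr)].append(initiator)

    granted, stalled = [], []
    for contenders in by_segment.values():
        contenders.sort(key=PRIORITY.index)
        granted.append(contenders[0])
        stalled.extend(contenders[1:])
    return sorted(granted), sorted(stalled)
```

With four requests to four different segments, every initiator is granted in the same cycle; with two requests to the same segment, one is granted and the other stalls until the next cycle.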

Design and validation testbench

The testbench (tb_nanosoc) provides:

  • system clocking and initialisation
  • a hardware debug communications port that supports serial communications and ASCII Debug Protocol (ADP) agent control and diagnostics
  • an Arm Serial Wire Debug controller model for validating software debugger connection and functionality
  • an Arm CPU trace model that replicates internal processor state and allows simulated instruction and data trace (both for RTL and gate-level netlist simulation verification)

Figure: nanosoc simulation testbench architecture and functionality, with nanosoc_chip_pads "socket"

FPGA prototyping platform

Two example FPGA targets for Xilinx® Vivado® have been provided to date to support hardware prototyping and verification of the nanosoc functionality:

  • Xilinx® FPGA platform target with a wrapper layer that provides the mapping from nanosoc chip-level ports (inside the pad-ring) to the FPGA pads as well as providing the target clock and reset control from the board-specific peripherals. This supports board-level evaluation and debug at the desk, usually with a USB-connected JTAG interface.
  • Xilinx PYNQ® platform target, which supports fully networked validation and can be used as a shareable development resource. This uses the integrated Zynq® Arm Cortex-A processor subsystem to provide the Linux OS, network stack and Python environment with Jupyter notebook test code. The target example is the Xilinx ZCU104 evaluation board, which has first-class PYNQ software support.

The baseline FPGA target simply requires programmable logic and memory block resources, and would normally be connected by USB cable direct to the host development system:

Figure: FPGA prototyping architecture for the baseline FPGA board target, with an FPGA 'wrapper' that instantiates the nanosoc_chip level of hierarchy

The Xilinx Zynq 'PYNQ' platform development target uses the Programmable Logic (PL) resources to implement the nanosoc design and the Processing System (PS), the integrated Zynq Arm subsystem, to run the PYNQ software environment over an Ethernet network connection. This allows browser-based software test and verification to be carried out remotely:

Figure: FPGA prototyping architecture for the PYNQ-enabled FPGA board target, with an FPGA 'wrapper' and PYNQ host platform support for networked development

Address Map

The address map is kept closely compatible with the Arm CMSDK reference design, to allow reuse of the documentation and the example test programs provided as a starting point. The bus matrix fabric supports additional expansion memory banks and a large uncommitted address-mapped region for experimental subsystem interfacing, sufficient to configure and control the hardware and to source and sink workload data to and from memory.

nanosoc address map

start-address  end-address  region                  notes
0xF0000000     0xF0003FFF   System table ROM        CPU/DBG config
0xA0000000     0xDFFFFFFF   Expansion IO space      Experimental IO
0x90000000     0x9FFFFFFF   Expansion RAM (hi)      DMA memory buffers
0x80000000     0x8FFFFFFF   Expansion RAM (lo)      DMA memory buffers
0x60000000     0x7FFFFFFF   Expansion IO space      Experimental IO
0x40000000     0x4FFFFFFF   System IO               CPU MCU peripherals
0x30000000     0x3FFFFFFF   Data memory (RAM)       CPU heap/stack
0x20000000     0x2FFFFFFF   Code memory (RAM)       CPU execution memory
0x10000000     0x1FFFFFFF   Bootstrap ROM           synthesized, mapped to 0
0x00000000     0x0FFFFFFF   Vectors, run-time code  Boot ROM -> Code RAM (remapped by boot monitor)

This address map is fully visible to the CPU software environment and the ADP hardware debug agent.
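
As an aid to reading the map, the regions can be written down as a small lookup table. The following is a minimal sketch for host-side scripting, not part of the nanosoc deliverables: the function names are hypothetical, and the alias handling for the region at address 0 is one reading of the "Boot ROM -> Code RAM (remapped by boot monitor)" note rather than a confirmed description of the hardware.

```python
# Regions transcribed from the address map table: (start, end, name).
REGIONS = [
    (0x00000000, 0x0FFFFFFF, "Vectors / run-time code (remapped)"),
    (0x10000000, 0x1FFFFFFF, "Bootstrap ROM"),
    (0x20000000, 0x2FFFFFFF, "Code memory (RAM)"),
    (0x30000000, 0x3FFFFFFF, "Data memory (RAM)"),
    (0x40000000, 0x4FFFFFFF, "System IO"),
    (0x60000000, 0x7FFFFFFF, "Expansion IO space"),
    (0x80000000, 0x8FFFFFFF, "Expansion RAM (lo)"),
    (0x90000000, 0x9FFFFFFF, "Expansion RAM (hi)"),
    (0xA0000000, 0xDFFFFFFF, "Expansion IO space"),
    (0xF0000000, 0xF0003FFF, "System table ROM"),
]


def decode(addr):
    """Return the region name for a 32-bit address, or None if unlisted."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return None


def resolve_alias(addr, remapped):
    """Model the region at 0 as an alias (assumption, see lead-in).

    Before the boot monitor remaps, address 0 is taken to alias the
    Bootstrap ROM; afterwards it is taken to alias Code RAM.
    """
    if 0 <= addr <= 0x0FFFFFFF:
        base = 0x20000000 if remapped else 0x10000000
        return base + addr
    return addr
```

For example, `decode(0x40001000)` returns "System IO", while an address in a range the table does not list returns `None`.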

The optional 1 or 2 DMA controller(s) do not have visibility

CPU Interrupts

(TBC)

Communications channel

nanosoc supports interfacing to an external testbench via an off-chip protocol (the FTDI, Future Technology Devices International, "FT1248" serial interface). This allows both FPGA prototypes and silicon hardware at board level to use a standard USB host communication port (FTDI FT232H chip or similar).

This interface is chosen over a conventional Universal Asynchronous Receiver-Transmitter (UART) because the serial communications clock is sourced from the SoC, so there is no need for a known, accurate baud-rate clock (the on-chip clock source can even be a basic R-C oscillator that drifts in frequency over temperature and time). The FT1248 protocol supports 1, 2, 4 or 8 bit bidirectional data bus widths; nanosoc implements the single-bit serial protocol to minimise pin use, and the interface provides full-duplex hardware handshaking over the half-duplex physical channel.
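
To see why an asynchronous UART needs a reasonably accurate clock, a common idealised estimate can be worked through. The sketch below assumes a 10-bit frame (start, 8 data, stop), a receiver that resynchronises on the start edge and samples each bit at its midpoint, and no other error sources; real receivers have tighter budgets. The function names are hypothetical.

```python
def uart_max_clock_mismatch(bits_per_frame=10):
    """Idealised maximum fractional baud mismatch for mid-bit sampling.

    The receiver resynchronises on the start edge, so the last bit is
    sampled (bits_per_frame - 0.5) bit periods later; the accumulated
    timing error at that point must stay under half a bit period.
    """
    return 0.5 / (bits_per_frame - 0.5)


def uart_link_ok(clock_drift_fraction, bits_per_frame=10):
    """True if a given fractional clock drift stays within that budget."""
    return abs(clock_drift_fraction) < uart_max_clock_mismatch(bits_per_frame)
```

Under these assumptions the budget is 0.5 / 9.5, a little over 5% in total between the two ends, which a freely drifting oscillator may exceed. A clock supplied by the SoC alongside the data avoids the problem because the data is sampled against the clock that generated it.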

The 4-pin interface is mapped onto the four lower pins of the GPIO Port-1 interface:

IO pad mapping  signal name  description
P1[0]           FT_MISO      status input from FT232H USB bridge (pin 26*)
P1[1]           FT_SCLK      serialiser clock output to FT232H USB bridge (pin 21*)
P1[2]           FT_MIOSIO    bidirectional serial data to/from FT232H USB bridge (pin 13*)
P1[3]           FT_SSN       serialiser select output to FT232H USB bridge (pin 25*)

(* where the FT232H interface chip is configured by serial EEPROM for FT1248 interface mode.)
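
For test scripts it can be convenient to capture this pad mapping as bit flags. The following is a small sketch only: the class and constant names are hypothetical, and how such masks relate to the nanosoc GPIO registers is not specified here.

```python
from enum import IntFlag


class Port1(IntFlag):
    """Bit positions on GPIO Port-1, taken from the pad mapping table."""
    FT_MISO = 1 << 0    # status input from the USB bridge
    FT_SCLK = 1 << 1    # serialiser clock output
    FT_MIOSIO = 1 << 2  # bidirectional serial data
    FT_SSN = 1 << 3     # serialiser select output


# Pins always driven by the SoC (FT_MIOSIO changes direction in use).
FT1248_OUTPUTS = Port1.FT_SCLK | Port1.FT_SSN

# All four pins used by the single-bit FT1248 interface.
FT1248_ALL = Port1.FT_MISO | Port1.FT_SCLK | Port1.FT_MIOSIO | Port1.FT_SSN
```

The combined mask covers the four lower pins of the port, so `int(FT1248_ALL)` is 0x0F.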

This provides nanosoc with a robust handshaking serial communications channel. The channel defaults to providing standard character input/output, mapped to STDIN/STDOUT for the microcontroller.

However, an ASCII 'ESC' (0x1B) escape character is interpreted by the on-chip ADP (ASCII Debug Protocol) agent as the code to enter the ADP hardware monitor mode, signalled by an ASCII ']' character prompt. In this mode the host console can debug and control the SoC address map directly, regardless of whether the CPU is running. It may be used to pre-initialise memory and registers, and even to download code images to run on the CPU. The functionality of ADP is described more fully at https://soclabs.org/project/hardware-soc-bus-level-debugger
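
From the host side, entering the monitor amounts to sending ESC and waiting for the ']' prompt. The sketch below shows only that handshake and deliberately does not model any ADP commands. The function name and the `port` object (anything with `write(bytes)` and `read(n)`, such as a serial-port wrapper) are assumptions, and the first ']' received is treated as the prompt.

```python
ESC = b"\x1b"   # ASCII ESC requests ADP monitor mode
PROMPT = b"]"   # ADP signals monitor mode with this prompt


def enter_adp_monitor(port, max_bytes=256):
    """Send ESC and read until the ']' prompt appears.

    Returns the bytes seen before the prompt (for example, pending
    STDOUT text). Raises TimeoutError if the prompt is not seen within
    max_bytes or the port stops returning data.
    """
    port.write(ESC)
    seen = bytearray()
    for _ in range(max_bytes):
        ch = port.read(1)
        if not ch:  # no more data (timeout or end of stream)
            break
        if ch == PROMPT:
            return bytes(seen)
        seen += ch
    raise TimeoutError("ADP prompt ']' not received")
```

Reading one byte at a time keeps the sketch independent of any particular serial library and of how much output precedes the prompt.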

For systems that have known-frequency stable clock generation there is also the option of using a standard two-pin UART interface:

IO pad mapping  signal name  description
P1[4]           UART_RXD     serial receive data input (from FT232H etc)
P1[5]           UART_TXD     serial transmit data output (to FT232H etc)

(Note: at standard baud rates, communicating over UART channels typically adds significant simulation overhead.)
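
The scale of that simulation overhead can be estimated with simple arithmetic. This is a rough sketch assuming a 10-bit frame (start, 8 data, stop); the 50 MHz clock and 115200 baud figures used in the example are illustrative values, not nanosoc settings.

```python
def cycles_per_uart_char(clock_hz, baud, bits_per_frame=10):
    """Simulated system clock cycles consumed by one UART character."""
    return bits_per_frame * clock_hz / baud
```

With the illustrative figures, `cycles_per_uart_char(50_000_000, 115200)` is roughly 4,340 clock cycles simulated for every character transferred.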

Pinlist

(TBC)

Using nanoSoC

If you'd like to use nanoSoC for your accelerator you can find all the files on the nanoSoC tech git. In order to use this in a project we suggest that you implement your accelerator as part of our Accelerator Project structure which allows for easy integration.



Project Creator
David Flynn

Consultant at University of Southampton
Research area: Low power system design

Technology

Cortex-M0
