What is Trace?
With instrumented trace, the application itself generates data when an event occurs, e.g. an RTOS context switch, or at periodic intervals. This data is then transmitted and reconstructed externally to generate a timeline of system activity. Examples of instrumented trace include Segger SystemView and Tracealyzer from Percepio.
|Advantages|Disadvantages|
|Low bandwidth|Uses CPU resources|
|No hardware required|Will affect application timing|
||Timing accuracy may be affected by system events|
With non-instrumented trace, the target processor contains internal logic to generate a data stream that represents the executed instructions. The Embedded Trace Macrocell (ETM), an optional part of the ARM Cortex M3/M4 architecture, is an example of this and is the one the QTrace probe is designed for. Some trace schemes can also include user-specified variables or CPU registers in the data stream, although this is not implemented in the M3/M4 ETM. Trace data is transmitted on multi-purpose processor GPIO pins to be decoded externally.
|Advantages|Disadvantages|
|No CPU cycles required|Requires an external hardware decoder|
|Does not affect system timing|High bandwidth, generates many MB/s of trace data|
|Highest resolution possible|Requires additional processor pins and a larger debug connector|
|Real-time tracing|Routing of trace signals is critical|
||Expensive (QTrace is an exception)|
Non-instrumented trace, generally referred to simply as ‘trace’, has two forms: buffered and continuously streamed.
In buffered trace, a large external RAM buffer is filled with trace data after a previously configured event occurs. The trace data must then be uploaded to a PC, decoded and analysed to pick out the event of interest.
In streamed trace, the data is continually streamed to an analyser application which decodes it and presents the real-time data as live disassembly and source-level views. The views can be paused for analysis whilst trace is still being streamed and decoded in the background at full speed. Because it is not limited by a hardware buffer, streamed trace also allows full code coverage and profiling of an application.
QTrace is implemented as a continuously streamed real-time trace system.
The Embedded Trace Macrocell (ETM) is an optional part of the ARM Cortex M3/M4 architecture, so not all processor manufacturers include it, although many do. The ETM block is part of a larger ARM debugging ecosystem called CoreSight. This encompasses JTAG/SWD debugging, an optional Embedded Trace Buffer (ETB) and the Instrumentation Trace Macrocell (ITM) for low-speed tracing using the single SWO debug pin. An overview of these features is given in a PDF presentation on the ARM website.
ETM isn’t usually an option for devices with low pin counts, e.g. 48 pins or fewer, even though the core may implement it. However, a higher pin count version of the processor that does provide access to the ETM signals is usually available.
Convince your hardware department to design in a part with ETM (plus a 20-way trace connector), at least on the rev. A board. It really will pay dividends!
A trace data stream is generated by the ETM block using five GPIO lines which are programmed to operate as trace output pins (clock + 4 x data). The trace clock typically runs at half the system clock speed and the 4-bit trace data is clocked on both clock edges, i.e. double data rate (DDR). Some manufacturers offer the ability to reduce the trace clock speed, but this can lead to internal trace buffer overflows.
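As a rough worked example of these figures, the peak raw capacity of the trace port can be estimated from the system clock. The sketch below assumes the trace clock is exactly the system clock divided by two and that all four data pins carry a bit on each clock edge; the function name and its parameters are illustrative and not part of any QTrace or ARM API.

```python
def trace_port_peak_bytes_per_sec(system_clock_hz, data_pins=4, clock_divider=2):
    """Estimate the peak raw capacity of a DDR trace port in bytes per second."""
    # Assumption: the trace clock is the system clock divided by clock_divider.
    trace_clock_hz = system_clock_hz / clock_divider
    # DDR: every data pin carries one bit on each of the two clock edges.
    bits_per_sec = trace_clock_hz * 2 * data_pins
    return bits_per_sec / 8


peak = trace_port_peak_bytes_per_sec(200_000_000)
print(f"{peak / 1e6:.0f} MB/s")  # prints "100 MB/s"
```

This is only an upper bound on what the port can carry. The amount of data actually generated depends on the code being executed.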
The trace data is arranged into 16-byte frames which are clocked into, and decoded by, an external hardware decoder such as the QTrace probe. The bytes contain compressed information such as instruction type (branch or not), branch address, exception information, etc. These are streamed to a PC application which translates them into the corresponding disassembly and high-level source for display.
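To illustrate only the framing step, a captured byte stream can be cut into 16-byte frames before any decoding takes place. This is a minimal sketch which assumes the capture already starts on a frame boundary (a real decoder has to synchronise to the stream first) and which does not attempt to interpret the frame contents; `split_into_frames` is a hypothetical helper, not a QTrace function.

```python
FRAME_SIZE = 16  # bytes per trace frame, as described above


def split_into_frames(stream: bytes):
    """Split a byte stream into complete 16-byte frames.

    Returns (frames, remainder), where remainder holds any trailing
    bytes that do not yet make up a complete frame.
    """
    usable = len(stream) - (len(stream) % FRAME_SIZE)
    frames = [stream[i:i + FRAME_SIZE] for i in range(0, usable, FRAME_SIZE)]
    return frames, stream[usable:]


# 40 captured bytes give two complete frames and 8 bytes left over.
frames, remainder = split_into_frames(bytes(range(40)))
print(len(frames), len(remainder))  # prints "2 8"
```

Keeping the remainder lets a streaming receiver prepend it to the next block of captured data rather than discard a partial frame.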
The amount of streamed trace data depends on the instructions being executed. Branch instructions whose target addresses are unknown at compile time, and interrupt entry and exit, typically generate the most trace data. A processor running at 200 MHz can generate trace data at a rate of tens of MB/s.
Preconceptions of Trace
A common perception of trace is that it’s a heavyweight tool which is only used for tackling nasty problems or for code coverage certification. Trace is used in these scenarios but it offers much more.
Another general view of trace is that it’s expensive. It is true that trace tools tend to have a large price tag and are generally only used by large organisations. However, the low-cost QTrace solution makes trace affordable for developers who are on a tight budget.
What can QTrace do?
● Identify which functions are being called most frequently and which are taking the most CPU cycles
● Calculate the rate at which an area of code is being executed without needing to toggle an I/O pin
● See which functions and conditional branches have been executed without stopping the CPU (ideal for debugging motion control, communication protocols, PID controllers, etc.)
● See which interrupts are occurring and how frequently
● Show a call stack without having to stop the CPU
● Review tens of milliseconds of execution (C source and disassembly) prior to a CPU exception
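The first two items in this list amount to simple bookkeeping once the trace has been decoded. The sketch below is purely illustrative: it assumes a hypothetical list of function-entry events of the form (timestamp in microseconds, function name), sorted by time. This is not a QTrace export format, and the function names and data are made up.

```python
from collections import Counter


def call_counts(events):
    """Count how many times each function was entered."""
    return Counter(name for _, name in events)


def call_rate_hz(events, name):
    """Average entry rate of `name`, measured from its first to its last entry."""
    times = [t for t, n in events if n == name]
    if len(times) < 2 or times[-1] == times[0]:
        return None  # not enough data to compute a rate
    # Intervals between entries, divided by the elapsed time in seconds.
    return (len(times) - 1) * 1_000_000 / (times[-1] - times[0])


# Hypothetical decoded events: (timestamp_us, function_name)
events = [
    (0, "pid_update"), (250, "uart_isr"), (1000, "pid_update"),
    (1400, "uart_isr"), (2000, "pid_update"), (3000, "pid_update"),
]

print(call_counts(events).most_common(1))  # prints [('pid_update', 4)]
print(call_rate_hz(events, "pid_update"))  # prints 1000.0
```

Because the figures come from the trace stream rather than from added code, the application itself does not need to toggle an I/O pin or maintain counters.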