Shared Library
The v002 RTL isolates common numeric operations and data structures under the
Library/ directory. In the compile ordering recorded by filelist.f, these
files appear immediately after the package tier (A–D) and before isa_pkg.
Compute cores take these operations from the shared library instead of each
carrying its own implementation of the same operation.
Algorithms Package
algorithms_pkg (Library/Algorithms/Algorithms.sv)
Defines the QUEUE status struct queue_stat_t, a two-field packed struct
containing empty and full. External logic that needs to inspect queue state
uses this type rather than reading the raw signals directly. A STACK entry is
reserved as a commented stub.
bf16_math_pkg (Library/Algorithms/BF16_math.sv)
Provides BF16 arithmetic as a SystemVerilog package. The file header documents
the bit layout: [15]=sign, [14:7]=exp(8b), [6:0]=mantissa(7b). The hidden
bit (implicit leading 1) is not stored.
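The bit layout above can be illustrated with a small Python reference model. This is a sketch for checking values by hand, not code from the repository; the helper names unpack_bf16 and bf16_to_float are hypothetical, and the exponent bias of 127 is the standard bfloat16 bias rather than something stated in the file header.

```python
from typing import Tuple

def unpack_bf16(raw: int) -> Tuple[int, int, int]:
    """Split a raw 16-bit word into (sign, exp, mantissa) per the documented layout."""
    raw &= 0xFFFF
    sign = (raw >> 15) & 0x1   # bit [15]
    exp = (raw >> 7) & 0xFF    # bits [14:7]
    mant = raw & 0x7F          # bits [6:0]
    return sign, exp, mant

def bf16_to_float(raw: int) -> float:
    """Value of a normalised BF16 word; restores the hidden leading 1."""
    sign, exp, mant = unpack_bf16(raw)
    significand = 0x80 | mant  # 1.mmmmmmm held as an 8-bit integer
    value = significand * 2.0 ** (exp - 127 - 7)
    return -value if sign else value

print(unpack_bf16(0x3F80))    # fields of the word that encodes 1.0
print(bf16_to_float(0xC040))  # word that encodes -3.0
```

The model only covers normalised operands, which matches the scope the package itself documents.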
Exposed types and functions:
- bf16_t: Packed struct with a 1-bit sign, 8-bit exponent, and 7-bit mantissa.
- bf16_aligned_t: Packed struct holding an 8-bit emax and a 24-bit two’s-complement aligned value.
- to_bf16(raw[15:0]): Automatic function that casts a raw 16-bit value to bf16_t.
- align_to_emax(val, emax): Aligns a BF16 value to a given emax and returns a 24-bit two’s-complement integer. Shifts the mantissa right by diff = emax - val.exp before sign extension.
- bf16_add(a[15:0], b[15:0]): Adds two packed BF16 values and returns a packed BF16 result. Aligns both operands to the larger exponent, performs a 24-bit signed addition, then renormalises by locating the leading 1. Denormal, NaN, and Inf handling are not included; the autoregressive decode path operates exclusively on normalised BF16 operands.
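The align, add, renormalise sequence described for align_to_emax and bf16_add can be sketched as a Python behavioural model. Several details are assumptions, not taken from the RTL:

- The GUARD constant, which fixes where the 8-bit significand sits inside the 24-bit word.
- Truncation instead of rounding when bits are dropped.
- Returning +0 on exact cancellation.
- Letting the exponent wrap to 8 bits on overflow.

A signed Python integer stands in for the 24-bit two's-complement value.

```python
GUARD = 14  # hypothetical: fraction bits kept below the 8-bit significand

def align_to_emax(raw: int, emax: int) -> int:
    """Align one normalised BF16 word to exponent emax (caller ensures emax >= exp)."""
    sign = (raw >> 15) & 0x1
    exp = (raw >> 7) & 0xFF
    mant = raw & 0x7F
    diff = emax - exp
    # Restore the hidden 1, then shift the magnitude right by diff.
    mag = ((0x80 | mant) << GUARD) >> diff
    # Apply the sign after the shift; the value fits in 24 bits signed.
    return -mag if sign else mag

def bf16_add(a: int, b: int) -> int:
    """Add two packed BF16 words (normalised operands only) and repack the result."""
    emax = max((a >> 7) & 0xFF, (b >> 7) & 0xFF)
    total = align_to_emax(a, emax) + align_to_emax(b, emax)
    if total == 0:
        return 0x0000  # assumption: exact cancellation yields +0
    sign = 1 if total < 0 else 0
    mag = abs(total)
    msb = mag.bit_length() - 1        # position of the leading 1
    exp = emax + msb - (GUARD + 7)    # hidden bit sits at GUARD+7 when unshifted
    # Keep the 7 bits just below the leading 1 (truncating).
    if msb >= 7:
        mant = (mag >> (msb - 7)) & 0x7F
    else:
        mant = (mag << (7 - msb)) & 0x7F
    return (sign << 15) | ((exp & 0xFF) << 7) | mant

print(hex(bf16_add(0x3F80, 0x3F80)))  # 1.0 + 1.0
print(hex(bf16_add(0x3FC0, 0x3E80)))  # 1.5 + 0.25
```

With GUARD = 14 the largest aligned magnitude is 0xFF << 14, so the sum of two aligned operands still fits in a signed 24-bit word. A different significand placement would need a wider accumulator or a pre-shift.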
QUEUE Interface
The QUEUE primitive is split across two files: an interface (IF_queue) and a
module (QUEUE).
IF_queue (Library/Algorithms/QUEUE/IF_queue.sv)
A parameterised SystemVerilog interface with DATA_WIDTH (default 32) and
DEPTH (default 8). The interface itself takes clk and rst_n as ports.
Pointer width PTR_W = $clog2(DEPTH) is derived internally. The storage array
mem[0:DEPTH-1] and pointers wr_ptr/rd_ptr are declared inside the
interface. The empty and full flags are assigned combinationally.
Three modports:
- producer: Imports the push() task only. Drives push_data/push_en; reads empty/full.
- consumer: Imports the pop() task only. Reads pop_data/empty/full; drives pop_en.
- owner: Used by the QUEUE module itself. Receives all handshake signals as inputs; drives wr_ptr/rd_ptr and references mem via ref.
QUEUE (Library/Algorithms/QUEUE/QUEUE.sv)
A module with a single port IF_queue.owner q. It re-derives the pointer width
as PTR_W = $clog2($size(q.mem)) because modports cannot export parameters.
The always_ff block initialises both pointers to zero on reset, writes a word
when push_en && !full, and advances the read pointer when pop_en && !empty.
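The pointer behaviour described above can be sketched as a cycle-level Python model. The text does not say how empty and full are derived from the two pointers, so the model assumes the common scheme that leaves one slot unused (full when wr_ptr + 1 equals rd_ptr). It also assumes wr_ptr advances on every accepted write and that the popped word is the one at rd_ptr. The names QueueModel, QueueStat and clock are hypothetical.

```python
from typing import NamedTuple, Optional

class QueueStat(NamedTuple):
    """Stand-in for queue_stat_t: the two status flags."""
    empty: bool
    full: bool

def clog2(n: int) -> int:
    """Ceiling of log2(n), matching $clog2 for n >= 1."""
    return (n - 1).bit_length()

class QueueModel:
    def __init__(self, depth: int = 8, data_width: int = 32) -> None:
        self.depth = depth
        self.mask = (1 << data_width) - 1
        self.ptr_w = clog2(depth)      # PTR_W = $clog2(DEPTH)
        self.mem = [0] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0

    def reset(self) -> None:
        """Reset drives both pointers to zero."""
        self.wr_ptr = 0
        self.rd_ptr = 0

    def stat(self) -> QueueStat:
        # Assumed flag derivation: one slot is never written.
        empty = self.wr_ptr == self.rd_ptr
        full = (self.wr_ptr + 1) % self.depth == self.rd_ptr
        return QueueStat(empty, full)

    def clock(self, push_en: bool = False, push_data: int = 0,
              pop_en: bool = False) -> Optional[int]:
        """One clock edge; both conditions use the pre-edge flags."""
        stat = self.stat()
        popped = None
        next_wr, next_rd = self.wr_ptr, self.rd_ptr
        if push_en and not stat.full:
            self.mem[self.wr_ptr] = push_data & self.mask
            next_wr = (self.wr_ptr + 1) % self.depth
        if pop_en and not stat.empty:
            popped = self.mem[self.rd_ptr]
            next_rd = (self.rd_ptr + 1) % self.depth
        self.wr_ptr, self.rd_ptr = next_wr, next_rd
        return popped

q = QueueModel(depth=8)
for word in range(10):
    q.clock(push_en=True, push_data=word)  # pushes beyond capacity are dropped
print(q.stat())
```

Under the one-slot-unused assumption a DEPTH of 8 holds seven words. If the RTL tracks occupancy another way, for example with an extra wrap bit, the usable capacity would be the full DEPTH.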
Quantizations
Quantize_BF16.sv (Library/Quantizations/BF16/Quantize_BF16.sv)
The file is an empty placeholder. It marks the intended location for BF16
quantization helpers that will provide a common conversion path between the
offline quantization pipeline and the RTL datapath.
Usage Patterns
The table reflects import statements and interface instantiations confirmed
directly in each source file.
| Module (core) | algorithms_pkg | bf16_math_pkg | IF_queue | QUEUE |
|---|---|---|---|---|
| CVO_top | — | o | — | — |
| AXIL_CMD_IN | o | — | o | o |
o = import or instantiation confirmed in source. — = not present in that file.
CVO_top declares import bf16_math_pkg::*; directly. Per the source comment,
the FLAG_SUB_EMAX path (the sub-emax stage of the CVO softmax) uses this
package’s BF16 arithmetic. algorithms_pkg, IF_queue, and QUEUE are
instantiated inside AXIL_CMD_IN, which buffers AXI4-Lite commands into a FIFO
and is itself instantiated by ctrl_npu_frontend. GEMM_systolic_top,
GEMV_top, and the PREPROCESS modules do not import any library package; they
use only `define headers.
Last verified against
Commit 8c09e5e @ pccxai/pccx-FPGA-NPU-LLM-kv260 (2026-04-29).