documentation:hardware:opera:madam:matrix_engine

The ARM60's ALU was not adequate for 3D math, so a hardware engine focused on matrix multiplication was added to MADAM.

Not all capabilities were exposed through the APIs but the hardware supported a number of functions:

  • 4 x 4 matrix multiply with 16.16 initial values and a 32.32 result.
  • 3 x 3 matrix multiply with 16.16 initial values and a 16.16 result.
  • 3 x 3 matrix multiply with N/Z calculation. All initial values and results are 16.16.
  • 4 x 1 multiply with 16.16 initial values and a 32.32 result.
  • Four sets of 1 x 1 multiply with 16.16 initial values and a 32.32 result.
  • 16.16 divided by 16.16 with 16.16 result (no remainder).
  • CCoB value generator starting from X and Y deltas and H and W.
  • Short form of CCoB generation using pre-divided H and W.
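The number formats in the list above can be illustrated on a host machine. The following is a minimal sketch, not the hardware's implementation: the type and function names (`fix16_16`, `fix32_32`, `fix_mul_wide`, `fix_narrow`, `fix_div`) are hypothetical, and it assumes the common convention that a 16.16 value is a signed 32-bit integer scaled by 65536 and a 32.32 value is a signed 64-bit integer scaled by 2^32.

```c
#include <stdint.h>

typedef int32_t fix16_16;   /* assumed: 16 integer bits, 16 fraction bits */
typedef int64_t fix32_32;   /* assumed: 32 integer bits, 32 fraction bits */

/* Convert a small integer to 16.16 (caller keeps n within 16 bits). */
static fix16_16 fix_from_int(int32_t n)
{
    return (fix16_16)(n * 65536);
}

/* 16.16 x 16.16: the full 64-bit product is already a 32.32 value. */
static fix32_32 fix_mul_wide(fix16_16 a, fix16_16 b)
{
    return (int64_t)a * (int64_t)b;
}

/* Narrow 32.32 to 16.16 by dropping the low 16 fraction bits.
 * Relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but universal on common compilers. */
static fix16_16 fix_narrow(fix32_32 v)
{
    return (fix16_16)(v >> 16);
}

/* 16.16 / 16.16 -> 16.16, remainder discarded. Caller ensures b != 0. */
static fix16_16 fix_div(fix16_16 a, fix16_16 b)
{
    return (fix16_16)(((int64_t)a * 65536) / b);
}
```

For example, multiplying 3.0 by 2.0 gives the 64-bit value 6 << 32, which narrows back to 6.0 in 16.16, and dividing 3.0 by 2.0 gives 0x00018000 (1.5).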

The basic engine uses a stack of forty 32-bit values, a 32 x 32 signed multiplier, and a 64-bit accumulator. The stack is divided into three sections. The first is a 16-word section holding the 16 matrix values used by all matrix operations. These values are not disturbed by any math operation. The other two sections are 12 words each and are alternately enabled by the 'BANK' bit in the 'Control' word. The 'BANK' bit points to the bank of 12 words that the CPU should use while the math hardware operates on the other bank. The 'BANK' bit is flipped whenever any value is written to the 'Start Process' register. The CPU is free to read and write any value in its own bank (the one the math hardware is not using) at any time without risk to the running process or its values.
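The bank layout can be expressed as a small address calculation. This sketch is based on the memory map on this page, where each bank's 12 words appear as three blocks of four words and the B1 copy of each block sits 0x10 bytes after the B0 copy. The names (`bank_addr`, `bank_after_start`, `BLOCK_*`) are hypothetical, and it assumes the 0x06xx offsets in the table are relative to 0x0330_0000.

```c
#include <stdint.h>

#define MATH_STACK_BASE 0x03300600u /* start of the address space on this page */
#define BANK_STRIDE     0x10u       /* B1 block follows its B0 block by 4 words */

/* Byte offsets of the B0 copy of each 4-word block, relative to the base. */
enum bank_block {
    BLOCK_INPUT  = 0x40, /* B0 - X, Y, Z, W          (0x0640) */
    BLOCK_OUTPUT = 0x60, /* B0 - OUTX .. OUTW        (0x0660) */
    BLOCK_EXTRA  = 0x80  /* B0 - OUTX* .. OUTW*      (0x0680) */
};

/* Address of word 0..3 of a block in bank 0 or 1. */
static uint32_t bank_addr(enum bank_block block, unsigned bank, unsigned word)
{
    return MATH_STACK_BASE + (uint32_t)block
         + (bank & 1u) * BANK_STRIDE
         + (word & 3u) * 4u;
}

/* The BANK bit flips on every write to 'Start Process'. */
static unsigned bank_after_start(unsigned bank)
{
    return (bank ^ 1u) & 1u;
}
```

With these definitions, `bank_addr(BLOCK_OUTPUT, 1, 3)` evaluates to 0x0330067C, which is the B1 - OUTW entry in the table.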

The hardware addresses in the stack are arranged to allow for multiple word transfers to and from the CPU for all data moves.

The CPU must first preload all required initial values (including any zero or one constants) into the appropriate places in the stack. It must also set the control bits. Once this setup is done, the CPU starts an operation by writing the appropriate value to the 'Start Process' register, then polls the status register to learn when the process has completed.
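The sequence described above (load values, set control bits, write 'Start Process', poll status) might look like the following sketch. It runs against a host-side stand-in for the register window, not real hardware: `sim_regs`, `reg_write`, `reg_read`, and `run_process` are hypothetical names, the simulated three-poll delay is invented for illustration, and polling bit1 ("real process on") as the busy flag is an assumption.

```c
#include <stdint.h>

#define REG_CONTROL_SET 0x07F0u
#define REG_STATUS      0x07F8u
#define REG_START       0x07FCu
#define STATUS_REAL_ON  (1u << 1)   /* bit1: real process on (assumed busy flag) */

/* Host-side stand-in for offsets 0x0600..0x07FF (hypothetical). */
static uint32_t sim_regs[128];
static int sim_busy_polls;

static void reg_write(uint32_t off, uint32_t value)
{
    sim_regs[(off - 0x0600u) / 4u] = value;
    if (off == REG_START)
        sim_busy_polls = 3;         /* pretend the operation takes 3 polls */
}

static uint32_t reg_read(uint32_t off)
{
    if (off == REG_STATUS) {
        if (sim_busy_polls > 0) {
            sim_busy_polls--;
            return STATUS_REAL_ON;
        }
        return 0u;
    }
    return sim_regs[(off - 0x0600u) / 4u];
}

/* Load inputs, set control bits, start the process, poll until idle.
 * Returns the number of polls that reported the engine as busy. */
static int run_process(const uint32_t *inputs, unsigned n_words,
                       uint32_t input_off, uint32_t control, uint32_t process)
{
    int busy_polls = 0;
    unsigned i;

    for (i = 0; i < n_words; i++)
        reg_write(input_off + 4u * i, inputs[i]);
    reg_write(REG_CONTROL_SET, control);
    reg_write(REG_START, process);
    while (reg_read(REG_STATUS) & STATUS_REAL_ON)
        busy_polls++;
    return busy_polls;
}
```

On real hardware the two accessor functions would be replaced by volatile reads and writes to the mapped addresses, and the caller would choose `input_off` according to the current 'BANK' bit.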

The hardware will, optionally, convert the result of a math operation that has had an overflow into BIGNUM. This is a signed value of either 0x7FFFFFFF or 0x80000001.
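As an illustration of the BIGNUM conversion, here is a minimal sketch of saturating a wide result. It rests on assumptions not stated on this page: that "overflow" means the value does not fit in a signed 32-bit word, and that positive overflow maps to 0x7FFFFFFF and negative overflow to 0x80000001. The function name `to_bignum` and its parameters are hypothetical.

```c
#include <stdint.h>

#define BIGNUM_POS ((int32_t)0x7FFFFFFF)
#define BIGNUM_NEG ((int32_t)(-0x7FFFFFFF))   /* bit pattern 0x80000001 */

/* Reduce a 64-bit result to 32 bits.
 * *overflow (if non-NULL) is set when the value does not fit.
 * With allow_bignum set, an overflowed result becomes +/- BIGNUM;
 * otherwise the low 32 bits are returned unchanged. */
static int32_t to_bignum(int64_t value, int allow_bignum, int *overflow)
{
    int ovf = (value > INT32_MAX) || (value < INT32_MIN);

    if (overflow)
        *overflow = ovf;
    if (ovf && allow_bignum)
        return value < 0 ? BIGNUM_NEG : BIGNUM_POS;
    /* Keep the low 32 bits; the final conversion wraps on common compilers. */
    return (int32_t)(uint32_t)value;
}
```

Note that 0x80000001 is the negation of 0x7FFFFFFF, so the two BIGNUM values are symmetric around zero.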

Address Space

Beginning End
0x0330_0600 0x0330_07FF

Memory Mapping

Hardware Multiplier Math Stack

Made up of 64 words.

Address R/W 4×4 Name 3×3 Name CCoB-in Name CCoB-out Name
0x0600 R/W Matrix 00 Matrix 00
0x0604 R/W Matrix 01 Matrix 01
0x0608 R/W Matrix 02 Matrix 02
0x060C R/W Matrix 03
0x0610 R/W Matrix 10 Matrix 10
0x0614 R/W Matrix 11 Matrix 11
0x0618 R/W Matrix 12 Matrix 12
0x061C R/W Matrix 13
0x0620 R/W Matrix 20 Matrix 20
0x0624 R/W Matrix 21 Matrix 21
0x0628 R/W Matrix 22 Matrix 22
0x062C R/W Matrix 23
0x0630 R/W Matrix 30
0x0634 R/W Matrix 31
0x0638 R/W Matrix 32
0x063C R/W Matrix 33
0x0640 R/W B0 - X B0 - X B0 - WIDTH B0 - 1/HW
0x0644 R/W B0 - Y B0 - Y B0 - HEIGHT
0x0648 R/W B0 - Z B0 - Z B0 - X01 B0 - HDX
0x064C R/W B0 - W B0 - Y01 B0 - HDY
0x0650 R/W B1 - X B1 - X B1 - WIDTH B1 - 1/HW
0x0654 R/W B1 - Y B1 - Y B1 - HEIGHT
0x0658 R/W B1 - Z B1 - Z B1 - X01 B1 - HDX
0x065C R/W B1 - W B1 - Y01 B1 - HDY
0x0660 R/W B0 - OUTX B0 - OUTX B0 - X03 B0 - VDX
0x0664 R/W B0 - OUTY B0 - OUTY B0 - Y03 B0 - VDY
0x0668 R/W B0 - OUTZ B0 - OUTZ B0 - X0123 B0 - DDX
0x066C R/W B0 - OUTW B0 - Y0123 B0 - DDY
0x0670 R/W B1 - OUTX B1 - OUTX B1 - X03 B1 - VDX
0x0674 R/W B1 - OUTY B1 - OUTY B1 - Y03 B1 - VDY
0x0678 R/W B1 - OUTZ B1 - OUTZ B1 - X0123 B1 - DDX
0x067C R/W B1 - OUTW B1 - Y0123 B1 - DDY
0x0680 R/W B0 - OUTX* B0 - N(MSB) B0 - N(MSB)
0x0684 R/W B0 - OUTY* B0 - N(LSB) B0 - N(LSB)
0x0688 R/W B0 - OUTZ* B0 - N/Z B0 - 1/W
0x068C R/W B0 - OUTW* B0 - 1/H
0x0690 R/W B1 - OUTX* B1 - N(MSB) B1 - N(MSB)
0x0694 R/W B1 - OUTY* B1 - N(LSB) B1 - N(LSB)
0x0698 R/W B1 - OUTZ* B1 - N/Z B1 - 1/W
0x069C R/W B1 - OUTW* B1 - 1/H

Set Control Bits

Address Name R/W
0x07F0 set control bits R/W

RED:
bit0,1: accumulator delay
bit2: use 'early termination'
bit3: own the MAS
bit4: bank

GREEN:
bit0,1: accumulator delay (set to '01')
bit2: select 'early termination' for 'selected process on'
bit3: own the MAS
bit4: bank
bit5: signdiv
bit6: allow BIGNUM conversion
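The control bits can be captured as masks. This sketch covers the GREEN layout only; the macro names are hypothetical, and reading "set to '01'" as the two-bit field value 1 (bit0 set, bit1 clear) is an assumption.

```c
#include <stdint.h>

/* GREEN control word layout (names are illustrative, not official). */
#define CTL_ACC_DELAY(n)  ((uint32_t)(n) & 3u)  /* bit0,1: accumulator delay */
#define CTL_EARLY_TERM    (1u << 2)             /* bit2: early termination   */
#define CTL_OWN_MAS       (1u << 3)             /* bit3: own the MAS         */
#define CTL_BANK          (1u << 4)             /* bit4: bank                */
#define CTL_SIGNDIV       (1u << 5)             /* bit5: signdiv             */
#define CTL_ALLOW_BIGNUM  (1u << 6)             /* bit6: allow BIGNUM        */

/* Example value: accumulator delay '01' with BIGNUM conversion allowed. */
static uint32_t example_green_control(void)
{
    return CTL_ACC_DELAY(1) | CTL_ALLOW_BIGNUM;
}
```

The example value works out to 0x41 and would be written to the 'set control bits' register at 0x07F0.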

Clear Control Bits

Address Name R/W
0x07F8 status bits R

RED:
bit0: early process on
bit1: real process on
bit2: overflow

GREEN:
bit0: selected process on (real or early)
bit1: real process on
bit2: overflow
bit3: DivZero
bit4,5,6,7: current or previous process number

Status Bits

Address Name R/W
0x07F8 status bits R

RED:
bit0: early process on
bit1: real process on
bit2: overflow

GREEN:
bit0: selected process on (real or early)
bit1: real process on
bit2: overflow
bit3: DivZero
bit4,5,6,7: current or previous process number
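A status word read from 0x07F8 can be unpacked field by field. The following sketch decodes the GREEN layout; the struct and function names (`math_status`, `decode_status`) are hypothetical.

```c
#include <stdint.h>

/* Decoded view of the GREEN status word (illustrative names). */
struct math_status {
    unsigned selected_on; /* bit0: selected process on (real or early) */
    unsigned real_on;     /* bit1: real process on                     */
    unsigned overflow;    /* bit2: overflow                            */
    unsigned div_zero;    /* bit3: DivZero                             */
    unsigned process;     /* bit4..7: current or previous process      */
};

static struct math_status decode_status(uint32_t raw)
{
    struct math_status s;

    s.selected_on = (raw >> 0) & 1u;
    s.real_on     = (raw >> 1) & 1u;
    s.overflow    = (raw >> 2) & 1u;
    s.div_zero    = (raw >> 3) & 1u;
    s.process     = (raw >> 4) & 0xFu;
    return s;
}
```

For instance, a raw value of 0x3A decodes to process number 3 with 'real process on' and 'DivZero' set.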

Start Process

Address Name R/W
0x07FC start process W

RED:
0x0: 4×4 MAC
0x1: 3×3 MAC
0x2: 3×3 MAC w/ divide and multiply
0x3: CCoB conversion
0x4: CCoB conversion w/ pre-divided values
0x5: Big Divide (not implemented)
0x7: SWAP?

GREEN:
0x0: swap
0x1: 4×4 MAC
0x2: 3×3 MAC
0x3: 3×3 MAC w/ divide and multiply
0x4: 4×1 MAC
0x5: 1×1 MAC (4 sets)
0x6: reserved
0x7: reserved
0x8: CCoB conversion
0x9: CCoB conversion w/ pre-divided values
0xA: reserved
0xB: reserved
0xC: Small Divide
0xD: Big Divide (not implemented)
0xE: reserved
0xF: reserved

Values greater than 0xF will have bad effects; do not use them.
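Because out-of-range and reserved codes should never reach the register, a driver could screen values before writing them. This sketch uses the GREEN process numbers; the enum and function names are hypothetical, and treating reserved codes and the unimplemented Big Divide as unusable is a conservative assumption, not documented behaviour.

```c
#include <stdint.h>

/* GREEN 'Start Process' codes (names are illustrative). */
enum green_process {
    PROC_SWAP          = 0x0,
    PROC_MAC_4X4       = 0x1,
    PROC_MAC_3X3       = 0x2,
    PROC_MAC_3X3_NZ    = 0x3, /* 3x3 MAC w/ divide and multiply */
    PROC_MAC_4X1       = 0x4,
    PROC_MAC_1X1_X4    = 0x5, /* 1x1 MAC, four sets */
    PROC_CCOB          = 0x8,
    PROC_CCOB_PREDIV   = 0x9,
    PROC_SMALL_DIVIDE  = 0xC,
    PROC_BIG_DIVIDE    = 0xD  /* listed as not implemented */
};

/* Returns 1 if the code names an implemented GREEN process, else 0. */
static int is_usable_process(uint32_t code)
{
    switch (code) {
    case PROC_SWAP:
    case PROC_MAC_4X4:
    case PROC_MAC_3X3:
    case PROC_MAC_3X3_NZ:
    case PROC_MAC_4X1:
    case PROC_MAC_1X1_X4:
    case PROC_CCOB:
    case PROC_CCOB_PREDIV:
    case PROC_SMALL_DIVIDE:
        return 1;
    default:
        return 0; /* reserved, not implemented, or greater than 0xF */
    }
}
```

A caller would check `is_usable_process(code)` before writing `code` to 0x07FC and refuse the request otherwise.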
documentation/hardware/opera/madam/matrix_engine.txt · Last modified: 2022/10/06 23:29 (external edit)