The ARM60's ALU was not adequate for 3D math, so a hardware engine focused on matrix multiplication was added to MADAM. Not all of its capabilities were exposed through the APIs, but the hardware supported the following functions:

  * 4 x 4 matrix multiply with 16.16 initial values and a 32.32 result.
  * 3 x 3 matrix multiply with 16.16 initial values and a 16.16 result.
  * 3 x 3 matrix multiply with N/Z calculation. All initial values and results are 16.16.
  * 4 x 1 multiply with 16.16 initial values and a 32.32 result.
  * Four sets of 1 x 1 multiply with 16.16 initial values and a 32.32 result.
  * 16.16 divided by 16.16 with a 16.16 result (no remainder).
  * CCoB value generator starting from X and Y deltas and H and W.
  * Short form of CCoB generation using pre-divided H and W.

The basic engine uses a stack of forty 32-bit values, a 32 x 32 signed multiplier, and a 64-bit accumulator. The stack is broken up into three sections. The first is a 16-word section that holds the 16 values used by all matrix operations; no math operation disturbs these values. The other two sections are 12 words each and are alternately enabled by the 'BANK' bit in the 'Control' word. The 'BANK' bit selects the 12-word bank that the CPU should use while the math hardware operates on the other bank. The 'BANK' bit is flipped whenever any value is written to the 'Start Process' register. The CPU is free to read and write any value in the bank the hardware is not using, at any time, without disturbing the running process or its values. The stack addresses are arranged so that all data moves to and from the CPU can use multiple-word transfers.

The CPU must first pre-load all required initial values (including any zero or one constants) into the appropriate places in the stack and pre-set the control bits.
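The fixed-point formats listed above can be illustrated with a small host-side C model of the arithmetic. This is a sketch, not the 3DO API and not a description of the exact hardware datapath: the names `frac16`, `mac_row4`, `mac4x4`, `split_32_32`, `to_frac16` and `frac_div` are hypothetical, and three details are assumptions: the row/column order of the 4 x 4 product, that a 32.32 result is split into an upper word (OUTx) and a lower word (OUTx*), and that the 16.16 modes truncate rather than round.

```c
#include <stdint.h>

/* 16.16 signed fixed point: stored value = real value * 65536 */
typedef int32_t frac16;

/* One 4-term multiply-accumulate: 32 x 32 signed products summed in a
   64-bit accumulator, giving a 32.32 result. Accumulating in uint64_t
   keeps any wraparound well defined in C. */
static int64_t mac_row4(const frac16 row[4], const frac16 v[4])
{
    uint64_t acc = 0;
    for (int j = 0; j < 4; j++) {
        /* 16.16 * 16.16 = 32.32; a product of two int32 always fits in int64_t */
        int64_t p = (int64_t)row[j] * (int64_t)v[j];
        acc += (uint64_t)p;
    }
    return (int64_t)acc;
}

/* ASSUMPTION: OUT[i] = sum over j of M[i][j] * V[j]. */
static void mac4x4(const frac16 m[4][4], const frac16 v[4], int64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = mac_row4(m[i], v);
}

/* ASSUMPTION: OUTx holds the upper word and OUTx* the lower word.
   Right-shifting a negative value is implementation-defined in C;
   mainstream compilers use an arithmetic shift. */
static void split_32_32(int64_t r, int32_t *hi, uint32_t *lo)
{
    *hi = (int32_t)(r >> 32);
    *lo = (uint32_t)r;
}

/* 32.32 -> 16.16 (as in the 3 x 3 modes): drop the low 16 fraction bits.
   ASSUMPTION: truncation, not rounding. */
static frac16 to_frac16(int64_t r)
{
    return (frac16)(r >> 16);
}

/* 16.16 / 16.16 -> 16.16, truncated, no remainder.
   The caller must ensure b != 0 (GREEN has a DivZero status bit for that case). */
static frac16 frac_div(frac16 a, frac16 b)
{
    return (frac16)(((int64_t)a * 65536) / b);
}
```

The 16-bit shift in `to_frac16` follows from the scale factors: a 16.16 value is stored as x·2^16, so the product of two such values is stored as xy·2^32, and discarding 16 fraction bits brings it back to xy·2^16. The divide pre-scales the dividend by 2^16 for the same reason.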
Once the initial settings have been made, the CPU starts an operation by writing the appropriate value to the 'Start Process' register, then polls the status register to find out when the process has completed. Optionally (bit6 of the control bits on GREEN), the hardware converts the result of an operation that overflowed into BIGNUM, a signed value of either 0x7FFFFFFF or 0x80000001.

====== Address Space ======

^Beginning ^End |
|0x0330_0600 |0x0330_07FF |

====== Memory Mapping ======

===== Hardware Multiplier Math Stack =====

Made up of 64 words.

^Address ^R/W ^4×4 Name ^3×3 Name ^CCoB-in Name ^CCoB-out Name |
|0x0600 |R/W |Matrix 00 |Matrix 00 | | |
|0x0604 |R/W |Matrix 01 |Matrix 01 | | |
|0x0608 |R/W |Matrix 02 |Matrix 02 | | |
|0x060C |R/W |Matrix 03 | | | |
|0x0610 |R/W |Matrix 10 |Matrix 10 | | |
|0x0614 |R/W |Matrix 11 |Matrix 11 | | |
|0x0618 |R/W |Matrix 12 |Matrix 12 | | |
|0x061C |R/W |Matrix 13 | | | |
|0x0620 |R/W |Matrix 20 |Matrix 20 | | |
|0x0624 |R/W |Matrix 21 |Matrix 21 | | |
|0x0628 |R/W |Matrix 22 |Matrix 22 | | |
|0x062C |R/W |Matrix 23 | | | |
|0x0630 |R/W |Matrix 30 | | | |
|0x0634 |R/W |Matrix 31 | | | |
|0x0638 |R/W |Matrix 32 | | | |
|0x063C |R/W |Matrix 33 | | | |
|0x0640 |R/W |B0 - X |B0 - X |B0 - WIDTH |B0 - 1/HW |
|0x0644 |R/W |B0 - Y |B0 - Y |B0 - HEIGHT | |
|0x0648 |R/W |B0 - Z |B0 - Z |B0 - X01 |B0 - HDX |
|0x064C |R/W |B0 - W | |B0 - Y01 |B0 - HDY |
|0x0650 |R/W |B1 - X |B1 - X |B1 - WIDTH |B1 - 1/HW |
|0x0654 |R/W |B1 - Y |B1 - Y |B1 - HEIGHT | |
|0x0658 |R/W |B1 - Z |B1 - Z |B1 - X01 |B1 - HDX |
|0x065C |R/W |B1 - W | |B1 - Y01 |B1 - HDY |
|0x0660 |R/W |B0 - OUTX |B0 - OUTX |B0 - X03 |B0 - VDX |
|0x0664 |R/W |B0 - OUTY |B0 - OUTY |B0 - Y03 |B0 - VDY |
|0x0668 |R/W |B0 - OUTZ |B0 - OUTZ |B0 - X0123 |B0 - DDX |
|0x066C |R/W |B0 - OUTW | |B0 - Y0123 |B0 - DDY |
|0x0670 |R/W |B1 - OUTX |B1 - OUTX |B1 - X03 |B1 - VDX |
|0x0674 |R/W |B1 - OUTY |B1 - OUTY |B1 - Y03 |B1 - VDY |
|0x0678 |R/W |B1 - OUTZ |B1 - OUTZ |B1 - X0123 |B1 - DDX |
|0x067C |R/W |B1 - OUTW | |B1 - Y0123 |B1 - DDY |
|0x0680 |R/W |B0 - OUTX* |B0 - N(MSB) |B0 - N(MSB) | |
|0x0684 |R/W |B0 - OUTY* |B0 - N(LSB) |B0 - N(LSB) | |
|0x0688 |R/W |B0 - OUTZ* |B0 - N/Z |B0 - 1/W | |
|0x068C |R/W |B0 - OUTW* | |B0 - 1/H | |
|0x0690 |R/W |B1 - OUTX* |B1 - N(MSB) |B1 - N(MSB) | |
|0x0694 |R/W |B1 - OUTY* |B1 - N(LSB) |B1 - N(LSB) | |
|0x0698 |R/W |B1 - OUTZ* |B1 - N/Z |B1 - 1/W | |
|0x069C |R/W |B1 - OUTW* | |B1 - 1/H | |

===== Set Control Bits =====

^Address ^Name ^R/W ^RED ^GREEN |
|0x07F0 |set control bits |R/W |bit0,1: accumulator delay \\ bit2: use 'early termination' \\ bit3: own the MAS \\ bit4: bank |bit0,1: accumulator delay (set to '01') \\ bit2: select 'early termination' for 'selected process on' \\ bit3: own the MAS \\ bit4: bank \\ bit5: signdiv \\ bit6: allow BIGNUM conversion |

===== Clear Control Bits =====

^Address ^Name ^R/W ^RED ^GREEN |
|0x07F8 |status bits |R |bit0: early process on \\ bit1: real process on \\ bit2: overflow |bit0: selected process on (real or early) \\ bit1: real process on \\ bit2: overflow \\ bit3: DivZero \\ bit4,5,6,7: current or previous process number |

===== Status Bits =====

^Address ^Name ^R/W ^RED ^GREEN |
|0x07F8 |status bits |R |bit0: early process on \\ bit1: real process on \\ bit2: overflow |bit0: selected process on (real or early) \\ bit1: real process on \\ bit2: overflow \\ bit3: DivZero \\ bit4,5,6,7: current or previous process number |

===== Start Process =====

^Address ^Name ^R/W ^RED ^GREEN |
|0x07FC |start process |W |0x0: 4×4 MAC \\ 0x1: 3×3 MAC \\ 0x2: 3×3 MAC w/ divide and multiply \\ 0x3: CCoB conversion \\ 0x4: CCoB conversion w/ pre-divided values \\ 0x5: Big Divide (not implemented) \\ 0x7: SWAP? |0x0: swap \\ 0x1: 4×4 MAC \\ 0x2: 3×3 MAC \\ 0x3: 3×3 MAC w/ divide and multiply \\ 0x4: 4×1 MAC \\ 0x5: 1×1 MAC (4 sets) \\ 0x6: reserved \\ 0x7: reserved \\ 0x8: CCoB conversion \\ 0x9: CCoB conversion w/ pre-divided values \\ 0xA: reserved \\ 0xB: reserved \\ 0xC: Small Divide \\ 0xD: Big Divide (not implemented) \\ 0xE: reserved \\ 0xF: reserved \\ \\ Values greater than 0xF have adverse effects; do not use them. |
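The programming sequence described at the top of this page (pre-load the stack, set the control bits, write 'Start Process', poll the status register, read the results) might look roughly like the following C sketch for a 4 x 4 MAC on GREEN. The register offsets and the GREEN opcode, control and status bit values come from the tables above; everything else is assumed. The helper `math_mac4x4` and all macro names are hypothetical, the sketch assumes the CPU-side bank is currently B0 and reads the results from B0 once the process has finished, it polls the 'real process on' bit rather than 'selected process on', and it sets only the accumulator-delay field of the control bits (whether 'own the MAS' or other bits are also required is not covered here).

```c
#include <stdint.h>

/* Byte offsets from 0x0330_0000, matching the addresses in the tables above. */
#define MATH_MATRIX     0x0600u /* Matrix 00 .. Matrix 33 (16 words) */
#define MATH_B0_IN      0x0640u /* B0 - X, Y, Z, W */
#define MATH_B0_OUT_HI  0x0660u /* B0 - OUTX .. OUTW */
#define MATH_B0_OUT_LO  0x0680u /* B0 - OUTX* .. OUTW* */
#define MATH_SET_CTRL   0x07F0u /* set control bits */
#define MATH_STATUS     0x07F8u /* status bits */
#define MATH_START      0x07FCu /* start process */

#define GREEN_OP_MAC4X4       0x1u       /* GREEN start-process code: 4x4 MAC */
#define GREEN_CTRL_ACC_DELAY  0x1u       /* bits 0,1 = '01' */
#define GREEN_STAT_REAL_ON    (1u << 1)  /* status bit1: real process on */

/* Word access at a byte offset from the base pointer. */
#define REG(base, off) ((base)[(off) / 4u])

/* Hypothetical helper: run one 4x4 MAC and fetch the 32.32 results. */
static void math_mac4x4(volatile uint32_t *madam,
                        const int32_t m[16], const int32_t v[4],
                        uint32_t out_hi[4], uint32_t out_lo[4])
{
    /* 1. Pre-load the shared matrix section (Matrix 00, 01, ... 33 in order). */
    for (unsigned i = 0; i < 16; i++)
        REG(madam, MATH_MATRIX + 4u * i) = (uint32_t)m[i];

    /* ASSUMPTION: the CPU-side bank is currently B0. */
    for (unsigned i = 0; i < 4; i++)
        REG(madam, MATH_B0_IN + 4u * i) = (uint32_t)v[i];

    /* 2. Pre-set the control bits (accumulator delay = '01'). */
    REG(madam, MATH_SET_CTRL) = GREEN_CTRL_ACC_DELAY;

    /* 3. Writing the opcode starts the process and flips BANK. */
    REG(madam, MATH_START) = GREEN_OP_MAC4X4;

    /* 4. Poll until the 'real process on' bit clears. */
    while (REG(madam, MATH_STATUS) & GREEN_STAT_REAL_ON)
        ;

    /* 5. Read the upper (OUTx) and lower (OUTx*) result words. */
    for (unsigned i = 0; i < 4; i++) {
        out_hi[i] = REG(madam, MATH_B0_OUT_HI + 4u * i);
        out_lo[i] = REG(madam, MATH_B0_OUT_LO + 4u * i);
    }
}
```

On the console itself, `madam` would presumably point at 0x0330_0000 (for example `(volatile uint32_t *)0x03300000`). On a host, an ordinary `uint32_t` array can stand in for the registers, which is enough to check the order and placement of the writes.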