documentation:hardware:opera:madam:matrix_engine

The ARM60's ALU was not adequate for 3D math calculations, so a hardware engine focused on matrix multiplication was added to MADAM.

Not all capabilities were exposed through the APIs but the hardware supported a number of functions:

- 4 x 4 matrix multiply with 16.16 initial values and a 32.32 result.
- 3 x 3 matrix multiply with 16.16 initial values and a 16.16 result.
- 3 x 3 matrix multiply with N/Z calculation. All initial values and results are 16.16.
- 4 x 1 multiply with 16.16 initial values and a 32.32 result.
- Four sets of 1 x 1 multiply with 16.16 initial values and a 32.32 result.
- 16.16 divided by 16.16 with 16.16 result (no remainder).
- CCoB value generator starting from X and Y deltas and H and W.
- Short form of CCoB generation using pre-divided H and W.
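
The number formats in the list above can be illustrated with a small software model. This is only a sketch of 16.16 and 32.32 fixed-point arithmetic, not a description of the hardware's internals: the type and function names are hypothetical, and the truncating behaviour of the narrowing and divide helpers is an assumption, since the rounding used by the engine is not stated here.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fx16_16;   /* 16 integer bits, 16 fraction bits */
typedef int64_t fx32_32;   /* 32 integer bits, 32 fraction bits */

#define FX_ONE ((fx16_16)0x00010000)   /* 1.0 in 16.16 */

/* 16.16 x 16.16: the full 64-bit product is already in 32.32 format. */
static inline fx32_32 fx_mul_wide(fx16_16 a, fx16_16 b)
{
    return (fx32_32)a * (fx32_32)b;
}

/* Four-term multiply-accumulate, as one output of a 4 x 4 multiply needs. */
static inline fx32_32 fx_dot4(const fx16_16 row[4], const fx16_16 vec[4])
{
    fx32_32 acc = 0;   /* stands in for the 64-bit accumulator */
    for (int i = 0; i < 4; i++)
        acc += fx_mul_wide(row[i], vec[i]);
    return acc;
}

/* Narrow 32.32 to 16.16 by dropping 16 fraction bits.
   Assumes the value fits in 16.16; truncates toward zero. */
static inline fx16_16 fx_narrow(fx32_32 v)
{
    return (fx16_16)(v / 65536);
}

/* 16.16 / 16.16 with a 16.16 quotient and no remainder.
   Truncation toward zero is an assumption of this sketch. */
static inline fx16_16 fx_div(fx16_16 n, fx16_16 d)
{
    assert(d != 0);
    return (fx16_16)(((int64_t)n * 65536) / d);
}
```

For example, `fx_mul_wide(2 * FX_ONE, 3 * FX_ONE)` yields 6.0 in 32.32 format, and `fx_div(3 * FX_ONE, 2 * FX_ONE)` yields `0x00018000`, which is 1.5 in 16.16.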

The basic engine uses a stack of forty 32-bit values, a 32 by 32 signed multiplier, and a 64-bit accumulator. The stack is broken up into three sections. The first is a 16-word section that contains the 16 values used by all matrix operations. These values are not disturbed by any math operation. The other two sections are 12 words each and are alternately enabled by the 'BANK' bit in the 'Control' word. The 'BANK' bit points to the bank of 12 words that should be used by the CPU while the math hardware operates on the other bank of 12 words. The 'BANK' bit is flipped whenever any value is written to the 'Start Process' register. The CPU is free to read and write any value in the non-path bank at any time with no fear of damage to the process or its values.
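
The double-buffering described above can be modelled in software as follows. This is a hypothetical sketch for reasoning about which bank the CPU sees; the struct and function names are invented for illustration, and the model groups each bank's 12 words together rather than mirroring the real address layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the forty-word stack: 16 shared words plus two banks. */
typedef struct {
    uint32_t matrix[16];    /* shared by all matrix operations */
    uint32_t bank[2][12];   /* alternately owned by CPU and math hardware */
    unsigned bank_bit;      /* models the 'BANK' bit in the 'Control' word */
} MathStackModel;

/* Bank the 'BANK' bit currently points to, i.e. the one the CPU should use. */
static inline uint32_t *cpu_bank(MathStackModel *m)
{
    assert(m != NULL);
    return m->bank[m->bank_bit & 1u];
}

/* Bank the math hardware operates on: the other one. */
static inline uint32_t *engine_bank(MathStackModel *m)
{
    assert(m != NULL);
    return m->bank[(m->bank_bit & 1u) ^ 1u];
}

/* Any write to 'Start Process' flips the 'BANK' bit. */
static inline void write_start_process(MathStackModel *m)
{
    assert(m != NULL);
    m->bank_bit ^= 1u;
}
```

In this model, the bank the CPU has just filled becomes the engine's bank after `write_start_process`, and the CPU is handed the other bank to prepare the next operation.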

The hardware addresses in the stack are arranged to allow for multiple word transfers to and from the CPU for all data moves.

The CPU must first pre-load all required initial values (including any zero or one constants) into the appropriate places in the stack. The CPU must also pre-set the control bits. Once the initial settings are in place, the CPU starts an operation by writing the appropriate value to the 'Start Process' register. The CPU must then poll the status register to know when the process has completed.
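
The sequence above (pre-load, set control bits, start, poll) might be sketched as below. Everything here is an assumption layered on the register table in this page: `math_run` and the constant names are hypothetical, `regs` is taken to point at the word at offset 0x0600, and "completed" is assumed to mean that status bit0 and bit1 both read as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices relative to the start of the register block (offset 0x0600). */
enum {
    MATH_CONTROL = (0x07F0 - 0x0600) / 4,
    MATH_STATUS  = (0x07F8 - 0x0600) / 4,
    MATH_START   = (0x07FC - 0x0600) / 4
};

/* Assumption: a process is still running while bit0 or bit1 is set. */
#define MATH_STATUS_BUSY 0x3u

/* Load `count` words at stack index `first_word`, set the control bits,
   start `process`, then poll. Returns the final status word. */
static inline uint32_t math_run(volatile uint32_t *regs,
                                size_t first_word,
                                const uint32_t *values, size_t count,
                                uint32_t control, uint32_t process)
{
    assert(first_word + count <= 40);   /* the stack holds forty words */

    for (size_t i = 0; i < count; i++)
        regs[first_word + i] = values[i];   /* 1. pre-load initial values */

    regs[MATH_CONTROL] = control;           /* 2. pre-set the control bits */
    regs[MATH_START]   = process;           /* 3. start the operation */

    while (regs[MATH_STATUS] & MATH_STATUS_BUSY)
        ;                                   /* 4. poll until completed */

    return regs[MATH_STATUS];
}
```

Taking the register pointer as a parameter keeps the sketch testable against an ordinary array; on real hardware it would be the mapped address of the block.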

The hardware will, optionally, convert the result of a math operation that has had an overflow into BIGNUM. This is a signed value of either 0x7FFFFFFF or 0x80000001.
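
The BIGNUM substitution can be pictured with the sketch below. The two constants come from the text above; the overflow condition (a result that does not fit in a signed 32-bit value) and the function name are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIGNUM_POS ((int32_t)0x7FFFFFFF)
#define BIGNUM_NEG ((int32_t)-0x7FFFFFFF)   /* bit pattern 0x80000001 */

/* Replace an out-of-range result with BIGNUM and report the overflow.
   Assumption: "overflow" means the value does not fit in int32_t. */
static inline int32_t to_bignum(int64_t v, int *overflowed)
{
    assert(overflowed != NULL);
    *overflowed = (v > INT32_MAX) || (v < INT32_MIN);
    if (v > INT32_MAX)
        return BIGNUM_POS;
    if (v < INT32_MIN)
        return BIGNUM_NEG;
    return (int32_t)v;
}
```

Note that the two BIGNUM values are symmetric around zero, so negating one gives the other.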

Beginning | End |
---|---|
0x0330_0600 | 0x0330_07FF |

Made up of 64 words.

Address | R/W | 4×4 Name | 3×3 Name | CCoB-in Name | CCoB-out Name |
---|---|---|---|---|---|
0x0600 | R/W | Matrix 00 | Matrix 00 | | |
0x0604 | R/W | Matrix 01 | Matrix 01 | | |
0x0608 | R/W | Matrix 02 | Matrix 02 | | |
0x060C | R/W | Matrix 03 | | | |
0x0610 | R/W | Matrix 10 | Matrix 10 | | |
0x0614 | R/W | Matrix 11 | Matrix 11 | | |
0x0618 | R/W | Matrix 12 | Matrix 12 | | |
0x061C | R/W | Matrix 13 | | | |
0x0620 | R/W | Matrix 20 | Matrix 20 | | |
0x0624 | R/W | Matrix 21 | Matrix 21 | | |
0x0628 | R/W | Matrix 22 | Matrix 22 | | |
0x062C | R/W | Matrix 23 | | | |
0x0630 | R/W | Matrix 30 | | | |
0x0634 | R/W | Matrix 31 | | | |
0x0638 | R/W | Matrix 32 | | | |
0x063C | R/W | Matrix 33 | | | |
0x0640 | R/W | B0 - X | B0 - X | B0 - WIDTH | B0 - 1/HW |
0x0644 | R/W | B0 - Y | B0 - Y | B0 - HEIGHT | |
0x0648 | R/W | B0 - Z | B0 - Z | B0 - X01 | B0 - HDX |
0x064C | R/W | B0 - W | | B0 - Y01 | B0 - HDY |
0x0650 | R/W | B1 - X | B1 - X | B1 - WIDTH | B1 - 1/HW |
0x0654 | R/W | B1 - Y | B1 - Y | B1 - HEIGHT | |
0x0658 | R/W | B1 - Z | B1 - Z | B1 - X01 | B1 - HDX |
0x065C | R/W | B1 - W | | B1 - Y01 | B1 - HDY |
0x0660 | R/W | B0 - OUTX | B0 - OUTX | B0 - X03 | B0 - VDX |
0x0664 | R/W | B0 - OUTY | B0 - OUTY | B0 - Y03 | B0 - VDY |
0x0668 | R/W | B0 - OUTZ | B0 - OUTZ | B0 - X0123 | B0 - DDX |
0x066C | R/W | B0 - OUTW | | B0 - Y0123 | B0 - DDY |
0x0670 | R/W | B1 - OUTX | B1 - OUTX | B1 - X03 | B1 - VDX |
0x0674 | R/W | B1 - OUTY | B1 - OUTY | B1 - Y03 | B1 - VDY |
0x0678 | R/W | B1 - OUTZ | B1 - OUTZ | B1 - X0123 | B1 - DDX |
0x067C | R/W | B1 - OUTW | | B1 - Y0123 | B1 - DDY |
0x0680 | R/W | B0 - OUTX* | B0 - N(MSB) | B0 - N(MSB) | |
0x0684 | R/W | B0 - OUTY* | B0 - N(LSB) | B0 - N(LSB) | |
0x0688 | R/W | B0 - OUTZ* | B0 - N/Z | B0 - 1/W | |
0x068C | R/W | B0 - OUTW* | | B0 - 1/H | |
0x0690 | R/W | B1 - OUTX* | B1 - N(MSB) | B1 - N(MSB) | |
0x0694 | R/W | B1 - OUTY* | B1 - N(LSB) | B1 - N(LSB) | |
0x0698 | R/W | B1 - OUTZ* | B1 - N/Z | B1 - 1/W | |
0x069C | R/W | B1 - OUTW* | | B1 - 1/H | |
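
The stack addresses follow a regular pattern: matrix rows are 0x10 apart, and the bank words come in groups of four that alternate between B0 and B1. A sketch of helpers that compute these offsets is shown here; the function names and the "group" numbering (0 for the inputs at 0x0640, 1 for the outputs at 0x0660, 2 for the words at 0x0680) are conventions invented for this example.

```c
#include <assert.h>
#include <stdint.h>

enum { MATH_STACK_BASE = 0x0600, MATH_BANK_BASE = 0x0640 };

/* Offset of matrix element (row, col), e.g. (2, 3) is "Matrix 23". */
static inline uint32_t math_matrix_addr(unsigned row, unsigned col)
{
    assert(row < 4 && col < 4);
    return MATH_STACK_BASE + row * 0x10u + col * 4u;
}

/* Offset of a bank word.
   bank:  0 for B0, 1 for B1
   group: 0 = inputs, 1 = outputs, 2 = OUT* / N / reciprocal words
   index: 0..3 within the group (X, Y, Z, W in the 4x4 naming) */
static inline uint32_t math_bank_addr(unsigned bank, unsigned group,
                                      unsigned index)
{
    assert(bank < 2 && group < 3 && index < 4);
    return MATH_BANK_BASE + group * 0x20u + bank * 0x10u + index * 4u;
}
```

For instance, `math_bank_addr(1, 2, 2)` gives 0x0698, the row labelled "B1 - OUTZ*" in the table.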

Address | Name | R/W | RED | GREEN |
---|---|---|---|---|
0x07F0 | set control bits | R/W | bit0,1: accumulator delay; bit2: use 'early termination'; bit3: own the MAS; bit4: bank | bit0,1: accumulator delay (set to '01'); bit2: select 'early termination' for 'selected process on'; bit3: own the MAS; bit4: bank; bit5: signdiv; bit6: allow BIGNUM conversion |
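
The GREEN control bits could be given names along the following lines. The macro and function names are hypothetical, and reading "set to '01'" as an accumulator delay value of 1 (bit0 set, bit1 clear) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MATH_CTL_ACCUM_DELAY(n) ((uint32_t)(n) & 0x3u)  /* bit0,1 */
#define MATH_CTL_EARLY_TERM     (1u << 2)
#define MATH_CTL_OWN_MAS        (1u << 3)
#define MATH_CTL_BANK           (1u << 4)
#define MATH_CTL_SIGNDIV        (1u << 5)   /* GREEN column only */
#define MATH_CTL_BIGNUM         (1u << 6)   /* GREEN column only */

/* Compose a GREEN control word from a few common choices. */
static inline uint32_t math_control_green(unsigned bank, int signdiv,
                                          int allow_bignum)
{
    assert(bank < 2);
    uint32_t v = MATH_CTL_ACCUM_DELAY(1);   /* assumed meaning of '01' */
    if (bank)
        v |= MATH_CTL_BANK;
    if (signdiv)
        v |= MATH_CTL_SIGNDIV;
    if (allow_bignum)
        v |= MATH_CTL_BIGNUM;
    return v;
}
```

With these definitions, selecting bank 1 with BIGNUM conversion allowed produces the value 0x51.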

Address | Name | R/W | RED | GREEN |
---|---|---|---|---|
0x07F8 | status bits | R | bit0: early process on; bit1: real process on; bit2: overflow | bit0: selected process on (real or early); bit1: real process on; bit2: overflow; bit3: DivZero; bit4,5,6,7: current or previous process number |
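
Decoding a GREEN status word into its fields might look like the sketch below. The struct and function names are hypothetical; only the bit positions are taken from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int selected_on;   /* bit0: selected process on (real or early) */
    int real_on;       /* bit1: real process on */
    int overflow;      /* bit2 */
    int div_zero;      /* bit3 */
    unsigned process;  /* bit4..7: current or previous process number */
} MathStatusGreen;

/* Split a raw status word into the GREEN fields listed in the table. */
static inline void math_decode_status_green(uint32_t raw, MathStatusGreen *out)
{
    assert(out != NULL);
    out->selected_on = (int)((raw >> 0) & 1u);
    out->real_on     = (int)((raw >> 1) & 1u);
    out->overflow    = (int)((raw >> 2) & 1u);
    out->div_zero    = (int)((raw >> 3) & 1u);
    out->process     = (unsigned)((raw >> 4) & 0xFu);
}
```

A raw value of 0x3C, for example, decodes as process number 3, idle, with both the overflow and DivZero flags set.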

Address | Name | R/W | RED | GREEN |
---|---|---|---|---|
0x07FC | start process | W | 0x0: 4×4 MAC; 0x1: 3×3 MAC; 0x2: 3×3 MAC w/ divide and multiply; 0x3: CCoB conversion; 0x4: CCoB conversion w/ pre-divided values; 0x5: Big Divide (not implemented); 0x7: SWAP? | 0x0: swap; 0x1: 4×4 MAC; 0x2: 3×3 MAC; 0x3: 3×3 MAC w/ divide and multiply; 0x4: 4×1 MAC; 0x5: 1×1 MAC (4 sets); 0x6: reserved; 0x7: reserved; 0x8: CCoB conversion; 0x9: CCoB conversion w/ pre-divided values; 0xA: reserved; 0xB: reserved; 0xC: Small Divide; 0xD: Big Divide (not implemented); 0xE: reserved; 0xF: reserved. Values greater than 0xF will have bad effects; do not use them. |
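
Because the RED and GREEN columns number the same operations differently, code that targets both may want named constants and a translation step. The following is a sketch under that assumption: all identifiers are hypothetical, the mapping simply pairs entries with matching descriptions in the table, and the RED 0x7 entry is treated as swap even though the table marks it with a question mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    MATH_GREEN_SWAP         = 0x0,
    MATH_GREEN_MAC_4X4      = 0x1,
    MATH_GREEN_MAC_3X3      = 0x2,
    MATH_GREEN_MAC_3X3_DIV  = 0x3,
    MATH_GREEN_MAC_4X1      = 0x4,
    MATH_GREEN_MAC_1X1      = 0x5,
    MATH_GREEN_CCOB         = 0x8,
    MATH_GREEN_CCOB_PREDIV  = 0x9,
    MATH_GREEN_SMALL_DIVIDE = 0xC,
    MATH_GREEN_BIG_DIVIDE   = 0xD   /* listed as not implemented */
} MathProcessGreen;

/* 1 if `code` is a GREEN process that is neither reserved, out of range,
   nor marked "not implemented" in the table. */
static inline int math_green_code_is_usable(uint32_t code)
{
    switch (code) {
    case 0x0: case 0x1: case 0x2: case 0x3: case 0x4:
    case 0x5: case 0x8: case 0x9: case 0xC:
        return 1;
    default:
        return 0;
    }
}

/* Translate a RED process number to its GREEN counterpart.
   Returns 0 when the table lists no RED entry for `red`. */
static inline int math_red_to_green(uint32_t red, uint32_t *green)
{
    static const int8_t map[8] = { 0x1, 0x2, 0x3, 0x8, 0x9, 0xD, -1, 0x0 };
    assert(green != NULL);
    if (red > 7 || map[red] < 0)
        return 0;
    *green = (uint32_t)map[red];
    return 1;
}
```

Checking a code with `math_green_code_is_usable` before writing it to the start register is one way to honour the warning about values greater than 0xF.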

documentation/hardware/opera/madam/matrix_engine.txt · Last modified: 2022/10/06 23:29 (external edit)