Flexibility of load and store multiple

About this recipe

In this recipe you learn about:

  • the benefits and capabilities of the load and store multiple instructions;
  • types of stacks supported directly by load and store multiple.

Multiple vs single transfers

The Load and Store Multiple instructions provide a way to efficiently move the contents of several registers to and from memory. The advantages of using a single load or store multiple instruction over a series of load or store single instructions are:

  • Smaller code size;
  • On von Neumann architectures, such as all ARMs up to the ARM6 family, there is only a single instruction fetch overhead, rather than many instruction fetches;
  • On von Neumann architectures, only one register write back cycle is required for a load multiple, as opposed to one for every load single;
  • On uncached ARM processors, the first word of data transferred by a load or store multiple will always be a non-sequential memory cycle, but all subsequent words transferred can be sequential (faster) memory cycles.
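
The savings listed above can be put into rough numbers. The following Python sketch is only an illustration: it assumes 4-byte ARM instructions and the cycle formulas commonly quoted in ARM6/ARM7-class data sheets (LDR: 1S + 1N + 1I, LDM: nS + 1N + 1I, STR: 2N, STM: (n-1)S + 2N). The function names and the default cycle lengths S=1, N=2, I=1 are hypothetical and not taken from this recipe.

```python
# Rough comparison of n single transfers against one multiple transfer.
# Assumptions (not from the recipe): 4-byte instructions, data-sheet style
# cycle formulas, and made-up relative cycle lengths S=1, N=2, I=1.

def code_bytes(instruction_count):
    """Code size in bytes for a run of 32-bit ARM instructions."""
    return 4 * instruction_count

def load_cycles(n, multiple, S=1, N=2, I=1):
    """Time to load n registers: one LDM (nS + 1N + 1I) or n LDRs (1S + 1N + 1I each)."""
    if multiple:
        return n * S + N + I
    return n * (S + N + I)

def store_cycles(n, multiple, S=1, N=2):
    """Time to store n registers: one STM ((n-1)S + 2N) or n STRs (2N each)."""
    if multiple:
        return (n - 1) * S + 2 * N
    return n * 2 * N
```

Under these assumptions, moving five registers takes 4 bytes of code instead of 20, and the gap in cycles widens as the non-sequential cycle becomes more expensive relative to the sequential one.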

The register list

The registers the load and store multiple instructions transfer are encoded into the instruction by one bit for each of the registers R0 to R15. A set bit indicates the register will be transferred, and a clear bit indicates that it will not be transferred. Thus it is possible to transfer any subset of the registers in a single instruction.

The subset of registers to be transferred is specified by listing those registers in curly brackets, e.g.:

{R1, R4-R6, R8, R10}
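
To make the one-bit-per-register encoding concrete, here is a minimal Python sketch that turns a register list written in the notation above into the 16-bit mask described earlier. The function name `register_list_mask` is hypothetical, and the parser only understands plain `Rn` names and `Rn-Rm` ranges (no aliases such as SP, LR or PC).

```python
import re

def register_list_mask(reglist):
    """Return the 16-bit mask for a list such as '{R1, R4-R6, R8, R10}'.

    Bit n of the result is set when register Rn is in the list.
    """
    mask = 0
    for item in reglist.strip().strip("{}").split(","):
        item = item.strip().upper()
        if not item:
            continue
        match = re.fullmatch(r"R(\d+)(?:\s*-\s*R(\d+))?", item)
        if match is None:
            raise ValueError(f"bad register list item: {item!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        if not 0 <= first <= last <= 15:
            raise ValueError(f"register out of range: {item!r}")
        for reg in range(first, last + 1):
            mask |= 1 << reg  # one bit per register, R0 is bit 0
    return mask
```

For the list above, `register_list_mask("{R1, R4-R6, R8, R10}")` gives 0x0572, i.e. bits 1, 4, 5, 6, 8 and 10 set. Because the result is a bit mask, writing the same registers in a different order produces the same value.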

Increment / Decrement, Before / After

The base address for the transfer can either be incremented or decremented between register transfers, and this can happen either before or after each register transfer. For example:

STMIA R10, {R1, R3-R5, R8}

The suffix IA could also have been IB, DA or DB, where I indicates increment, D decrement, A after and B before.
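
The effect of the four suffixes on the addresses actually used can be modelled in a few lines of Python. This is a sketch, not assembler output: the name `transfer_addresses` and the base address 0x1000 used in the note below are made up for illustration, and each register is assumed to occupy one 4-byte word.

```python
WORD = 4  # bytes per register transferred

def transfer_addresses(base, count, mode):
    """Addresses touched by LDM/STM<mode> for `count` registers, lowest first."""
    starts = {
        "IA": base,                       # increment after: first word at base
        "IB": base + WORD,                # increment before: first word above base
        "DA": base - WORD * (count - 1),  # decrement after: last word at base
        "DB": base - WORD * count,        # decrement before: last word below base
    }
    start = starts[mode.upper()]
    return [start + WORD * i for i in range(count)]
```

With R10 = 0x1000, the five registers of STMIA R10, {R1, R3-R5, R8} go to 0x1000 through 0x1010. The same list with DB would use 0x0FEC through 0x0FFC.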

Base register writeback

In the previous example, although the address of the transfer was changed after each transfer, the base register itself was not updated at any point. Register writeback can be specified, by appending ! to the base register, so that the base register is updated. Clearly the base register will change by the same amount (four bytes per register transferred) whether “before” or “after” is selected. An example of a load multiple using base writeback is:

LDMDB R11!, {R9, R4-R7}

Note

In all cases the lowest numbered register is transferred to or from the lowest memory address, and the highest numbered register to or from the highest address. The order in which the registers are listed in the register list makes no difference. Also, the ARM always performs sequential memory accesses in increasing memory address order. Therefore 'decrementing' transfers actually perform a subtraction first and then increment the transfer address register by register.
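
The ordering rule and base writeback can be seen together in a small model of a load multiple. The Python sketch below is an illustration only: `ldm`, the dict-based register file and memory, and the starting value 0x2000 for R11 are all hypothetical, and the case where the base register also appears in the register list is not modelled faithfully.

```python
WORD = 4  # bytes per register transferred

def ldm(regs, memory, mode, base_reg, reg_numbers, writeback=False):
    """Model LDM<mode> Rbase{!}, {reglist} on dict-based registers and memory."""
    ordered = sorted(set(reg_numbers))  # listing order is irrelevant
    count = len(ordered)
    base = regs[base_reg]
    # Subtract first for the decrementing modes, then walk upwards in memory.
    start = {
        "IA": base,
        "IB": base + WORD,
        "DA": base - WORD * (count - 1),
        "DB": base - WORD * count,
    }[mode]
    if writeback:
        step = WORD * count
        regs[base_reg] = base + step if mode[0] == "I" else base - step
    for i, reg in enumerate(ordered):
        regs[reg] = memory[start + WORD * i]  # lowest register, lowest address
    return regs
```

Modelling LDMDB R11!, {R9, R4-R7} with R11 = 0x2000 loads R4 from 0x1FEC, R5 to R7 from the next three words and R9 from 0x1FFC, and leaves R11 = 0x1FEC. Without the !, the same registers are loaded but R11 keeps the value 0x2000.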

Stack notation

Since the load and store multiple instructions have the facility to update the base register (which for stack operations can be the stack pointer), these instructions provide single instruction push and pop operations for any number of registers: load multiple is the pop and store multiple is the push.

There are several types of stack which the Load and Store Multiple Instructions can be used with:

  • Ascending or descending stacks, i.e. the stack grows up memory or down memory. [Sometimes a pair of stacks is used, one of which grows up memory and one of which grows down memory - thus choosing the direction is not always just a matter of taste].
  • Empty or Full stacks. The stack pointer can either point to the top item in the stack (a full stack), or to the next free space on the stack (an empty stack).

As stated above, pop and push operations for these stacks can be implemented directly by load and store multiple instructions. To make it easier for the programmer, special stack suffixes can be added to the LDM and STM instructions (as an alternative to the Increment / Decrement and Before / After suffixes) as follows:

STMFA R10!, {R0-R5}   ; Push R0-R5 onto a Full Ascending Stack
LDMFA R10!, {R0-R5}   ; Pop  R0-R5 from a Full Ascending Stack

STMFD R10!, {R0-R5}   ; Push R0-R5 onto a Full Descending Stack
LDMFD R10!, {R0-R5}   ; Pop  R0-R5 from a Full Descending Stack

STMEA R10!, {R0-R5}   ; Push R0-R5 onto an Empty Ascending Stack
LDMEA R10!, {R0-R5}   ; Pop  R0-R5 from an Empty Ascending Stack

STMED R10!, {R0-R5}   ; Push R0-R5 onto an Empty Descending Stack
LDMED R10!, {R0-R5}   ; Pop  R0-R5 from an Empty Descending Stack
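
Each stack suffix is shorthand for one of the Increment / Decrement, Before / After suffixes, and the equivalent differs between STM and LDM. The Python sketch below tabulates that correspondence and models push and pop on a dict-based memory. The names `STACK_TO_MODE`, `push` and `pop` and the stack pointer value 0x8000 used in the note below are hypothetical. Writeback is always applied, as in the examples above.

```python
WORD = 4  # bytes per register transferred

# (instruction, stack suffix) -> equivalent addressing suffix
STACK_TO_MODE = {
    ("STM", "FA"): "IB", ("LDM", "FA"): "DA",
    ("STM", "FD"): "DB", ("LDM", "FD"): "IA",
    ("STM", "EA"): "IA", ("LDM", "EA"): "DB",
    ("STM", "ED"): "DA", ("LDM", "ED"): "IB",
}

def _start(base, count, mode):
    """Lowest address used by a transfer of `count` words in the given mode."""
    return {
        "IA": base,
        "IB": base + WORD,
        "DA": base - WORD * (count - 1),
        "DB": base - WORD * count,
    }[mode]

def _new_base(base, count, mode):
    """Base register value after writeback."""
    return base + WORD * count if mode[0] == "I" else base - WORD * count

def push(memory, sp, values, kind):
    """Model STM<kind> sp!, {...}; values are given lowest register first."""
    mode = STACK_TO_MODE[("STM", kind)]
    start = _start(sp, len(values), mode)
    for i, value in enumerate(values):
        memory[start + WORD * i] = value
    return _new_base(sp, len(values), mode)

def pop(memory, sp, count, kind):
    """Model LDM<kind> sp!, {...}; returns (values, new stack pointer)."""
    mode = STACK_TO_MODE[("LDM", kind)]
    start = _start(sp, count, mode)
    values = [memory[start + WORD * i] for i in range(count)]
    return values, _new_base(sp, count, mode)
```

For all four stack types, pushing six values from a stack pointer of 0x8000 and then popping six values returns the same values in the same registers and restores the stack pointer to 0x8000. After a Full Descending push the stack pointer addresses the last word stored, whereas after an Empty Ascending push it addresses the first free word.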

For more information on using stacks in assembly language see Stacks in assembly language.

For further discussion of some of the benefits which can be gained by using LDM and STM see Loop unrolling.


Original: https://ext.3dodev.com/3DO/Portfolio_2.5/OnLineDoc/DevDocs/tktfldr/acbfldr/1acbd.html

documentation/development/opera/pf25/tktfldr/acbfldr/1acbd.txt · Last modified: 2023/09/14 19:54 by trapexit