In this recipe you learn how conditional execution can eliminate branch instructions, producing smaller and faster code. Euclid's Greatest Common Divisor algorithm is used for illustrative purposes. Specifically, you will learn how to use conditional execution and selective setting of the PSR's ALU flags.
The ARM's Program Status Register contains, among other flags, copies of the ALU status flags:
---------------------------------------------------------
N  | Negative result from ALU flag
---------------------------------------------------------
Z  | Zero result from ALU flag
---------------------------------------------------------
C  | ALU operation Carried out
---------------------------------------------------------
V  | ALU operation oVerflowed
---------------------------------------------------------
Every ARM instruction has a 4 bit field which encodes the conditions under which it will be executed. These conditions refer to the state of the ALU N, Z, C and V flags as follows:
--------------------------------------------------------
EQ    | Z set (equal)
--------------------------------------------------------
NE    | Z clear (not equal)
--------------------------------------------------------
CS/HS | C set (unsigned >=)
--------------------------------------------------------
CC/LO | C clear (unsigned <)
--------------------------------------------------------
MI    | N set (negative)
--------------------------------------------------------
PL    | N clear (positive or zero)
--------------------------------------------------------
VS    | V set (overflow)
--------------------------------------------------------
VC    | V clear (no overflow)
--------------------------------------------------------
HI    | C set and Z clear (unsigned >)
--------------------------------------------------------
LS    | C clear or Z set (unsigned <=)
--------------------------------------------------------
GE    | N and V the same (signed >=)
--------------------------------------------------------
LT    | N and V differ (signed <)
--------------------------------------------------------
GT    | Z clear, and N and V the same (signed >)
--------------------------------------------------------
LE    | Z set, or N and V differ (signed <=)
--------------------------------------------------------
AL    | Always execute (the default if none is specified)
--------------------------------------------------------
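The table above can be read as a set of boolean tests on the four flags. The following C sketch is only a model for checking your understanding; the struct, enum and function names are hypothetical and do not come from the ARM toolkit.

```c
#include <stdbool.h>

/* Hypothetical model of the four ALU flags held in the PSR. */
struct flags {
    bool n;  /* Negative */
    bool z;  /* Zero */
    bool c;  /* Carry */
    bool v;  /* oVerflow */
};

/* Condition mnemonics in the order of the table (CS doubles as HS, CC as LO). */
enum cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

/* Returns true if an instruction carrying condition cc would be executed. */
bool cond_passes(enum cond cc, struct flags f)
{
    switch (cc) {
    case EQ: return f.z;
    case NE: return !f.z;
    case CS: return f.c;
    case CC: return !f.c;
    case MI: return f.n;
    case PL: return !f.n;
    case VS: return f.v;
    case VC: return !f.v;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    case AL: return true;
    }
    return false;  /* unreachable for valid enum values */
}
```

For example, with Z and C set and N and V clear (the state left by comparing two equal values), cond_passes returns true for EQ, LS, GE and LE, and false for NE, HI, GT and LT.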
Data processing instructions change the state of the ALU's N, Z, C and V status outputs, but these are latched in the PSR's ALU flags only if a special bit (the 'S' bit) is set in the instruction.
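To make the flag values concrete, here is a minimal C sketch of the flags a comparison such as CMP a, b would latch, computed from the subtraction a - b. It assumes 32-bit operands and the ARM convention that C is set when the subtraction does not borrow; the names are hypothetical. A plain SUB without the S bit would compute the same result but leave the PSR flags untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ALU flags held in the PSR. */
struct flags { bool n, z, c, v; };

/* Flags produced by the subtraction a - b, as latched by a compare. */
struct flags flags_from_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.n = (r >> 31) != 0;                    /* bit 31 of the result */
    f.z = (r == 0);                          /* result is zero */
    f.c = (a >= b);                          /* no borrow occurred */
    /* Signed overflow: operands differ in sign and the result's sign
       differs from that of the first operand. */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}
```

Comparing 5 with 5 gives Z and C set; comparing 3 with 5 gives N set and C clear, which is why LT (N and V differ) and LO (C clear) both hold in that case.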
The following code fragment is extracted from gcd.c, which can be found in the examples directory.
    while (a != b) {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
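For experimenting outside the examples directory, the fragment can be wrapped in a function. The sketch below is an assumption about how such a wrapper might look: the function name, the int parameter types and the return value are hypothetical and are not taken from gcd.c.

```c
/* Hypothetical wrapper: name and signature are assumed, not taken from gcd.c. */
int gcd(int a, int b)
{
    /* Assumes a > 0 and b > 0; with a zero or negative argument the
       loop may not terminate. */
    while (a != b) {
        if (a > b)
            a -= b;   /* a is the larger value: reduce it */
        else
            b -= a;   /* b is the larger value: reduce it */
    }
    return a;         /* a == b here, and both hold the GCD */
}
```

Calling gcd(12, 18) steps through the pairs (12, 6) and (6, 6) before returning 6.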
Without conditional execution this could be naively coded as:
gcd      CMP     a1, a2
         BEQ     end
         BLT     lessthan
         SUB     a1, a1, a2
         B       gcd
lessthan SUB     a2, a2, a1
         B       gcd
end
Conditional execution and selective setting of the PSR's ALU flags allow it to be coded much more compactly as follows (this version can be found in the examples directory as gcd.s).
gcd      CMP     a1, a2
         SUBGT   a1, a1, a2
         SUBLT   a2, a2, a1
         BNE     gcd
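The behaviour of this four-instruction loop can be modelled in C. The sketch below is an illustration only, assuming positive 32-bit signed inputs; the function name is hypothetical. The point it tries to capture is that the comparison outcome is recorded once per iteration and every later instruction in that iteration tests that recorded outcome, not the updated register values.

```c
#include <stdint.h>

/* Hypothetical C model of the conditional-execution loop:
       gcd  CMP a1, a2 / SUBGT a1, a1, a2 / SUBLT a2, a2, a1 / BNE gcd
   Assumes a1 > 0 and a2 > 0. */
int32_t gcd_conditional(int32_t a1, int32_t a2)
{
    int gt, lt, ne;
    do {
        /* CMP a1, a2: record the comparison outcome (the flags). */
        gt = (a1 > a2);
        lt = (a1 < a2);
        ne = (a1 != a2);

        if (gt) a1 -= a2;   /* SUBGT: no S bit, so the flags are unchanged */
        if (lt) a2 -= a1;   /* SUBLT: still tests the flags from the CMP */
    } while (ne);           /* BNE: also tests the flags from the CMP */
    return a1;
}
```

For the inputs 12 and 18 the model runs three iterations, passing through (12, 6) and (6, 6), and returns 6. On the final iteration both subtractions are skipped and the branch falls through.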
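Two tricks are illustrated: each subtraction is executed conditionally, so no branch is needed to choose between them; and because neither SUB sets the 'S' bit, the flags set by CMP remain intact for the BNE that closes the loop.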
You can run the C gcd routine shown above under armsd. To do this first set your current directory to the examples directory.
Compile, link and run the C version of the gcd routine by using the following commands:
    armcc -c gcd.c -li -apcs 3/32bit
    armcc -c gcdtest.c -li -apcs 3/32bit
    armlink -o gcdtest gcd.o gcdtest.o somewhere/armlib.32l
    armsd -li gcdtest
where somewhere is the directory in which armlib.32l can be found.
The two armcc commands compile the gcd function and the test harness, creating relocatable object files gcd.o and gcdtest.o. The -li flag tells armcc to compile for a little-endian memory. The -apcs 3/32bit option tells armcc to use a 32 bit version of the ARM Procedure Call Standard. You can omit these options if your armcc has been configured for this default.
The armlink command links your relocatable objects with the ARM C library to create a runnable program (here called gcdtest).
The armsd command invokes the debugger, with gcdtest as the program to be run. Again -li specifies that little-endian memory is required (as with armcc above).
You can run the assembler gcd routine shown above under armsd. To do this first set your current directory to the examples directory.
You can assemble, link and run the assembler gcd routine by using the following commands:
    armasm gcd.s -o gcd.o -li
    armcc -c gcdtest.c -li -apcs 3/32bit
    armlink -o gcdtest gcd.o gcdtest.o somewhere/armlib.32l
    armsd -li gcdtest
where somewhere is the directory in which armlib.32l can be found.
The armasm command assembles the gcd function, creating the relocatable object file gcd.o. The -li flag tells armasm to assemble for a little-endian memory. You can omit this option if your armasm has been configured for this default.
The armcc command compiles the test harness. The -c flag tells armcc not to link its output with the C library; the -li flag tells armcc to compile for a little-endian memory (as with armasm). The -apcs 3/32bit option tells armcc to use a 32 bit version of the ARM Procedure Call Standard.
The armlink command links your relocatable objects with the ARM C library to create a runnable program (here called gcdtest).
The armsd command invokes the debugger, with gcdtest as the program to be run. Again -li specifies that little-endian memory is required (as with armasm above).
Original: https://ext.3dodev.com/3DO/Portfolio_2.5/OnLineDoc/DevDocs/tktfldr/acbfldr/1acbb.html