by Carl Burch, Hendrix College, October 2011
Introducing ARM assembly language by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.toves.org/books/arm/.
Contents
1. Background
1.1. Definitions
1.2. ISA varieties
2. ARM assembly basics
2.1. A simple program: Adding numbers
2.2. Another example: Hailstone sequence
2.3. Another example: Adding digits
2.4. Summary of instructions so far
2.5. Condition codes
3. Memory
3.1. Basic memory instructions
3.2. Addressing modes
3.3. Initializing memory
3.4. Multiple-register memory instructions
In this document, we study assembly language, the system for expressing the individual instructions that a computer should perform.
1. Background
We are actually concerned with two types of languages, assembly languages and machine languages.
1.1. Definitions
A machine language encodes instructions as sequences of 0's and 1's; this binary encoding is what the computer's processor is built to execute. Writing programs using this encoding is unwieldy for human programmers, though. Thus, when programmers want to dictate the precise instructions that the computer is to perform, they use an assembly language, which allows instructions to be written in textual form. An assembler translates a file containing assembly language code into the corresponding machine language.
Let's look at a simple example for ARM's design. Here is a machine language instruction:
1110 0001 1010 0000 0011 0000 0000 1001
When the processor is told to execute that binary sequence, it copies the value from “register 9” into “register 3.” But as a programmer, you'd hardly want to read a long binary sequence and make sense of it. Instead, a programmer would prefer programming in assembly language, where we would express this using the following line.
MOV R3, R9
Then the programmer would use an assembler to translate this into the binary encoding that the computer actually executes.
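To make the correspondence between the binary sequence and MOV R3, R9 concrete, here is a minimal C sketch that pulls the relevant fields out of the 32-bit word (0xE1A03009 in hexadecimal). The struct and function names are hypothetical, and the sketch assumes the word is a basic arithmetic instruction whose last argument is an unshifted register.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ARM basic arithmetic instruction whose final argument is a
   plain register. The struct and its field names are illustrative only. */
struct dp_fields {
    unsigned cond;    /* bits 31-28: condition code */
    unsigned opcode;  /* bits 24-21: operation number */
    unsigned rd;      /* bits 15-12: destination register */
    unsigned rm;      /* bits 3-0: source register */
};

static struct dp_fields decode_dp(uint32_t word) {
    /* Assumptions of this sketch: bits 27-26 are 00 (arithmetic class)
       and bit 25 is 0 (the last argument is a register, not an immediate). */
    assert(((word >> 26) & 0x3u) == 0);
    assert(((word >> 25) & 0x1u) == 0);

    struct dp_fields f;
    f.cond   = (word >> 28) & 0xFu;
    f.opcode = (word >> 21) & 0xFu;
    f.rd     = (word >> 12) & 0xFu;
    f.rm     = word & 0xFu;
    return f;
}
```

Calling decode_dp(0xE1A03009u) yields condition 14 (always), operation 13 (MOV), destination register 3, and source register 9, which is exactly what the assembly line says.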
But there is not just one machine language: A different machine language is designed for each line of processors, designed with an eye to provide a powerful set of fast instructions while allowing a relatively simple circuit to be built. Often processors are designed to be compatible with a previous processor, so they follow the same machine language design. For example, Intel's line of processors (including 80386, Pentium, and Core i7) support similar machine languages. But ARM processors support an entirely different machine language. The design of the machine language encoding is called the instruction set architecture (ISA).
And for each machine language, there must be a different assembly language, since the assembly language must correspond to an entirely different set of machine language instructions.
1.2. ISA varieties
Of the many ISAs (instruction set architectures), x86 is handily the most widely recognized. It was first designed by Intel in 1974 for an 8-bit processor (the Intel 8080), and over the years it was extended to 16-bit form (1978, Intel 8086), then to 32-bit form (1985, Intel 80386), and then to 64-bit form (2003, AMD Opteron). Today, processors supporting IA32 are manufactured by Intel, AMD, and VIA, and they can be found in most personal computers.
Another well-known ISA today is the PowerPC. Apple's Macintosh computers used these processors until 2006, when Apple switched their computers to the x86 line of processors. But PowerPC remains in common use for applications such as automobiles and gaming consoles (including the Wii, PlayStation 3, and Xbox 360).
But the ISA that we'll study comes from a company called ARM. (Like other successful ISAs, ARM's ISA has grown over the years. We'll examine version 4T.) Processors supporting ARM's ISA are distributed quite widely, usually for low-power devices such as cellphones, digital music players, and handheld game systems. The iPhone, Kindle and Nintendo DS are all prominent examples of devices that incorporate an ARM processor.
There are several reasons for examining ARM's ISA rather than IA32.
- Assembly language programming is rarely used for more powerful computing systems, since it's far easier to program in a high-level programming language. But for small devices, assembly language programming remains important: Due to power and price constraints, the devices have very few resources, and developers can use assembly language to use these resources as efficiently as possible.
- The multiple extensions to the IA32 architecture lead it to be far too complicated for us to really understand thoroughly.
- IA32 dates from the 1970's, which was a completely different era in computing. ARM is more representative of more modern ISA designs.
2. ARM assembly basics
We'll now turn to examining ARM's ISA.
2.1. A simple program: Adding numbers
Let's start our introduction using a simple example. Imagine that we want to add the numbers from 1 to 10. We might do this in C as follows.
int total;
int i;
total = 0;
for (i = 10; i > 0; i--) {
    total += i;
}
The following translates this into the instructions supported by ARM's ISA.
        MOV R0, #0          ; R0 accumulates total
        MOV R1, #10         ; R1 counts from 10 down to 1
again   ADD R0, R0, R1
        SUBS R1, R1, #1
        BNE again
halt    B halt              ; infinite loop to stop computation
You'll notice the mentions of R0 and R1 in the assembly language program. These are references to registers, which are places in a processor for storing data during computation. The ARM processor includes 16 easily accessible registers, numbered R0 through R15. Each stores a single 32-bit number. Note that though registers store data, they are very separate from the notion of memory: Memory is typically much larger (kilobytes or often gigabytes), and so it typically exists outside of the processor. Because of memory's size, accessing memory takes more time than accessing registers, typically about 10 times as long. Thus, assembly language programming tends to focus on using registers when possible.
Because each line of an assembly language program corresponds directly to machine language, the lines are highly restricted in their format. You can see that each line consists of two parts: First is the opcode such as MOV that is an abbreviation indicating the type of operation; and after it come arguments such as “R0, #0”. Each opcode has strict requirements on the allowed arguments. For example, a MOV instruction must have exactly two arguments: the first must identify a register, and the second must provide either a register or a constant (prefixed by a ‘#’). A constant placed directly in an instruction is called an immediate, since it is immediately available to the processor when reading the instruction.
In the above assembly language program, we first use the MOV instruction to initialize R0 at 0 and R1 at 10. The ADD instruction computes the sum of R0 and R1 (the second and third arguments) and places the result into R0 (the first argument); this corresponds to the total += i; line of the equivalent C program. The subsequent SUBS instruction decreases R1 by 1.
To understand the next instruction, we need to understand that in addition to the registers R0 through R15, the ARM processor also incorporates a set of four “flags,” labeled the zero flag (Z), the negative flag (N), the carry flag (C), and the overflow flag (V). Whenever an arithmetic instruction has an S at its end, as SUBS does, these flags will be updated based on the result of the computation. In this case, if decreasing R1 by 1 results in 0, the Z flag will become 1; the N, C, and V flags are also updated, but they're not pertinent to our discussion of this code.
The following instruction, BNE, will check the Z flag. If the Z flag is not set (i.e., the previous subtraction gives a nonzero result), then BNE arranges the processor so that the next instruction executed is the ADD instruction, labeled again; this leads to repeating the loop with a smaller value of R1. If the Z flag is set, the processor will simply continue on to the next instruction. (BNE stands for Branch if Not Equal. The name comes from imagining that we want to check whether two numbers are equal. One way to do this using ARM's ISA would be to first tell the processor to subtract the two numbers; if the difference is zero, then the two numbers must be equal, and the zero flag will be 1.)
The final instruction, B, always branches back to the named instruction. In this program, the instruction names itself, effectively halting the program by putting the computer into a tight infinite loop.
2.2. Another example: Hailstone sequence
Now, let's consider the hailstone sequence. Given an integer n, we repeatedly want to apply the following procedure.
iters ← 0
while n ≠ 1:
    iters ← iters + 1
    if n is odd:
        n ← 3 ⋅ n + 1
    else:
        n ← n / 2
For example, if we start with 3, then since this is odd our next number is 3 ⋅ 3 + 1 = 10. This is even, so our next number is 10 / 2 = 5. This is odd, so our next number is 3 ⋅ 5 + 1 = 16. This is even, so we then go to 8, which is still even, so we go to 4, then 2, and 1.
In translating this to ARM's assembly language, we must confront the fact that ARM lacks any instructions related to division. (Designers felt division too rarely necessary to merit wasting transistors on the complex circuit that it requires.) Fortunately, the division in this algorithm is relatively simple: We merely divide n by 2, which can be done with a right shift.
ARM has an unusual approach to shifting: We have already seen that for every basic arithmetic instruction, the final argument can be a constant (as in SUBS R1, R1, #1) or a register (as in ADD R0, R0, R1). But when the final argument is a register, we can optionally add a shift distance: For instance, the instruction “ADD R0, R0, R1, LSL #1” says to shift R1 left by one place before adding it to R0 (while R1 itself remains unchanged). The ARM instruction set supports four types of shifting:
LSL | logical shift left |
LSR | logical shift right |
ASR | arithmetic shift right |
ROR | rotate right |
The shift distance can be an immediate between 1 and 32, or it can be based on a register value: “MOV R0, R1, ASR R2” is equivalent to “R0 = R1 >> R2”.
In translating our pseudocode to assembly language, we'll find the shift operations useful both for multiplying n by 3 (computed as n + (n << 1)) and for dividing n by 2 (computed as n >> 1). We'll also need to deal with testing whether n is odd. We can do this by testing whether n's 1's bit is set, which we can accomplish using the ANDS instruction to perform a bitwise AND with 1. The ANDS instruction sets the Z flag based on whether the result is 0. If the result is 0, then this means that the 1's bit of n is 0, and so n is even.
        MOV R0, #5              ; R0 is current number
        MOV R1, #0              ; R1 is count of number of iterations
again   ADD R1, R1, #1          ; increment number of iterations
        ANDS R7, R0, #1         ; test whether R0 is odd (result goes to scratch R7, so R0 is preserved)
        BEQ even
        ADD R0, R0, R0, LSL #1  ; if odd, set R0 = R0 + (R0 << 1) + 1
        ADD R0, R0, #1          ; and repeat (guaranteed R0 > 1)
        B again
even    MOV R0, R0, ASR #1      ; if even, set R0 = R0 >> 1
        SUBS R7, R0, #1         ; and repeat if R0 != 1
        BNE again
halt    B halt                  ; infinite loop to stop computation
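As a cross-check of the shift tricks used above, here is a minimal C sketch of the same computation. The function name hailstone_iters is hypothetical; n plays the role of R0 and iters the role of R1. It follows the pseudocode's while loop, so unlike the assembly it tests n ≠ 1 before the first step, and it assumes the values never exceed 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the hailstone steps needed to bring n down to 1, using the same
   shift-based arithmetic as the assembly listing. */
static uint32_t hailstone_iters(uint32_t n) {
    assert(n >= 1);                 /* the procedure is defined for n >= 1 */
    uint32_t iters = 0;
    while (n != 1) {
        iters++;
        if (n & 1u) {
            n = n + (n << 1) + 1;   /* 3n + 1 computed as n + (n << 1) + 1 */
        } else {
            n = n >> 1;             /* n / 2 computed as a right shift */
        }
    }
    return iters;
}
```

Starting from 5, as the assembly does, the sequence is 16, 8, 4, 2, 1, so both versions count five iterations.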
2.3. Another example: Adding digits
Let's look at another example. Here, suppose that we want to add the digits of a positive number; for example, given the number 1,024, we would want to compute 1 + 0 + 2 + 4, which is 7. The obvious way to express this in C is as follows.
total = 0;
while (i > 0) {
    total += i % 10;
    i /= 10;
}
It's difficult to translate this into ARM's ISA, though, since the ARM lacks any instruction for dividing values. However, we can use a clever trick to perform this division using multiplication: If we take a number and multiply by 2^32 / 10, the upper 32 bits of the product tell us the result of dividing the original number by 10. This insight leads to the following alternative way of summing the digits in a number.
base = 0x1999999A;
total = 0;
while (i > 0) {
    iDiv10 = (i * base) >> 32;
    total += i - iDiv10 * 10;
    i = iDiv10;
}
In translating this into assembly code, we have to confront two issues. The more obvious is determining which instruction to use to perform the multiplication. Here, we want to use the UMULL instruction (Unsigned MULtiply Long), which interprets two registers as unsigned 32-bit numbers, and places the 64-bit product of the registers' values into two different registers. The below example illustrates.
        UMULL R4, R5, R0, R2    ; computes R0 * R2, placing lower 32 bits in R4, upper 32 in R5
The less obvious issue we have to confront is that of placing 0x1999999A into a register. You might be tempted at first to use MOV, but this instruction has a major limitation: Any immediate value must be rotated by an even number of places to reach an eight-bit value. For numbers between 0 and 255, this is not a problem; nor is it a problem for 1,024, since 0x400 can be achieved by rotating 1 left 10 places. But there's no way to do this for 0x1999999A. The solution we'll use is to load each byte separately, joining them using the ORR instruction, which computes the bitwise OR of two values.
        MOV R0, #1024           ; R0 is input, decreases by factors of 10
        MOV R1, #0              ; R1 is sum of digits
        MOV R2, #0x19000000     ; R2 is constantly 0x1999999A
        ORR R2, R2, #0x00990000
        ORR R2, R2, #0x00009900
        ORR R2, R2, #0x0000009A
        MOV R3, #10             ; R3 is constantly 10
loop    UMULL R4, R5, R0, R2    ; R5 is R0 / 10
        UMULL R4, R6, R5, R3    ; R4 is now 10 * (R0 / 10)
        SUB R4, R0, R4          ; R4 is now one's digit of R0
        ADD R1, R1, R4          ; add it into R1
        MOVS R0, R5
        BNE loop
halt    B halt
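The multiply-instead-of-divide trick can be tried out in C before trusting it in assembly. The sketch below is an illustration under assumptions: the helper names umull_hi and digit_sum are hypothetical, a 64-bit intermediate stands in for the register pair that UMULL writes, and the assert limits inputs to below 2^30, a range chosen conservatively for this sketch because the constant is 2^32 / 10 rounded up.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of the 64-bit product a * b, i.e. what UMULL places in
   its second destination register. */
static uint32_t umull_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Sums the decimal digits of i without using a division instruction. */
static uint32_t digit_sum(uint32_t i) {
    const uint32_t base = 0x1999999Au;        /* about 2^32 / 10, rounded up */
    assert(i < (1u << 30));                   /* range assumed by this sketch */
    uint32_t total = 0;
    while (i > 0) {
        uint32_t i_div10 = umull_hi(i, base); /* i / 10 */
        total += i - i_div10 * 10;            /* one's digit of i */
        i = i_div10;
    }
    return total;
}
```

For the input 1,024 used in the assembly listing, the first quotient is 102 and the final sum is 7, matching 1 + 0 + 2 + 4.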
By the way, you may sometimes want to place a small negative number like −10 into a register. You can't use MOV to accomplish this, because its two's-complement representation is 0xFFFFFFF6, which can't be rotated into an 8-bit number. If you happen to know that some register holds the number 0, then you could use SUB. But if not, then the MVN (MoVe Not) instruction is useful: It places the bitwise NOT of its argument into the destination register. So to get −10 into R0, we can use “MVN R0, #0x9”.
2.4. Summary of instructions so far
The ARM includes sixteen “basic” arithmetic instructions, numbered 0 through 15. All sixteen are listed below, with the functionality summarized by the relevant C operator. (The number at the beginning of each line is used in translating the instructions into machine language. There's no reason for programmers to memorize this correspondence, though: After all, this is why we have assemblers.)
Figure 1: ARM's basic arithmetic instructions
0. | AND regd, rega, argb | regd ← rega & argb |
1. | EOR regd, rega, argb | regd ← rega ^ argb |
2. | SUB regd, rega, argb | regd ← rega − argb |
3. | RSB regd, rega, argb | regd ← argb − rega |
4. | ADD regd, rega, argb | regd ← rega + argb |
5. | ADC regd, rega, argb | regd ← rega + argb + carry |
6. | SBC regd, rega, argb | regd ← rega − argb − !carry |
7. | RSC regd, rega, argb | regd ← argb − rega − !carry |
8. | TST rega, argb | set flags for rega & argb |
9. | TEQ rega, argb | set flags for rega ^ argb |
10. | CMP rega, argb | set flags for rega − argb |
11. | CMN rega, argb | set flags for rega + argb |
12. | ORR regd, rega, argb | regd ← rega | argb |
13. | MOV regd, arg | regd ← arg |
14. | BIC regd, rega, argb | regd ← rega & ~argb |
15. | MVN regd, arg | regd ← ~arg |
Except for TST, TEQ, CMP, and CMN, all instructions may have an S postfixed to the opcode to signify that the operation should set the flags. For TST, TEQ, CMP, and CMN, the S is implicit: The instructions don't change any general-purpose registers, so the only point in performing the instruction is to set the flags.
We've also seen three other opcodes that aren't in the above list of basic arithmetic instructions: UMULL is a “non-basic” arithmetic instruction, and B and BNE aren't arithmetic instructions.
2.5. Condition codes
Each ARM instruction may incorporate a condition code specifying that the operation should take place only when certain combinations of the flags hold. You can specify the condition code by including it as part of the opcode. It usually comes at the end of the opcode, but it precedes the optional S on the basic arithmetic instructions. The names of the condition codes are based on the supposition that the flags were set by a CMP or SUBS instruction.
Figure 2: ARM's condition codes
0. | EQ | equal | Z |
1. | NE | not equal | !Z |
2. | CS or HS | carry set / unsigned higher or same | C |
3. | CC or LO | carry clear / unsigned lower | !C |
4. | MI | minus / negative | N |
5. | PL | plus / positive or zero | !N |
6. | VS | overflow set | V |
7. | VC | overflow clear | !V |
8. | HI | unsigned higher | C && !Z |
9. | LS | unsigned lower or same | !C || Z |
10. | GE | signed greater than or equal | N == V |
11. | LT | signed less than | N != V |
12. | GT | signed greater than | !Z && (N == V) |
13. | LE | signed less than or equal | Z || (N != V) |
14. | AL or omitted | always | true |
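The table can be exercised with a small C model. This is a sketch, not a description of the hardware: the struct flags type and both function names are hypothetical, and flags_after_cmp models only the case the names are based on, namely flags set by comparing two 32-bit values with CMP (or SUBS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool n, z, c, v; };

/* Flags as CMP a, b would set them, based on the 32-bit result a - b. */
static struct flags flags_after_cmp(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;
    struct flags f;
    f.n = (diff >> 31) & 1u;                    /* result is negative */
    f.z = (diff == 0);                          /* result is zero */
    f.c = (a >= b);                             /* carry set means no borrow */
    f.v = (((a ^ b) & (a ^ diff)) >> 31) & 1u;  /* signed overflow */
    return f;
}

/* Evaluates condition code 0..14, using the numbering of Figure 2. */
static bool cond_holds(unsigned code, struct flags f) {
    switch (code) {
    case 0:  return f.z;                    /* EQ */
    case 1:  return !f.z;                   /* NE */
    case 2:  return f.c;                    /* CS or HS */
    case 3:  return !f.c;                   /* CC or LO */
    case 4:  return f.n;                    /* MI */
    case 5:  return !f.n;                   /* PL */
    case 6:  return f.v;                    /* VS */
    case 7:  return !f.v;                   /* VC */
    case 8:  return f.c && !f.z;            /* HI */
    case 9:  return !f.c || f.z;            /* LS */
    case 10: return f.n == f.v;             /* GE */
    case 11: return f.n != f.v;             /* LT */
    case 12: return !f.z && (f.n == f.v);   /* GT */
    case 13: return f.z || (f.n != f.v);    /* LE */
    case 14: return true;                   /* AL */
    default: assert(!"condition code out of range"); return false;
    }
}
```

For instance, after comparing 0xFFFFFFFF with 1, both LT and HI hold: the first value is −1 when read as signed but the largest possible value when read as unsigned, which is why the table has separate signed and unsigned codes.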
The only instance of a condition code we have seen so far is the BNE instruction: In this case, we have a B instruction for branching, but the branch only takes place if the Z flag is 0.
But ARM's ISA allows us to apply condition codes to other opcodes, too. For example, ADDEQ says to perform an addition if the Z flag is 1. One common scenario using condition codes on non-branch instructions is in computing the greatest common divisor of two numbers using Euclid's GCD algorithm.
a = 40;
b = 25;
while (a != b) {
    if (a > b) a -= b;
    else b -= a;
}
The traditional translation to assembly language would use condition codes only on branch instructions.
        MOV R0, #40         ; R0 is a
        MOV R1, #25         ; R1 is b
again   CMP R0, R1
        BEQ halt
        BLT isLess
        SUB R0, R0, R1
        B again
isLess  SUB R1, R1, R0
        B again
halt    B halt
However, the following is a much shorter and more efficient translation.
        MOV R0, #40         ; R0 is a
        MOV R1, #25         ; R1 is b
again   CMP R0, R1
        SUBGT R0, R0, R1
        SUBLT R1, R1, R0
        BNE again
halt    B halt
This is more efficient for two reasons. More obviously, the number of instructions executed per iteration is smaller (four versus five). But the other reason comes from the fact that modern processors “pre-fetch” the following instruction while executing the current instruction. However, branches disrupt this process since the location of the next instruction can't be known with certainty. The second translation involves many fewer branch instructions, so it will have fewer problems with pre-fetching instructions.
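The shape of the shorter translation can be mirrored in C to confirm that one comparison per pass is enough. This is only a structural sketch with a hypothetical function name: a C compiler is free to turn the if statements into branches, so it illustrates the logic of the conditional instructions, not their speed. It assumes both inputs are positive.

```c
#include <assert.h>
#include <stdint.h>

/* Subtraction-based GCD in the shape of the conditional-execution listing:
   one comparison per pass, two guarded subtractions, one loop test. */
static int32_t gcd_predicated(int32_t a, int32_t b) {
    assert(a > 0 && b > 0);
    int ne;
    do {
        /* All three outcomes come from the single CMP R0, R1. */
        int gt = (a > b);
        int lt = (a < b);
        ne = (a != b);
        if (gt) a -= b;          /* SUBGT R0, R0, R1 */
        if (lt) b -= a;          /* SUBLT R1, R1, R0 */
    } while (ne);                /* BNE again */
    return a;
}
```

The comparison results are saved before either subtraction, just as SUBGT and SUBLT both read the flags left by CMP without changing them. With the listing's values of 40 and 25, the loop ends with both variables equal to 5.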
3. Memory
We've seen how to build assembly programs that perform basic numerical computation. We'll now turn to examining how assembly programs can access memory.
3.1. Basic memory instructions
The ARM supports memory access via two instructions, LDR and STR. The LDR instruction loads data out of memory, and STR stores data into memory. Each takes two arguments. The first argument is the data register: For an LDR instruction, the loaded data is placed into this register; for an STR instruction, the data found in this register is stored into memory. The second argument indicates the register that contains the memory address being accessed; it will be written using the register name enclosed in brackets. (In Section 3.2, we will see that there are other options for how this second argument can be written.)
For an example of how these instructions work, let's suppose we want an assembly program fragment that adds the integers in an array. We imagine that R0 holds the address of the first integer of the array, and R1 holds the number of integers in the array.
addInts MOV R4, #0
addLoop LDR R2, [R0]
        ADD R4, R4, R2
        ADD R0, R0, #4
        SUBS R1, R1, #1
        BNE addLoop
In this fragment, we use R4 to hold the sum of the integers so far. In the LDR instruction, we look into R0 for a memory address and load the data found at that address into R2. We then add this value into R4. Then, we move R0 so that it contains the memory address of the next integer in the array; we increase R0 by four because each integer consumes four bytes of memory. Finally, we decrement R1, which is the number of integers left to read from the array, and we repeat the process if there are integers remaining.
Both LDR and STR load and store 32-bit values. There are also instructions for working with 8-bit values, LDRB and STRB; these are useful primarily for working with strings. Below is an implementation of C's strcpy function; we imagine that R0 holds the address of the first character of the destination array, and that R1 holds the address of the first character of the source string. We want to keep copying until we copy the terminating NUL character (ASCII 0).
strcpy  LDRB R2, [R1]
        STRB R2, [R0]
        ADD R0, R0, #1
        ADD R1, R1, #1
        TST R2, R2          ; repeat if R2 is nonzero
        BNE strcpy
3.2. Addressing modes
In the previous section's examples, we provided the address by enclosing a register's name in brackets. But the ARM allows several other ways of indicating the memory address, too. Each such technique is called an addressing mode; the technique of simply naming a register holding a memory address is one such addressing mode, called register addressing, but there are others.
One of these others is scaled register offset, where we include in the brackets a register, another register, and a shift value. To compute the memory address to access, the processor takes the first register, and adds to it the second register shifted according to the shift value. (Neither of the registers mentioned in brackets changes value.) This addressing mode is useful when accessing an array where you know the array index. We can modify our earlier routine for adding the integers in an array to take advantage of this addressing mode.
addInts MOV R4, #0
addLoop SUBS R1, R1, #1
        LDR R2, [R0, R1, LSL #2]
        ADD R4, R4, R2
        BNE addLoop
With each iteration of the loop, we first decrement our loop index R1. Then we retrieve the element at that entry of the array using a scaled register offset: We use R0 as our base, and we add to it R1 shifted left two places. We shift R1 left two places so that R1 is multiplied by four; after all, each integer in the array is four bytes long. After adding the loaded value into R4, which accumulates the total, we repeat the loop if R1 hasn't reached 0 yet.
Beyond using a different addressing mode, this version of the code is slightly different from our original implementation in three ways. First, it loads the numbers in the array in reverse order; that is, it loads the last number in the array first. Second, R0 remains unaltered in the course of the fragment. And finally, it will be somewhat faster since it has one fewer instruction per loop iteration.
Immediate post-indexed addressing is another addressing mode. To indicate this mode in assembly language, we follow the brackets with a comma and a positive or negative immediate. In executing the instruction, the processor still accesses the memory address found in the register, but after accessing the memory the address register is increased or decreased according to the immediate.
Our strcpy implementation is a good example of where immediate post-indexed addressing is useful: After we store to R0, we want R0 to increase by 1 for the following iteration; and similarly, after we load from R1, we want R1 to increase by 1. We can use immediate post-indexed addressing to avoid the two ADD instructions of our earlier version.
strcpy  LDRB R2, [R1], #1
        STRB R2, [R0], #1
        TST R2, R2          ; repeat if R2 is nonzero
        BNE strcpy
In total, the ARM processor supports ten addressing modes.
[Rn, #±imm] | Immediate offset | Address accessed is imm more/less than the address found in Rn. Rn does not change. |
[Rn] | Register | Address accessed is value found in Rn. This is just shorthand for [Rn, #0]. |
[Rn, ±Rm, shift] | Scaled register offset | Address accessed is sum/difference of the value in Rn and the value in Rm shifted as specified. Rn and Rm do not change values. |
[Rn, ±Rm] | Register offset | Address accessed is sum/difference of the value in Rn and the value in Rm. Rn and Rm do not change values. This is just shorthand for [Rn, ±Rm, LSL #0]. |
[Rn, #±imm]! | Immediate pre-indexed | Address accessed is as with immediate offset mode, but Rn's value updates to become the address accessed. |
[Rn, ±Rm, shift]! | Scaled register pre-indexed | Address accessed is as with scaled register offset mode, but Rn's value updates to become the address accessed. |
[Rn, ±Rm]! | Register pre-indexed | Address accessed is as with register offset mode, but Rn's value updates to become the address accessed. |
[Rn], #±imm | Immediate post-indexed | Address accessed is value found in Rn, and then Rn's value is increased/decreased by imm. |
[Rn], ±Rm, shift | Scaled register post-indexed | Address accessed is value found in Rn, and then Rn's value is increased/decreased by Rm shifted according to shift. |
[Rn], ±Rm | Register post-indexed | Address accessed is value found in Rn, and then Rn's value is increased/decreased by Rm. This is just shorthand for [Rn], ±Rm, LSL #0. |
For those addressing modes involving a shift, the shift technique is as with the arithmetic instructions (LSL, LSR, ASR, ROR, RRX). But the shift distance cannot be according to a register: The distance must be an immediate.
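The ten modes reduce to three behaviors once the offset has been computed (from an immediate, or from Rm after any shift). The C sketch below models just that distinction; the enum, the function name, and the idea of passing the offset in already computed are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum index_mode { OFFSET, PRE_INDEXED, POST_INDEXED };

/* Returns the address accessed and updates *rn the way the chosen mode
   would. `offset` is the signed offset after any shift has been applied. */
static uint32_t effective_address(uint32_t *rn, int32_t offset,
                                  enum index_mode mode) {
    uint32_t base = *rn;
    uint32_t with_offset = base + (uint32_t)offset;  /* wraps modulo 2^32 */
    switch (mode) {
    case OFFSET:                 /* [Rn, off]: Rn is left unchanged */
        return with_offset;
    case PRE_INDEXED:            /* [Rn, off]!: Rn becomes the address used */
        *rn = with_offset;
        return with_offset;
    case POST_INDEXED:           /* [Rn], off: access at Rn, then update Rn */
        *rn = with_offset;
        return base;
    }
    assert(!"unknown index mode");
    return 0;
}
```

With Rn holding 0x1000, an offset access with 16 reads address 0x1010 and leaves Rn alone, while a post-indexed access with 1 (as in the strcpy loop) reads address 0x1000 and leaves Rn at 0x1001.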
3.3. Initializing memory
We often want to reserve memory for holding data in a program. To do this, we use directives: directions for the assembler to do something other than simply translate an assembly language instruction into its corresponding machine code. One useful directive is DCD, which inserts one or more 32-bit numerical values into the machine code output. (DCD cryptically stands for Define Constant Double-words.)
primes  DCD 2, 3, 5, 7, 11, 13, 17, 19
In this example, we've created the label primes, which will correspond to the address where 2 is placed into memory. In the following four bytes is placed the integer 3, then 5, and so on.
In our program, we would want to load the address of the array into a register; to do this, we add primes into the program counter PC (which is synonymous with R15). The below fragment loads the fifth prime (11) into R1.
        ADD R0, PC, #primes ; load address of primes[0] into R0
        LDR R1, [R0, #16]   ; load primes[4] into R1
Another directive worth mentioning is DCB, for loading bytes into memory. Thus, we could write the following.
primes  DCB 2, 3, 5, 7, 11, 13, 17, 19
However, we are using just one byte for each number, so we can only include numbers between −128 and 127. We can also include a string in the list; each character of the string will occupy one byte of memory.
greet   DCB 'hello world\n', 0
Notice how we included 0 after the string. Without this, the string won't be terminated by the NUL character.
One more directive worth noting here is the percent sign %. This is useful when you wish to reserve a block of memory, but you don't care about the memory's initial value.
array   % 120               ; reserve 120 bytes of memory, which can hold 30 ints
3.4. Multiple-register memory instructions
The ARM ISA also includes instructions allowing several values to be loaded or stored in the same instruction. The LDMIA instruction is one such instruction: It allows loading into multiple registers starting at an address named in another register. In the below example of its usage, we take our code for adding the integers of an array, and we modify it using LDMIA so that it processes four integers with each iteration of the loop. This strategy allows the program to run using fewer instructions, at the expense of more complexity.
; R0 holds address of first integer in array
; R1 holds array's length; fragment works only if length is multiple of 4
addInts MOV R4, #0
addLoop LDMIA R0!, { R5-R8 }
        ADD R5, R5, R6
        ADD R7, R7, R8
        ADD R4, R4, R5
        ADD R4, R4, R7
        SUBS R1, R1, #4
        BNE addLoop
In executing the LDMIA instruction above, the ARM processor looks into the R0 register for an address. It loads into R5 the four bytes starting at that address, into R6 the next four bytes, into R7 the next four bytes, and into R8 the next four bytes. Meanwhile, R0 is stepped forward by 16 bytes, so with the next iteration the LDMIA instruction will load the next four words into the registers.
Inside the braces can be any list of registers, using dashes to indicate ranges of registers, and using commas to separate ranges. Thus, the instruction LDMIA R0!, { R1-R4, R8, R11-R12 } will load seven words from memory. The order in which the registers are listed is not significant; even if we write LDMIA R0!, { R11-R12, R8, R1-R4 }, R1 will receive the first word loaded from memory.
The exclamation point following R0 in our example may be omitted; if omitted, then the address register is not altered by the instruction. That is, R0 would continue pointing to the first integer in the array. In our example above, we want R0 to change so that it is pointing to the next block of four integers for the next iteration, so we included the exclamation point.
Another instruction is STMIA, which stores several registers into memory. In the following example, we shift every number in an array into the next spot; thus, the array <2,3,5,7> becomes <0,2,3,5>.
; R0 holds address of first integer in array
; R1 holds array's length; fragment works only if length is multiple of 4
shift   MOV R4, #0
shLoop  LDMIA R0, { R5-R8 }
        STMIA R0!, { R4-R7 }
        MOV R4, R8
        SUBS R1, R1, #4
        BNE shLoop
Notice how the LDMIA instruction omits the exclamation point so that R0 isn't modified. This is so that STMIA stores into the same range of addresses that were just loaded into the registers. The STMIA instruction has the exclamation point because R0 must be modified in preparation for the next iteration of the loop.
The ARM processor includes four variants of the multiple-load and multiple-store instructions; the LDM and STM abbreviations must always indicate one of these four variants.
LDMIA, STMIA | Increment after | We start loading from the named address and into increasing addresses. |
LDMIB, STMIB | Increment before | We start loading from four more than the named address and into increasing addresses. |
LDMDA, STMDA | Decrement after | We start loading from the named address and into decreasing addresses. |
LDMDB, STMDB | Decrement before | We start loading from four less than the named address and into decreasing addresses. |
Across all four modes, the highest-numbered register always corresponds to the highest address in memory. Thus, the instruction LDMDA R0, { R1-R4 } will load R4 from the address named by R0, R3 from R0 − 4, and so on.
As we'll see in studying subroutines, the different variants are particularly useful when we want to use a block of unused memory as a stack.
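The rule that the highest-numbered register always maps to the highest address can be written down as a small address calculation. The sketch below is an illustration with hypothetical names: registers in the list are ranked from lowest-numbered (k = 0) upward, and each occupies four bytes.

```c
#include <assert.h>
#include <stdint.h>

enum ldm_mode { IA, IB, DA, DB };

/* Address used for the k-th lowest-numbered register (k = 0 .. count-1)
   of a list of `count` registers, where rn is the value of the address
   register before the instruction executes. */
static uint32_t ldm_address(enum ldm_mode mode, uint32_t rn,
                            unsigned count, unsigned k) {
    assert(count > 0 && k < count);
    uint32_t lowest;   /* address of the lowest-numbered register */
    switch (mode) {
    case IA: lowest = rn;                 break;  /* increment after */
    case IB: lowest = rn + 4;             break;  /* increment before */
    case DA: lowest = rn - 4 * count + 4; break;  /* decrement after */
    default: lowest = rn - 4 * count;     break;  /* DB: decrement before */
    }
    return lowest + 4 * k;
}
```

For LDMDA R0, { R1-R4 } with R0 holding 0x2000, the function gives 0x2000 for R4 (k = 3) and 0x1FFC for R3 (k = 2), in line with the example above.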