Arm Assembler For Macos

by Carl Burch, Hendrix College, October 2011

Introducing ARM assembly language by Carl Burch is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Based on a work at www.toves.org/books/arm/.

1. Background
1.1. Definitions
1.2. ISA varieties
2. ARM assembly basics
2.1. A simple program: Adding numbers
2.2. Another example: Hailstone sequence
2.3. Another example: Adding digits
2.4. Summary of instructions so far
2.5. Condition codes
3. Memory
3.1. Basic memory instructions
3.2. Addressing modes
3.3. Initializing memory
3.4. Multiple-register memory instructions

In this document, we study assembly language, the system for expressing the individual instructions that a computer should perform.

1. Background

We are actually concerned with two types of languages, assembly languages and machine languages.

1.1. Definitions

A machine language encodes instructions as sequences of 0's and 1's; this binary encoding is what the computer's processor is built to execute. Writing programs using this encoding is unwieldy for human programmers, though. Thus, when programmers want to dictate the precise instructions that the computer is to perform, they use an assembly language, which allows instructions to be written in textual form. An assembler translates a file containing assembly language code into the corresponding machine language.

Let's look at a simple example for ARM's design. Here is a machine language instruction:

    1110 0001 1010 0000 0011 0000 0000 1001

When the processor is told to execute that binary sequence, it copies the value from “register 9” into “register 3.” But as a programmer, you'd hardly want to read a long binary sequence and make sense of it. Instead, a programmer would prefer programming in assembly language, where we would express this using the following line.

    MOV R3, R9

Then the programmer would use an assembler to translate this into the binary encoding that the computer actually executes.
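
To see how the binary sequence above relates to MOV R3, R9, it can help to pull the bit fields apart. The sketch below assumes the usual layout of ARM's data-processing instructions (condition in bits 31-28, operation number in bits 24-21, destination register in bits 15-12, source register in bits 3-0); that layout is not spelled out in this document, and the type and function names are purely illustrative.

```c
#include <stdint.h>

/* Selected fields of an ARM data-processing instruction word
   (register-operand form). Field positions are an assumption
   stated in the text above, not something this document defines. */
typedef struct {
    unsigned cond;    /* bits 31-28: condition code (14 = always)  */
    unsigned opcode;  /* bits 24-21: operation number (13 = MOV)   */
    unsigned rd;      /* bits 15-12: destination register number   */
    unsigned rm;      /* bits 3-0:   source register number        */
} DecodedInsn;

/* Hypothetical helper: extract the fields from a 32-bit word. */
DecodedInsn decode_data_processing(uint32_t word) {
    DecodedInsn d;
    d.cond   = (word >> 28) & 0xFu;
    d.opcode = (word >> 21) & 0xFu;
    d.rd     = (word >> 12) & 0xFu;
    d.rm     = word & 0xFu;
    return d;
}
```

Written in hexadecimal, the binary sequence is 0xE1A03009; decoding it this way gives destination register 3 and source register 9, matching the assembly line.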

But there is not just one machine language: A different machine language is designed for each line of processors, designed with an eye to provide a powerful set of fast instructions while allowing a relatively simple circuit to be built. Often a processor is designed to be compatible with a previous processor, so it follows the same machine language design. For example, Intel's line of processors (including 80386, Pentium, and Core i7) support similar machine languages. But ARM processors support an entirely different machine language. The design of the machine language encoding is called the instruction set architecture (ISA).

And for each machine language, there must be a different assembly language, since the assembly language must correspond to an entirely different set of machine language instructions.

1.2. ISA varieties

Of the many ISAs (instruction set architectures), x86 is handily the most widely recognized. It was first designed by Intel in 1974 for an 8-bit processor (the Intel 8080), and over the years it was extended to 16-bit form (1978, Intel 8086), then to 32-bit form (1985, Intel 80386), and then to 64-bit form (2003, AMD Opteron). Today, processors supporting IA32 are manufactured by Intel, AMD, and VIA, and they can be found in most personal computers.

Another well-known ISA today is the PowerPC. Apple's Macintosh computers used these processors until 2006, when Apple switched their computers to the x86 line of processors. But PowerPC remains in common use for applications such as automobiles and gaming consoles (including the Wii, PlayStation 3, and Xbox 360).

But the ISA that we'll study comes from a company called ARM. (Like other successful ISAs, ARM's ISA has grown over the years. We'll examine version 4T.) Processors supporting ARM's ISA are distributed quite widely, usually for low-power devices such as cellphones, digital music players, and handheld game systems. The iPhone, Kindle and Nintendo DS are all prominent examples of devices that incorporate an ARM processor.

There are several reasons for examining ARM's ISA rather than IA32.

Assembly language programming is rarely used for more powerful computing systems, since it's far easier to program in a high-level programming language. But for small devices, assembly language programming remains important: Due to power and price constraints, the devices have very few resources, and developers can use assembly language to use these resources as efficiently as possible.
The multiple extensions to the IA32 architecture lead it to be far too complicated for us to really understand thoroughly.
IA32 dates from the 1970's, which was a completely different era in computing. ARM is more representative of more modern ISA designs.

2. ARM assembly basics

We'll now turn to examining ARM's ISA.

2.1. A simple program: Adding numbers

Let's start our introduction using a simple example. Imagine that we want to add the numbers from 1 to 10. We might do this in C as follows.

    int total;
    int i;
    total = 0;
    for (i = 10; i > 0; i--) {
        total += i;
    }

The following translates this into the instructions supported by ARM's ISA.

            MOV  R0, #0       ; R0 accumulates total
            MOV  R1, #10      ; R1 counts from 10 down to 1
    again   ADD  R0, R0, R1
            SUBS R1, R1, #1
            BNE  again
    halt    B    halt         ; infinite loop to stop computation

You'll notice the mentions of R0 and R1 in the assembly language program. These are references to registers, which are places in a processor for storing data during computation. The ARM processor includes 16 easily accessible registers, numbered R0 through R15. Each stores a single 32-bit number. Note that though registers store data, they are very separate from the notion of memory: Memory is typically much larger (kilobytes or often gigabytes), and so it typically exists outside of the processor. Because of memory's size, accessing memory takes more time than accessing registers, typically about 10 times as long. Thus, assembly language programming tends to focus on using registers when possible.

Because each line of an assembly language program corresponds directly to machine language, the lines are highly restricted in their format. You can see that each line consists of two parts: First is the opcode such as MOV that is an abbreviation indicating the type of operation; and after it come arguments such as “R0, #0”. Each opcode has strict requirements on the allowed arguments. For example, a MOV instruction must have exactly two arguments: the first must identify a register, and the second must provide either a register or a constant (prefixed by a ‘#’). A constant placed directly in an instruction is called an immediate, since it is immediately available to the processor when reading the instruction.

In the above assembly language program, we first use the MOV instruction to initialize R0 at 0 and R1 at 10. The ADD instruction computes the sum of R0 and R1 (the second and third arguments) and places the result into R0 (the first argument); this corresponds to the total += i; line of the equivalent C program. The subsequent SUBS instruction decreases R1 by 1.

To understand the next instruction, we need to understand that in addition to the registers R0 through R15, the ARM processor also incorporates a set of four “flags,” labeled the zero flag (Z), the negative flag (N), the carry flag (C), and the overflow flag (V). Whenever an arithmetic instruction has an S at its end, as SUBS does, these flags will be updated based on the result of the computation. In this case, if decreasing R1 by 1 results in 0, the Z flag will become 1; the N, C, and V flags are also updated, but they're not pertinent to our discussion of this code.

The following instruction, BNE, will check the Z flag. If the Z flag is not set (i.e., the previous subtraction gives a nonzero result), then BNE arranges the processor so that the next instruction executed is the ADD instruction, labeled again; this leads to repeating the loop with a smaller value of R1. If the Z flag is set, the processor will simply continue on to the next instruction. (BNE stands for Branch if Not Equal. The name comes from imagining that we want to check whether two numbers are equal. One way to do this using ARM's ISA would be to first tell the processor to subtract the two numbers; if the difference is zero, then the two numbers must be equal, and the zero flag will be 1.)

The final instruction, B, always branches back to the named instruction. In this program, the instruction names itself, effectively halting the program by putting the computer into a tight infinite loop.
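
As a rough way to check one's reading of the SUBS/BNE pair, the loop can be traced in C with the Z flag modeled as an ordinary variable. This is only a sketch: the function name is made up for illustration, the final halt loop is replaced by a return, and only the Z flag is modeled.

```c
#include <stdint.h>

/* Sketch of the summing loop; z stands in for the processor's Z flag.
   The function name is hypothetical. Assumes start >= 1, as in the
   original program, where R1 starts at 10. */
uint32_t sum_down_from(uint32_t start) {
    uint32_t r0 = 0;        /* MOV R0, #0                        */
    uint32_t r1 = start;    /* MOV R1, #10 in the original       */
    int z;
    do {
        r0 = r0 + r1;       /* again: ADD R0, R0, R1             */
        r1 = r1 - 1;        /* SUBS R1, R1, #1 ...               */
        z = (r1 == 0);      /* ... which also updates the Z flag */
    } while (!z);           /* BNE again: branch while Z is 0    */
    return r0;              /* value left in R0 at "halt"        */
}
```

Calling sum_down_from(10) mirrors the assembly program and leaves 55 in the variable that plays the role of R0.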

2.2. Another example: Hailstone sequence

Now, let's consider the hailstone sequence. Given an integer n, we repeatedly want to apply the following procedure.

    iters ← 0
    while n ≠ 1:
        iters ← iters + 1
        if n is odd:
            n ← 3 ⋅ n + 1
        else:
            n ← n / 2

For example, if we start with 3, then since this is odd our next number is 3 ⋅ 3 + 1 = 10. This is even, so our next number is 10 / 2 = 5. This is odd, so our next number is 3 ⋅ 5 + 1 = 16. This is even, so we then go to 8, which is still even, so we go to 4, then 2, and 1.

In translating this to ARM's assembly language, we must confront the fact that ARM lacks any instructions related to division. (Designers felt division too rarely necessary to merit wasting transistors on the complex circuit that it requires.) Fortunately, the division in this algorithm is relatively simple: We merely divide n by 2, which can be done with a right shift.

ARM has an unusual approach to shifting: We have already seen that for every basic arithmetic instruction, the final argument can be a constant (as in SUBS R1, R1, #1) or a register (as in ADD R0, R0, R1). But when the final argument is a register, we can optionally add a shift distance: For instance, the instruction “ADD R0, R0, R1, LSL #1” says to shift the value of R1 left by one place before adding it to R0 (while R1 itself remains unchanged). The ARM instruction set supports four types of shifting:

`LSL`	logical shift left
`LSR`	logical shift right
`ASR`	arithmetic shift right
`ROR`	rotate right

The shift distance can be an immediate between 1 and 32, or it can be based on a register value: “MOV R0, R1, ASR R2” is equivalent to “R0 = R1 >> R2”.

In translating our pseudocode to assembly language, we'll find the shift operations useful both for multiplying n by 3 (computed as n + (n << 1)) and for dividing n by 2 (computed as n >> 1). We'll also need to deal with testing whether n is odd. We can do this by testing whether n's 1's bit is set, which we can accomplish using the ANDS instruction to perform a bitwise AND with 1. The ANDS instruction sets the Z flag based on whether the result is 0. If the result is 0, then this means that the 1's bit of n is 0, and so n is even. Since we only care about the flags, we send the result of the AND to a scratch register so that n itself is preserved.

            MOV  R0, #5              ; R0 is current number
            MOV  R1, #0              ; R1 is count of number of iterations
    again   ADD  R1, R1, #1          ; increment number of iterations
            ANDS R7, R0, #1          ; test whether R0 is odd (R7 is scratch)
            BEQ  even
            ADD  R0, R0, R0, LSL #1  ; if odd, set R0 = R0 + (R0 << 1) + 1
            ADD  R0, R0, #1          ; and repeat (guaranteed R0 > 1)
            B    again
    even    MOV  R0, R0, ASR #1      ; if even, set R0 = R0 >> 1
            SUBS R7, R0, #1          ; and repeat if R0 != 1
            BNE  again
    halt    B    halt                ; infinite loop to stop computation
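
The same shift-based arithmetic can be written in C, which makes it easy to check the iteration counts by hand. This is a sketch following the pseudocode rather than the exact control flow of the assembly; the function name is illustrative, and it assumes n is at least 1 and that intermediate values fit in 32 bits.

```c
#include <stdint.h>

/* Sketch of the hailstone iteration count using only shifts, adds,
   and a bitwise AND, as in the assembly version.
   Hypothetical name; assumes n >= 1 and no 32-bit overflow. */
uint32_t hailstone_iterations(uint32_t n) {
    uint32_t iters = 0;
    while (n != 1) {
        iters++;
        if (n & 1u) {
            n = n + (n << 1) + 1;   /* odd: 3n + 1 as n + 2n + 1   */
        } else {
            n = n >> 1;             /* even: n / 2 as a right shift */
        }
    }
    return iters;
}
```

Starting from 5, the sequence is 16, 8, 4, 2, 1, so the count is 5; starting from 3, as in the worked example earlier, it is 7.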

2.3. Another example: Adding digits

Let's look at another example. Here, suppose that we want to add the digits of a positive number; for example, given the number 1,024, we would want to compute 1 + 0 + 2 + 4, which is 7. The obvious way to express this in C is as follows.

    total = 0;
    while (i > 0) {
        total += i % 10;
        i /= 10;
    }

It's difficult to translate this into ARM's ISA, though, since the ARM lacks any instruction for dividing values. However, we can use a clever trick to perform this division using multiplication: If we take a number and multiply by 2³² / 10, the upper 32 bits of the product tell us the result of dividing the original number by 10. This insight leads to the following alternative way of summing the digits in a number.

    base = 0x1999999A;
    total = 0;
    while (i > 0) {
        iDiv10 = ((unsigned long long) i * base) >> 32;  /* 64-bit product */
        total += i - iDiv10 * 10;
        i = iDiv10;
    }

In translating this into assembly code, we have to confront two issues. The more obvious is determining which instruction to use to perform the multiplication. Here, we want to use the UMULL instruction (Unsigned MULtiply Long), which interprets two registers as unsigned 32-bit numbers, and places the 64-bit product of the registers' values into two different registers. The below example illustrates.

    UMULL R4, R5, R0, R2   ; computes R0 * R2, placing lower 32 bits in R4, upper 32 in R5

The less obvious issue we have to confront is that of placing 0x1999999A into a register. You might be tempted at first to use MOV, but this instruction has a major limitation: Any immediate value must be rotated by an even number of places to reach an eight-bit value. For numbers between 0 and 255, this is not a problem; nor is it a problem for 1,024, since 0x400 can be achieved by rotating 1 left 10 places. But there's no way to do this for 0x1999999A. The solution we'll use is to load each byte separately, joining them using the ORR instruction, which computes the bitwise OR of two values.

            MOV   R0, #1024          ; R0 is input, decreases by factors of 10
            MOV   R1, #0             ; R1 is sum of digits
            MOV   R2, #0x19000000    ; R2 is constantly 0x1999999A
            ORR   R2, R2, #0x00990000
            ORR   R2, R2, #0x00009900
            ORR   R2, R2, #0x0000009A
            MOV   R3, #10            ; R3 is constantly 10
    loop    UMULL R4, R5, R0, R2     ; R5 is R0 / 10
            UMULL R4, R6, R5, R3     ; R4 is now 10 * (R0 / 10)
            SUB   R4, R0, R4         ; R4 is now one's digit of R0
            ADD   R1, R1, R4         ; add it into R1
            MOVS  R0, R5
            BNE   loop
    halt    B     halt
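
The multiply-and-take-the-upper-half trick is easy to try out in C, where a 64-bit intermediate plays the role of the register pair written by UMULL. The sketch below uses illustrative function names. One caveat that follows from the arithmetic: because 0x1999999A is slightly larger than 2³² / 10, the quotient is exact only for inputs below 2³⁰, which comfortably covers the example input of 1,024.

```c
#include <stdint.h>

/* Divide by 10 via multiplication: the upper 32 bits of
   i * 0x1999999A, as UMULL would leave them in its high register.
   Hypothetical name; exact only for i < 2^30 (see note above). */
uint32_t div10(uint32_t i) {
    return (uint32_t)(((uint64_t)i * 0x1999999Au) >> 32);
}

/* Sum of decimal digits, following the assembly loop. */
uint32_t digit_sum(uint32_t i) {
    uint32_t total = 0;
    while (i > 0) {
        uint32_t q = div10(i);   /* R5 = R0 / 10           */
        total += i - q * 10;     /* one's digit of i       */
        i = q;                   /* MOVS R0, R5 / BNE loop */
    }
    return total;
}
```

For example, digit_sum(1024) yields 7, matching the hand computation 1 + 0 + 2 + 4.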

By the way, you may sometimes want to place a small negative number like −10 into a register. You can't use MOV to accomplish this, because its two's-complement representation is 0xFFFFFFF6, which can't be rotated into an 8-bit number. If you happen to know that some register holds the number 0, then you could use SUB. But if you don't, then the MVN (MoVe Not) instruction is useful: It places the bitwise NOT of its argument into the destination register. So to get −10 into R0, we can use “MVN R0, #0x9”.
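
The rule about immediates can be tested mechanically: try every even rotation and see whether the value ever fits in eight bits. The following is a sketch of that check with an illustrative function name; it describes the rule as stated above and is not taken from any assembler's source.

```c
#include <stdint.h>

/* Returns 1 if value can be rotated by some even number of places
   to reach an eight-bit value, 0 otherwise. Hypothetical helper. */
int fits_arm_immediate(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by rot; rot == 0 is handled separately to
           avoid shifting a 32-bit value by 32 places. */
        uint32_t rotated = (rot == 0)
            ? value
            : (value << rot) | (value >> (32 - rot));
        if (rotated <= 0xFFu) {
            return 1;
        }
    }
    return 0;
}
```

With this check, 255, 0x400, and each of the four pieces 0x19000000, 0x00990000, 0x00009900, and 0x0000009A pass, while 0x1999999A and 0xFFFFFFF6 do not. The bitwise NOT of 0xFFFFFFF6 is 9, which does pass, and that is why MVN works for −10.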

2.4. Summary of instructions so far

The ARM includes sixteen “basic” arithmetic instructions, numbered 0 through 15. All sixteen are listed below, with the functionality summarized by the relevant C operator. (The number at the beginning of each line is used in translating the instructions into machine language. There's no reason for programmers to memorize this correspondence, though: After all, this is why we have assemblers.)

Figure 1: ARM's basic arithmetic instructions

0.	`AND regd`, `rega`, `argb`	`regd` ← `rega` & `argb`
1.	`EOR regd`, `rega`, `argb`	`regd` ← `rega` ^ `argb`
2.	`SUB regd`, `rega`, `argb`	`regd` ← `rega` − `argb`
3.	`RSB regd`, `rega`, `argb`	`regd` ← `argb` − `rega`
4.	`ADD regd`, `rega`, `argb`	`regd` ← `rega` + `argb`
5.	`ADC regd`, `rega`, `argb`	`regd` ← `rega` + `argb` + `carry`
6.	`SBC regd`, `rega`, `argb`	`regd` ← `rega` − `argb` − !`carry`
7.	`RSC regd`, `rega`, `argb`	`regd` ← `argb` − `rega` − !`carry`
8.	`TST rega`, `argb`	set flags for `rega` & `argb`
9.	`TEQ rega`, `argb`	set flags for `rega` ^ `argb`
10.	`CMP rega`, `argb`	set flags for `rega` − `argb`
11.	`CMN rega`, `argb`	set flags for `rega` + `argb`
12.	`ORR regd`, `rega`, `argb`	`regd` ← `rega` | `argb`
13.	`MOV regd`, `arg`	`regd` ← `arg`
14.	`BIC regd`, `rega`, `argb`	`regd` ← `rega` & ~`argb`
15.	`MVN regd`, `arg`	`regd` ← ~`arg`

Except for TST, TEQ, CMP, and CMN, all instructions may have an S postfixed to the opcode to signify that the operation should set the flags. For TST, TEQ, CMP, and CMN, the S is implicit: The instructions don't change any general-purpose registers, so the only point in performing the instruction is to set the flags.

We've also seen three other opcodes that aren't in the above list of basic arithmetic instructions: UMULL is a “non-basic” arithmetic instruction, and B and BNE aren't arithmetic instructions.

2.5. Condition codes

Each ARM instruction may incorporate a condition code specifying that the operation should take place only when certain combinations of the flags hold. You can specify the condition code by including it as part of the opcode. It usually comes at the end of the opcode, but it precedes the optional S on the basic arithmetic instructions. The name for the condition codes is based on the supposition that the flags were set based on a CMP or SUBS instruction.

Figure 2: ARM's condition codes

0.	`EQ`	equal	`Z`
1.	`NE`	not equal	`!Z`
2.	`CS` or `HS`	carry set / unsigned higher or same	`C`
3.	`CC` or `LO`	carry clear / unsigned lower	`!C`
4.	`MI`	minus / negative	`N`
5.	`PL`	plus / positive or zero	`!N`
6.	`VS`	overflow set	`V`
7.	`VC`	overflow clear	`!V`
8.	`HI`	unsigned higher	`C && !Z`
9.	`LS`	unsigned lower or same	`!C || Z`
10.	`GE`	signed greater than or equal	`N == V`
11.	`LT`	signed less than	`N != V`
12.	`GT`	signed greater than	`!Z && (N == V)`
13.	`LE`	signed less than or equal	`Z || (N != V)`
14.	`AL` or omitted	always	`true`

The only instance of this condition code we have seen so far is the BNE instruction: In this case, we have a B instruction for branching, but the branch only takes place if the Z flag is 0.
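
The flag combinations in Figure 2 can be tried out with a small C model. The sketch below is illustrative only: the type and function names are made up, the condition numbers are those of Figure 2, and the way the flags are computed for CMP (in particular, C being 1 when the subtraction needs no borrow) is the usual ARM convention rather than something this document states.

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;   /* hypothetical type */

/* Flags as "CMP a, b" would leave them (it computes a - b).
   Assumption: C is 1 when no borrow occurs, i.e. a >= b unsigned. */
Flags flags_after_cmp(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;
    Flags f;
    f.n = (int)(diff >> 31);                    /* sign bit of result */
    f.z = (diff == 0);
    f.c = (a >= b);
    f.v = (int)(((a ^ b) & (a ^ diff)) >> 31);  /* signed overflow    */
    return f;
}

/* Evaluate condition number `code` (0 through 14, as in Figure 2). */
int condition_holds(unsigned code, Flags f) {
    switch (code) {
    case 0:  return f.z;                    /* EQ */
    case 1:  return !f.z;                   /* NE */
    case 2:  return f.c;                    /* CS or HS */
    case 3:  return !f.c;                   /* CC or LO */
    case 4:  return f.n;                    /* MI */
    case 5:  return !f.n;                   /* PL */
    case 6:  return f.v;                    /* VS */
    case 7:  return !f.v;                   /* VC */
    case 8:  return f.c && !f.z;            /* HI */
    case 9:  return !f.c || f.z;            /* LS */
    case 10: return f.n == f.v;             /* GE */
    case 11: return f.n != f.v;             /* LT */
    case 12: return !f.z && (f.n == f.v);   /* GT */
    case 13: return f.z || (f.n != f.v);    /* LE */
    case 14: return 1;                      /* AL */
    default: return 0;                      /* not a listed code */
    }
}
```

For instance, after comparing 40 with 25, the GT and NE conditions hold while LT and EQ do not; after comparing −5 with 3, LT holds (a signed comparison) while HI also holds (an unsigned one), which is exactly the distinction the two families of codes are meant to capture.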

But ARM's ISA allows us to apply condition codes to other opcodes, too. For example, ADDEQ says to perform an addition if the Z flag is 1. One common scenario using condition codes on non-branch instructions is in computing the greatest common divisor of two numbers using Euclid's GCD algorithm.

    a = 40;
    b = 25;
    while (a != b) {
        if (a > b) a -= b;
        else b -= a;
    }

The traditional translation to assembly language would use condition codes only on branch instructions.

            MOV  R0, #40      ; R0 is a
            MOV  R1, #25      ; R1 is b
    again   CMP  R0, R1
            BEQ  halt
            BLT  isLess
            SUB  R0, R0, R1
            B    again
    isLess  SUB  R1, R1, R0
            B    again
    halt    B    halt

However, the following is a much shorter and more efficient translation.

            MOV   R0, #40     ; R0 is a
            MOV   R1, #25     ; R1 is b
    again   CMP   R0, R1
            SUBGT R0, R0, R1
            SUBLT R1, R1, R0
            BNE   again
    halt    B     halt

This is more efficient for two reasons. More obviously, the number of instructions executed per iteration is smaller (four versus five). But the other reason comes from the fact that modern processors “pre-fetch” the following instruction while executing the current instruction. However, branches disrupt this process since the location of the next instruction can't be known with certainty. The second translation involves many fewer branch instructions, so it will have fewer problems with pre-fetching instructions.

3. Memory

We've seen how to build assembly programs that perform basic numerical computation. We'll now turn to examining how assembly programs can access memory.

3.1. Basic memory instructions

The ARM supports memory access via two instructions, LDR and STR. The LDR instruction loads data out of memory, and STR stores data into memory. Each takes two arguments. The first argument is the data register: For an LDR instruction, the loaded data is placed into this register; for an STR instruction, the data found in this register is stored into memory. The second argument indicates the register that contains the memory address being accessed; it will be written using the register name enclosed in brackets. (In Section 3.2, we will see that there are other options for how this second argument can be written.)

For an example of how these instructions work, let's suppose we want an assembly program fragment that adds the integers in an array. We imagine that R0 holds the address of the first integer of the array, and R1 holds the number of integers in the array.

    addInts MOV  R4, #0
    addLoop LDR  R2, [R0]
            ADD  R4, R4, R2
            ADD  R0, R0, #4
            SUBS R1, R1, #1
            BNE  addLoop

In this fragment, we use R4 to hold the sum of the integers so far. In the LDR instruction, we look into R0 for a memory address and load the data found at that address into R2. We then add this value into R4. Then, we move R0 so that it contains the memory address of the next integer in the array; we increase R0 by four because each integer consumes four bytes of memory. Finally, we decrement R1, which is the number of integers left to read from the array, and we repeat the process if there are integers remaining.

Both LDR and STR load and store 32-bit values. There are also instructions for working with 8-bit values, LDRB and STRB; these are useful primarily for working with strings. Below is an implementation of C's strcpy function; we imagine that R0 holds the address of the first character of the destination array, and that R1 holds the address of the first character of the source string. We want to keep copying until we copy the terminating NUL character (ASCII 0).

    strcpy  LDRB R2, [R1]
            STRB R2, [R0]
            ADD  R0, R0, #1
            ADD  R1, R1, #1
            TST  R2, R2       ; repeat if R2 is nonzero
            BNE  strcpy

3.2. Addressing modes

In the previous section's examples, we provided the address by enclosing a register's name in brackets. But the ARM allows several other ways of indicating the memory address, too. Each such technique is called an addressing mode; the technique of simply naming a register holding a memory address is one such addressing mode, called register addressing, but there are others.

One of these others is scaled register offset, where we include in the brackets a register, another register, and a shift value. To compute the memory address to access, the processor takes the first register, and adds to it the second register shifted according to the shift value. (Neither of the registers mentioned in brackets changes value.) This addressing mode is useful when accessing an array where you know the array index. We can modify our earlier routine for adding the integers in an array to take advantage of this addressing mode.

    addInts MOV  R4, #0
    addLoop SUBS R1, R1, #1
            LDR  R2, [R0, R1, LSL #2]
            ADD  R4, R4, R2
            BNE  addLoop

With each iteration of the loop, we first decrement our loop index R1. Then we retrieve the element at that entry of the array using a scaled register offset: We use R0 as our base, and we add to it R1 shifted left two places. We shift R1 left two places so that R1 is multiplied by four; after all, each integer in the array is four bytes long. After adding the loaded value into R4, which accumulates the total, we repeat the loop if R1 hasn't reached 0 yet.

Beyond using a different addressing mode, this version of the code is slightly different from our original implementation in three ways. First, it loads the numbers in the array in reverse order: that is, it loads the last number in the array first. Second, R0 remains unaltered in the course of the fragment. And finally, it will be somewhat faster since it has one fewer instruction per loop iteration.
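
To make the address arithmetic of scaled register offset concrete, here is a C sketch of the same reverse-order loop that computes the byte offset explicitly as the index shifted left two places. The names are illustrative, and like the assembly fragment it assumes the array holds at least one integer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Byte offset added to the base by [Rn, Rm, LSL #shift].
   Hypothetical helper name. */
size_t scaled_offset(uint32_t index, unsigned shift) {
    return (size_t)index << shift;
}

/* Sum an array from its last element to its first, as the
   scaled-register-offset version of addInts does.
   Assumes count >= 1, as the assembly fragment does. */
int32_t sum_ints_reverse(const int32_t *array, uint32_t count) {
    const unsigned char *r0 = (const unsigned char *)array; /* base address */
    uint32_t r1 = count;                                    /* loop index   */
    int32_t r4 = 0;                                         /* running sum  */
    do {
        int32_t r2;
        r1 = r1 - 1;                                  /* SUBS R1, R1, #1           */
        memcpy(&r2, r0 + scaled_offset(r1, 2), 4);    /* LDR R2, [R0, R1, LSL #2]  */
        r4 += r2;                                     /* ADD R4, R4, R2            */
    } while (r1 != 0);                                /* BNE addLoop               */
    return r4;
}
```

Note that the base pointer is never modified, mirroring the observation that R0 remains unaltered in this version.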

Immediate post-indexed addressing is another addressing mode. To indicate this mode in assembly language, we follow the brackets with a comma and a positive or negative immediate. In executing the instruction, the processor still accesses the memory address found in the register, but after accessing the memory the address register is increased or decreased according to the immediate.

Our strcpy implementation is an example where immediate post-indexed addressing is useful: After we store to R0, we want R0 to increase by 1 for the following iteration; and similarly, after we load from R1, we want R1 to increase by 1. We can use immediate post-indexed addressing to avoid the two ADD instructions of our earlier version.

    strcpy  LDRB R2, [R1], #1
            STRB R2, [R0], #1
            TST  R2, R2       ; repeat if R2 is nonzero
            BNE  strcpy
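
C programmers may recognize post-indexed addressing as the counterpart of the post-increment operator on a pointer: use the address, then step it forward. A sketch of the same loop in C follows; the function is given a made-up name so that it does not clash with the standard library's strcpy.

```c
/* Copy a NUL-terminated string, including the terminator.
   Hypothetical name; dst must have room for the whole string. */
void copy_string(char *dst, const char *src) {
    char c;
    do {
        c = *src++;          /* LDRB R2, [R1], #1 : load, then advance R1  */
        *dst++ = c;          /* STRB R2, [R0], #1 : store, then advance R0 */
    } while (c != '\0');     /* TST R2, R2 / BNE strcpy                    */
}
```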

In total, the ARM processor supports ten addressing modes.

`[Rn, #±imm]`	Immediate offset	Address accessed is `imm` more/less than the address found in R`n`. R`n` does not change.
`[Rn]`	Register	Address accessed is value found in R`n`. This is just shorthand for `[Rn, #0]`.
`[Rn, ±Rm, shift]`	Scaled register offset	Address accessed is sum/difference of the value in R`n` and the value in R`m` shifted as specified. R`n` and R`m` do not change values.
`[Rn, ±Rm]`	Register offset	Address accessed is sum/difference of the value in R`n` and the value in R`m`. R`n` and R`m` do not change values. This is just shorthand for `[Rn, ±Rm, LSL #0]`.
`[Rn, #±imm]!`	Immediate pre-indexed	Address accessed is as with immediate offset mode, but R`n`'s value updates to become the address accessed.
`[Rn, ±Rm, shift]!`	Scaled register pre-indexed	Address accessed is as with scaled register offset mode, but R`n`'s value updates to become the address accessed.
`[Rn, ±Rm]!`	Register pre-indexed	Address accessed is as with register offset mode, but R`n`'s value updates to become the address accessed.
`[Rn], #±imm`	Immediate post-indexed	Address accessed is value found in R`n`, and then R`n`'s value is increased/decreased by `imm`.
`[Rn], ±Rm, shift`	Scaled register post-indexed	Address accessed is value found in R`n`, and then R`n`'s value is increased/decreased by R`m` shifted according to `shift`.
`[Rn], ±Rm`	Register post-indexed	Address accessed is value found in R`n`, and then R`n`'s value is increased/decreased by R`m`. This is just shorthand for `[Rn], ±Rm, LSL #0`.

For those addressing modes involving a shift, the shift technique is as with the arithmetic instructions (LSL, LSR, ASR, ROR, RRX). But the shift distance cannot be according to a register: The distance must be an immediate.

3.3. Initializing memory

We often want to reserve memory for holding data in a program. To do this, we use directives: directions for the assembler to do something other than simply translate an assembly language instruction into its corresponding machine code. One useful directive is DCD, which inserts one or more 32-bit numerical values into the machine code output. (DCD cryptically stands for Define Constant Double-words.)

    primes  DCD  2, 3, 5, 7, 11, 13, 17, 19

In this example, we've created the label primes, which will correspond to the address where 2 is placed into memory. In the following four bytes is placed the integer 3, then 5, and so on.

In our program, we would want to load the address of the array into a register; to do this, we add primes into the program counter PC (which is synonymous with R15). The below fragment loads the fifth prime (11) into R1.

    ADD  R0, PC, #primes   ; load address of primes[0] into R0
    LDR  R1, [R0, #16]     ; load primes[4] into R1

Another directive worth mentioning is DCB, for loading bytes into memory. Thus, we could write the following.

    primes  DCB  2, 3, 5, 7, 11, 13, 17, 19

However, we are using just one byte for each number, so we can only include numbers between −128 and 127. We can also include a string in the list; each character of the string will occupy one byte of memory.

    greet   DCB  'hello world\n', 0

Notice how we included 0 after the string. Without this, the string won't be terminated by the NUL character.

One more directive worth noting here is the percent sign %. This is useful when you wish to reserve a block of memory, but you don't care about the memory's initial value.

    array   %    120       ; reserve 120 bytes of memory, which can hold 30 ints

3.4. Multiple-register memory instructions

The ARM ISA also includes instructions allowing several values to be loaded or stored in the same instruction. The LDMIA instruction is one such instruction: It allows loading into multiple registers starting at an address named in another register. In the below example of its usage, we take our code for adding the integers of an array, and we modify it using LDMIA so that it processes four integers with each iteration of the loop. This strategy allows the program to run using fewer instructions, at the expense of more complexity.

    ; R0 holds address of first integer in array
    ; R1 holds array's length; fragment works only if length is multiple of 4
    addInts MOV   R4, #0
    addLoop LDMIA R0!, { R5-R8 }
            ADD   R5, R5, R6
            ADD   R7, R7, R8
            ADD   R4, R4, R5
            ADD   R4, R4, R7
            SUBS  R1, R1, #4
            BNE   addLoop

In executing the LDMIA instruction above, the ARM processor looks into the R0 register for an address. It loads into R5 the four bytes starting at that address, into R6 the next four bytes, into R7 the next four bytes, and into R8 the next four bytes. Meanwhile, R0 is stepped forward by 16 bytes, so with the next iteration the LDMIA instruction will load the next four words into the registers.

Inside the braces can be any list of registers, using dashes to indicate ranges of registers, and using commas to separate ranges. Thus, the instruction LDMIA R0!, { R1-R4, R8, R11-R12 } will load seven words from memory. The order in which the registers are listed is not significant; even if we write LDMIA R0!, { R11-R12, R8, R1-R4 }, R1 will receive the first word loaded from memory.

The exclamation point following R0 in our example may be omitted; if omitted, then the address register is not altered by the instruction. That is, R0 would continue pointing to the first integer in the array. In our example above, we want R0 to change so that it is pointing to the next block of four integers for the next iteration, so we included the exclamation point.

Another instruction is STMIA, which stores several registers into memory. In the following example, we shift every number in an array into the next spot; thus, the array <2,3,5,7> becomes <0,2,3,5>.

    ; R0 holds address of first integer in array
    ; R1 holds array's length; fragment works only if length is multiple of 4
    shift   MOV   R4, #0
    shLoop  LDMIA R0, { R5-R8 }
            STMIA R0!, { R4-R7 }
            MOV   R4, R8
            SUBS  R1, R1, #4
            BNE   shLoop

Notice how the LDMIA instruction omits the exclamation point so that R0 isn't modified. This is so that STMIA stores into the same range of addresses that were just loaded into the registers. The STMIA instruction has the exclamation point because R0 must be modified in preparation for the next iteration of the loop.

The ARM processor includes four variants of the multiple-load and multiple-store instructions; the LDM and STM abbreviations must always indicate one of these four variants.

`LDMIA`, `STMIA`	Increment after	We start loading from the named address and into increasing addresses.
`LDMIB`, `STMIB`	Increment before	We start loading from four more than the named address and into increasing addresses.
`LDMDA`, `STMDA`	Decrement after	We start loading from the named address and into decreasing addresses.
`LDMDB`, `STMDB`	Decrement before	We start loading from four less than the named address and into decreasing addresses.

Across all four modes, the highest-numbered register always corresponds to the highest address in memory. Thus, the instruction LDMDA R0, { R1-R4 } will load R4 from the address named by R0, R3 from R0 − 4, and so on.
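
The four variants differ only in where the block of words sits relative to the named address, which a few lines of C arithmetic can capture. The sketch below uses made-up names; `count` is the number of registers in the list, and `k` selects the k-th lowest-numbered register in the list, counting from 0. The write-back rule shown (the base moves by four bytes per register, in the direction of the variant) applies when the exclamation point is present.

```c
#include <stdint.h>

typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } BlockMode; /* hypothetical */

/* Lowest memory address touched by the instruction. */
uint32_t lowest_address(BlockMode mode, uint32_t base, uint32_t count) {
    switch (mode) {
    case MODE_IA: return base;                    /* increment after  */
    case MODE_IB: return base + 4;                /* increment before */
    case MODE_DA: return base - 4 * (count - 1);  /* decrement after  */
    default:      return base - 4 * count;        /* decrement before */
    }
}

/* Address used for the k-th lowest-numbered register in the list:
   lower-numbered registers always get lower addresses. */
uint32_t register_slot(BlockMode mode, uint32_t base,
                       uint32_t count, uint32_t k) {
    return lowest_address(mode, base, count) + 4 * k;
}

/* Value of the base register after write-back ("!"). */
uint32_t written_back_base(BlockMode mode, uint32_t base, uint32_t count) {
    if (mode == MODE_IA || mode == MODE_IB) {
        return base + 4 * count;
    }
    return base - 4 * count;
}
```

For LDMDA R0, { R1-R4 } with R0 holding 1000, this places R4 (k = 3) at address 1000 and R3 (k = 2) at 996, in agreement with the description above; for LDMIA with four registers, the written-back base is 16 bytes past the original.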

As we'll see in studying subroutines, the different variants are particularly useful when we want to use a block of unused memory as a stack.
