How to flash an LED

Today we're going to learn how to flash an LED on a microcontroller by writing ARM assembly.

If you write software but are unfamiliar with basic electronics or embedded software development, there will be explanations of some fundamentals – I expect you will not feel left behind :).

If you'd like to skip the fundamentals and go straight to the assembly writing, skip ahead to the "Let's write some code!" section.

We'll be using a 1bitsy, which is an STM32 development board. We'll be reading the STM32 reference manual to figure out what assembly to write. If you want to try out the coding yourself, but don't have a 1bitsy or anything similar, check out this GitHub repo where you can run the code on an emulator in a Docker container.

Some basics

Let's get some basics out of the way. How do you flash an LED?


LED circuit symbol

As you may know, an LED is a "light-emitting diode". LEDs are increasingly popular in torches, bulbs and various other lighting due to their ability to be very bright with relatively low power. What we need to know is that they emit light when current passes through them. So: we need some current.

LED circuit

This is just a simple electronics circuit with current flowing through the LED. We have low voltage at the bottom, and higher voltage at the top: that difference causes current to flow, lighting up our LED. But this is no good for us – we want to control the flow of current so that we can flash our LED on and off.

In order to control the current in our LED, we can connect it to a GPIO pin on our microcontroller:

LED connected to a GPIO

GPIO just stands for "general purpose input/output". It's a pin on a microchip that you can configure at runtime: for example, you can say "I want this pin to be an output, and I want to turn it on", or "I want this pin to be an input" and then read data from it.

A microcontroller is a small computer used for embedded software – we'll learn more about the specific microcontroller we're using later. Embedded software is software that isn't written for a general purpose computer, but instead targets specific hardware used in some physical device: for example, the software that runs on an MRI scanner to control its operation, or the software in modern cars that controls things like the anti-lock braking system.


If GPIO pins are configurable at runtime, then we need to write some code that will tell our little computer how to configure the GPIO. This is how that would usually look:

embedded software development process

Usually, you'd expect that code to be written in C. You need a language that allows you control over memory the way that C does: when you have small limited memory, as is generally the case on small embedded computers, it's important to be able to understand how much memory is being used by your program. Languages that rely on dynamic allocation and garbage collection are a bad fit, partly for this reason.

Rust and C++ are also used for embedded work. The C++ you'd write for embedded would be quite different from what you might write for a desktop application – you likely wouldn't use any of the STL containers as they all rely on dynamic memory allocation. Eliminating dynamic memory allocation is safer: the risk of memory allocation failing is much higher when there isn't much memory to begin with. And a failure could be much more catastrophic: many embedded systems are designed to run autonomously, without any human there to restart them. Many control safety-critical physical systems.

Dynamic allocation is generally not needed anyway: a desktop application might have to dynamically allocate resources to accommodate a user opening an unknown number of tabs in a GUI; an embedded application will know at compile time how many motors it has to control, or how many sensors it will read from. This allows a lot of memory to be allocated statically.
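
As a rough illustration of what static allocation looks like in C, here is a minimal sketch. The sensor count, the `sensor_reading_t` type and the `record_reading` helper are all hypothetical names invented for this example; the only point is that the buffer size is fixed at compile time, so nothing is ever requested from a heap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: the number of sensors is known at compile time. */
#define NUM_SENSORS 4

typedef struct {
    uint16_t raw_value;   /* last raw reading from the sensor */
    uint8_t  valid;       /* 1 once a reading has been stored */
} sensor_reading_t;

/* Statically allocated: the size is fixed when the program is built,
 * so the memory it needs is known before it ever runs. */
static sensor_reading_t readings[NUM_SENSORS];

/* Store a reading; returns 0 on success, -1 if the index is out of range. */
int record_reading(size_t sensor_index, uint16_t raw_value)
{
    if (sensor_index >= NUM_SENSORS) {
        return -1;
    }
    readings[sensor_index].raw_value = raw_value;
    readings[sensor_index].valid = 1;
    return 0;
}

/* Total bytes reserved for the readings, a compile-time constant. */
size_t readings_size_bytes(void)
{
    return sizeof(readings);
}
```

There is no failure path for "out of memory" here: the only way `record_reading` can fail is a bad index, which is much easier to reason about on a device nobody is watching.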

So, for the sake of example, let's say that you would write your code to flash an LED in C. You'd probably use a hardware abstraction layer (a HAL) to abstract over memory addresses and such. This makes the code more portable as well as more readable.

But today, we're going to do things a little differently: we'll be writing all our code in assembly.

What is assembly?

When you compile a C program, say, you compile it to machine code. Machine code is the lowest level of software – it's the binary code that the CPU executes. This machine code consists of instructions. For example, you might have one instruction that says "copy value 42 into register 0", and that is our smallest unit of executable code.

Assembly is the next level of software up – it's a lot like writing human-readable machine code, where you write out each instruction in text form. This is very different from writing a C program, which is much further abstracted: a single line of C can turn into many instructions, which is why compiling C to machine code is a lot more complicated. When we write assembly today, that's exactly what our CPU is going to be executing: there's a very close mapping to the actual machine code.

Why write our code in assembly?

The usual reasons: for fun and learning! Writing code in assembly means really getting to know your target hardware. Plus, we'll know exactly what code is running on our processor.

Although this isn't something you might usually do, understanding assembly code is a big part of many developers' jobs: reading the assembly is often the only way to debug optimised code, and it's crucial to reverse engineering and exploit development. It's also key to compiler development, and used for making specific optimisations to embedded code. It's often the only way to access specialised CPU features, and to run special instructions like DSP instructions.

Getting to know our hardware

Doing embedded development means really getting to know your target hardware. So, what hardware are we using?


We have an ARM development board called a 1bitsy. It has an STM32F4 on it, which is our microcontroller unit, or MCU. This microcontroller is basically the CPU plus about a megabyte of flash, about 200 kilobytes of RAM, and what are called peripherals: some of these are for communicating via various protocols, and some are for general-purpose use, like the GPIOs we talked about earlier. The MCU has everything you need to make the CPU actually be a useful computer.

Our STM32 contains a Cortex-M4 CPU. The picture above is of the die of the STM32: it's basically what's inside the black plastic on the outside of the chip. The CPU is on the top right of the die, with RAM top left, flash bottom left, and peripherals bottom right.

I've included lists of the documentation associated with these. Today we're exclusively going to be looking at the schematic for the board and the reference manual for the STM32.

To program the 1bitsy, we will also need a programmer board like the Black Magic probe.

A brief introduction to assembly

What does assembly look like?

Before we get onto writing some code, what does ARM assembly look like?

Here is an example instruction: mov r0, #5. This means move the literal value 5 into register 0. But what's a register? A register is the last key concept we're going to need to know before we write any assembly.

registers of Cortex M4

Our ARM processor has a small number of very fast, very small storage locations, and they're called registers. These are directly accessed by the CPU, and they aren't accessible via main memory. Some are used for general purpose storage, others have specific purposes, like the program counter register (PC). The CPU is hardwired to execute whatever instruction is at the memory location stored in the PC. The stack pointer is used to keep track of the call stack.

On a separate memory bus, our STM32 also has about a thousand configuration and status registers – also often called memory-mapped IO. These are basically pre-defined structs that live somewhere in memory, and you read & write to them in order to configure the hardware. In our case, we'll be writing to these to configure a GPIO, which will be connected to our LED.
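
To make the "pre-defined structs" idea a bit more concrete, here is a sketch of how one GPIO port's registers could be described in C. The `gpio_regs_t` name is my own, and the field order follows the GPIO register map in the reference manual as I understand it; treat it as an illustration rather than a copy of any vendor header. It only describes the layout, so it is safe to compile and run on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of one GPIO port's registers. Every register is 32 bits wide,
 * so the offsets go up in steps of 4 bytes. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00: mode register */
    volatile uint32_t OTYPER;   /* 0x04: output type register */
    volatile uint32_t OSPEEDR;  /* 0x08: output speed register */
    volatile uint32_t PUPDR;    /* 0x0C: pull-up/pull-down register */
    volatile uint32_t IDR;      /* 0x10: input data register */
    volatile uint32_t ODR;      /* 0x14: output data register */
    volatile uint32_t BSRR;     /* 0x18: bit set/reset register */
    volatile uint32_t LCKR;     /* 0x1C: configuration lock register */
    volatile uint32_t AFR[2];   /* 0x20: alternate function registers */
} gpio_regs_t;

/* Address of a register, given the port's base address and the
 * register's offset inside the struct. */
uint32_t gpio_reg_address(uint32_t port_base, size_t offset)
{
    return port_base + (uint32_t)offset;
}
```

On the real chip you would cast the port's base address to a `gpio_regs_t *` and read or write the fields directly; the `volatile` qualifier tells the compiler that every access really has to reach the hardware.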


I think it's important context to note that the assembly we'll be writing today is a little different than what you would likely write for your PC. Broadly, you can divide computer architectures into complex instruction set computers (CISC) and reduced instruction set computers (RISC). CISC is what Intel chips use, and it is optimised to perform actions in as few instructions as possible – as a consequence each instruction itself can be very complex. RISC, on the other hand, prioritises having simple instructions, and you'll be glad to know that's what we'll be writing today.

I couldn't resist including a screenshot from Hackers, my favourite movie, which is from 1995, a much more hopeful time in software.


Here the hacker Acid Burn is saying that RISC architecture is going to change everything – and in many ways she's right! I don't know of any mobile phone, Apple or Android, that doesn't use an ARM core, and mobile phones are everywhere. Sadly, most laptops and desktops use Intel CISC processors. This makes no difference to my life at all, but I like to pretend it matters to me so I can feel like I'm as cool as Acid Burn.

Let's write some code!

It is time to get down to business. First we need to briefly look at the schematic for the 1bitsy, our development board. The schematic tells us what is on the board, and how it is connected. We're interested in how the status LED is connected.

Because the 1bitsy is quite simple, there is only one page to the schematic. If we look at the top of the schematic, centre-right, we can see that there's a status LED connected to GPIO port A, pin 8, which we'll call PA8 for short.

LED in 1bitsy schematic

There are three things we're going to need to do:

  1. Turn on the GPIOA clock
  2. Set GPIOA8 to an output
  3. Toggle GPIOA8

Turning on the clock

Before we can do anything with this GPIO pin, we need to set up its clock. Inside our chip, and inside the CPUs in our work laptops, there's an oscillator providing a clock signal that is used to synchronise different parts of the complicated integrated circuit that is our computer.

If we are going to use our GPIO pin, it needs to have its clock enabled, otherwise it is effectively off, and won't respond to any reads or writes. It defaults to being off because the peripheral consumes power when it's on.

To find out how to set up the GPIOA clock, we need to look at the STM32F415 reference manual, or ref man for short. We want to look at the memory map, to see what the start address is for the Reset and Clock Control (RCC) registers.

STM32 memory map

We're going to need a bit more information in order to set the clock, but this memory address is something we'll need in our code, so let's make a note of it (0x40023800).

Let's go to the RCC register map next – this is how we're going to find exactly which RCC register we need to write to in order to turn on the GPIOA clock.

RCC register map

The first column in this table shows the address offset from the base address we noted earlier. The numbers from 31 to 0 show the bits of the 32-bit registers.

If we look closely, we can see the field GPIOAEN for enabling GPIOA's clock – so, we want to set bit 0 in the AHB1ENR register. I realise that might seem very obscure; I think there are two things to note. Firstly, there's actually a lot of additional documentation about this elsewhere in the ref man, showing the different memory buses and the clock tree. It would be too dense to show in this blog post.

Secondly, when you create a software API, a huge priority is making something that is useable and clear to developers (I should hope it is, anyway). When designing hardware, there are physical constraints, and the design has to be cheap and simple to mass manufacture. Consequently, clarity for us chumps cannot be a priority, and instead of a method call with helpfully named arguments, we have dense manuals like this...

Reading this sort of documentation does get easier the more you get to know your architecture, and the more experience you have reading similar manuals – as with anything :)
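
If it helps, the two lookups we just did (base address from the memory map, then offset and bit number from the register map) boil down to a little arithmetic. Here is a small C sketch of it; the function and macro names are mine, and the 0x30 offset for AHB1ENR is the one read off the RCC register map.

```c
#include <assert.h>
#include <stdint.h>

#define RCC_BASE        0x40023800u  /* start of the RCC registers (memory map) */
#define RCC_AHB1ENR_OFF 0x30u        /* offset of AHB1ENR (RCC register map) */

/* Absolute address of a register: base address plus its offset. */
uint32_t reg_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* A value with only bit n set, matching the 31..0 column headings
 * in the register map. */
uint32_t bit_mask(unsigned n)
{
    assert(n < 32u);
    return (uint32_t)1u << n;
}
```

So the register we care about lives at `reg_address(RCC_BASE, RCC_AHB1ENR_OFF)`, and the GPIOA clock-enable bit is `bit_mask(0)`.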

Actually writing code for real

Now: we're finally going to write some actual code. I am sorry I said "let's write some code" further up. We couldn't do it until we had this information from the ref man!

Let's copy that RCC base address into register 0. Our registers are all 32 bits wide, but an instruction can only carry a 16-bit immediate value, otherwise there'd be no room for the rest of the instruction. So, we copy 0x3800 into the bottom half of the register using the movw instruction (which also clears the top half), and then copy 0x4002 into the top half with movt, hence the t.

Then, we want to set the 0th bit in the AHB1ENR register. First, let's copy 0x01 into r1. Then, let's store the contents of r1 in the memory address contained in r0, offset by 0x30 using the str instruction.

    @ Store RCC base address in r0
    movw r0, #0x3800
    movt r0, #0x4002

    @ Turn on GPIOA clock by setting bit 0 in AHB1ENR register
    movw r1, #0x01
    str  r1, [r0, #0x30]

With these runes, we can enable the clock!

All the mov instructions are about moving data into registers. The str instruction moves data from registers and into memory.

You can read more detail about these instructions in the User Guide for our CPU.
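
To check my own understanding of how the two halves of a register get filled in, I find it useful to model movw and movt as plain C functions. This is only a sketch of what the two instructions do to a 32-bit value, not how anyone would program the chip.

```c
#include <stdint.h>

/* movw: put a 16-bit immediate in the bottom half and zero the top half. */
uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* movt: put a 16-bit immediate in the top half, keeping the bottom half. */
uint32_t movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}
```

With these, `movt(movw(0x3800), 0x4002)` gives 0x40023800, the RCC base address. The order matters: movw clears the top half, so it has to come first.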

Setting GPIOA8 to an output

Next on our list is configuring GPIOA8 to be an output. As before, we can look up the base address of GPIOA registers in the ref man. It's 0x40020000. Then, we can have a read of the GPIO registers to find out which one we need to write to.

GPIO enable register

It looks like we want GPIOA_MODER, and you can see above that the reset value is 0xA8000000 for GPIOA. I understand this is because some of the GPIOA pins are used for the debug interface of the STM32, otherwise the reset value would be all zeroes. We want to change the two-bit field MODER8 to be 01, so we want to set the register value to 0xA8010000. There is no offset this time as the mode register is the first GPIO register.

    @ Store start address of GPIOA registers
    movw r0, #0x0000
    movt r0, #0x4002

    @ Use GPIOA_MODER to make GPIOA8 an output
    movw r1, #0x0000
    movt r1, #0xA801
    str  r1, [r0]
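
The assembly above writes the whole register with a value worked out by hand. As a cross-check, here is a C sketch that derives the same value by clearing and then setting just the two-bit field for one pin. The helper name `set_pin_mode` is made up for this example; the reset value and the 01 output encoding are the ones quoted from the ref man.

```c
#include <assert.h>
#include <stdint.h>

#define GPIOA_MODER_RESET 0xA8000000u  /* reset value of GPIOA_MODER */
#define MODE_OUTPUT       0x1u         /* 01 = general purpose output */

/* Return moder with the two-bit mode field for `pin` replaced by `mode`.
 * The field for pin n occupies bits 2n+1 and 2n. */
uint32_t set_pin_mode(uint32_t moder, unsigned pin, uint32_t mode)
{
    assert(pin < 16u);
    unsigned shift = pin * 2u;
    moder &= ~((uint32_t)0x3u << shift);  /* clear the old field */
    moder |= (mode & 0x3u) << shift;      /* write the new field */
    return moder;
}
```

For pin 8 the field sits at bits 17 and 16, so `set_pin_mode(GPIOA_MODER_RESET, 8, MODE_OUTPUT)` comes out as 0xA8010000, matching the value we load into r1.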

Toggling the GPIO

GPIO output data register

If we look at the GPIO documentation, it tells us that there is an output data register, but access to it isn't atomic. That's not a big problem for us here as we don't have any concurrency, but maybe we will later on! We can use the bit-set-reset register for atomic access instead. This also allows us to set individual bits in the output data register, instead of overwriting any values on other GPIO pins.

GPIO output data register

The direction our LED has been wired up means it's active low, so it will turn on when the GPIO output is cleared, and off when it is set.

So, to turn on our LED we want to set the BR8 field, and to turn it off, we want to set the BS8 field.

    @ Set BR8 field in GPIOA_BSRR, to clear GPIOA8
    movw r1, #0x0000
    movt r1, #0x0100
    str  r1, [r0, #0x18]

    @ Set BS8 field in GPIOA_BSRR, to set GPIOA8
    movw r1, #0x0100
    str  r1, [r0, #0x18]
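
Here is a small C model of what a write to the bit-set-reset register does to the output data register, as I read the ref man: the bottom 16 bits (BS0 to BS15) set pins, the top 16 bits (BR0 to BR15) clear them, and zero bits leave a pin alone. The function names are my own, and this simulates the behaviour on a PC rather than touching real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* BSRR value that sets one pin (BSn lives at bit n). */
uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16u);
    return (uint32_t)1u << pin;
}

/* BSRR value that clears one pin (BRn lives at bit n + 16). */
uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16u);
    return (uint32_t)1u << (pin + 16u);
}

/* Model of the hardware: apply a BSRR write to the 16 output bits.
 * If a pin has both its set and reset bits written, set wins. */
uint16_t apply_bsrr(uint16_t odr, uint32_t bsrr)
{
    uint16_t set_bits   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset_bits = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & ~reset_bits) | set_bits);
}
```

`bsrr_reset(8)` is 0x01000000 and `bsrr_set(8)` is 0x0100, the same two values the assembly stores at offset 0x18. Because only pin 8's bits are non-zero, the other pins on the port are untouched.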


The last code snippet will just turn the LED on and off once. To create an infinite loop instead, we simply create a label (let's call it .loop) and then use the branch instruction to go back to that label!

.loop:
    @ Set BR8 field in GPIOA_BSRR, to clear GPIOA8
    movw r1, #0x0000
    movt r1, #0x0100
    str  r1, [r0, #0x18]

    @ Set BS8 field in GPIOA_BSRR, to set GPIOA8
    movw r1, #0x0100
    str  r1, [r0, #0x18]

    b .loop

Adding a delay

Now for something that is hopefully a lot more interesting than just shoving values into memory addresses. We want to do this in a loop, with a delay between turning the LED off and on!

There are a few ways you could do this delay. If precise timing were important, we could use the timer peripherals of the STM32. We could also just repeat a nop (no operation) instruction over and over again, but that doesn't feel very sophisticated, and would give us a really large binary!

We're going to do this by putting a big number in a register and decrementing it until it hits zero. So, we're creating another loop, but this time with an exit condition.

.loop:
    @ Set BR8 field in GPIOA_BSRR, to clear GPIOA8
    movw r1, #0x0000
    movt r1, #0x0100
    str  r1, [r0, #0x18]

    @ Delay
    movw r2, #0x3500
    movt r2, #0x000c
.L1:
    subs r2, #0x0001
    bne .L1

    @ Set BS8 field in GPIOA_BSRR, to set GPIOA8
    movw r1, #0x0100
    str  r1, [r0, #0x18]

    @ Delay
    movw r2, #0x3500
    movt r2, #0x000c
.L2:
    subs r2, #0x0001
    bne .L2

    b .loop

The subs instruction here is subtracting, and the s suffix means that the condition flags in the Program Status Register are updated with the result: in particular, the zero flag is set when the result is zero. The bne instruction means "branch if not equal", which in practice means "branch if the zero flag is clear", so we'll jump back to the start of our delay loop until r2 reaches zero.
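
How long is that delay, roughly? Here is a C sketch that counts the loop iterations and turns them into a time estimate. The cycles-per-iteration figure and the clock frequency are assumptions you pass in (I have not measured either on the board), so the result is a ballpark and nothing more.

```c
#include <assert.h>
#include <stdint.h>

/* Count how many times the subs/bne pair runs for a given starting value.
 * Like the assembly, a starting value of 0 would wrap around and take
 * 2^32 iterations. */
uint32_t delay_iterations(uint32_t start)
{
    uint32_t r2 = start;
    uint32_t iterations = 0;
    do {
        r2 -= 1;          /* subs r2, #1: subtract and update the flags */
        iterations++;
    } while (r2 != 0);    /* bne: branch back while the zero flag is clear */
    return iterations;
}

/* Rough delay in milliseconds. Both cycles_per_iteration and clock_hz
 * are assumptions supplied by the caller, not measured values. */
uint32_t delay_ms_estimate(uint32_t iterations,
                           uint32_t cycles_per_iteration,
                           uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    uint64_t cycles = (uint64_t)iterations * cycles_per_iteration;
    return (uint32_t)((cycles * 1000u) / clock_hz);
}
```

The value we load into r2, 0x000C3500, is 800000 in decimal. If you assume about 3 cycles per trip round the loop and a 16 MHz clock, `delay_ms_estimate(800000, 3, 16000000)` works out to 150 ms between toggles.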

Putting the pieces together

We now have everything we need to flash our LED – almost.

There's some boilerplate that needs to be added to our assembly file. We need to give a name to the entry point; let's call it main.

There are two instruction encodings for ARM: ARM and Thumb. The encoding defines how the assembly is translated to machine code. It used to be that you needed different syntax for each of these, until ARM brought out their unified assembly language. The .syntax directive below tells the assembler (the tool that turns the assembly into machine code) which syntax we are using.

Then, .thumb_func tells the assembler that we are using the Thumb encoding for main, which is the only encoding our target (the STM32F4) supports. Finally, .global exposes the symbol main to the linker.

.syntax unified

.thumb_func
.global main
main:

Lastly, we need to make sure our program is what runs when our microcontroller powers on. The reset vector is the location the CPU will go to find the first instruction it will execute after being reset.

What we’re doing below is putting the address of main into the reset vector so that when our board turns on, it will go to that address and start running our code to flash the LED.

.section .vector_table.reset_vector
.word main

We now have our final asm file:

.syntax unified

.thumb_func
.global main
main:
    @ Store RCC base address in r0
    movw r0, #0x3800
    movt r0, #0x4002

    @ Turn on GPIOA clock by setting bit 0 in AHB1ENR register
    movw r1, #0x01
    str  r1, [r0, #0x30]

    @ Store start address of GPIOA registers
    movw r0, #0x0000
    movt r0, #0x4002

    @ Use GPIOA_MODER to make GPIOA8 an output
    movw r1, #0x0000
    movt r1, #0xA801
    str  r1, [r0]

.loop:
    @ Set BR8 field in GPIOA_BSRR, to clear GPIOA8
    movw r1, #0x0000
    movt r1, #0x0100
    str  r1, [r0, #0x18]

    @ Delay
    movw r2, #0x3500
    movt r2, #0x000c
.L1:
    subs r2, #0x0001
    bne .L1

    @ Set BS8 field in GPIOA_BSRR, to set GPIOA8
    movw r1, #0x0100
    str  r1, [r0, #0x18]

    @ Delay
    movw r2, #0x3500
    movt r2, #0x000c
.L2:
    subs r2, #0x0001
    bne .L2

    b .loop

.section .vector_table.reset_vector
.word main

Building our code and flashing our target

We use an assembler to turn our assembly into an object file, e.g.

arm-none-eabi-as -mcpu=cortex-m4 toggle.s -c -o output/toggle.o

Then we use a linker to make an executable. We need a custom linker script to tell the linker where RAM and flash start on our target. Here's what I used:

ENTRY(main)

MEMORY
{
    FLASH   : ORIGIN = 0x08000000, LENGTH = 128K
    RAM     : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
    /* Vector table is first thing in flash */
    .vector_table ORIGIN(FLASH) :
    {
        /* Initial stack pointer */
        LONG(ORIGIN(RAM) + LENGTH(RAM));

        /* Rest of vector table */
        KEEP(*(.vector_table .vector_table.*));
    } > FLASH

    /* text section contains executable code */
    .text ADDR(.vector_table) + SIZEOF(.vector_table) :
    {
        *(.text .text.*);
    } > FLASH
}

Then we can call the linker:

arm-none-eabi-ld -T link.ld output/toggle.o -o output/toggle

I'm using a Black Magic probe to flash my 1bitsy. I can talk to the probe over gdb:

gdb-multiarch -n --batch                    \
    -ex 'tar ext /dev/serial/by-id/example' \
    -ex 'mon tpwr en'                       \
    -ex 'mon swdp_scan'                     \
    -ex 'att 1'                             \
    -ex 'load'                              \
    -ex 'start'                             \
    -ex 'detach'                            \
    output/toggle

Voila: A gif of the LED flashing