Linux executable without main() Function | Write a C Executable program without main() function

Very Short version :

1) Create the program without main() function.

rajkumar.r@17:56:58:~/workspace/raj/workouts/ex_without_main$ cat exe_wo_main.c 
#include <stdio.h>
#include <stdlib.h>

const char my_interp[] __attribute__((section(".interp"))) = "/lib/";
int fn_wo_main() {
	printf("This is a function without main\n");
	exit(0);	/* the entry point has no caller to return to */
}

2) Compile it as a Shared object

gcc -shared -Wl,-e,fn_wo_main exe_wo_main.c -o

3) Run it!!

rajkumar.r@17:57:36:~/workspace/raj/workouts/ex_without_main$ ./ 
This is a function without main

Longer Version with Explanation :

Generally, every C program starts at the main() function; this is what the C standard defines for hosted programs. Looking at the internals shows how the system actually works in the background. C is a compiled language, which means the source code is converted into an executable before execution starts. The general C compilation process goes through the following steps, which are well documented on several other websites.

C Source code
Preprocessor – Intermediate files
Compiler – Object files
Linker – Linked Object file [ Executable file ]
Loader – Loads the executable and executes.

For our goal, we have to understand the linking process. We write our code starting from main(), but several things happen in the background before main() runs, such as setting up the environment, fetching input, and configuring the console. These are neatly abstracted by GCC and the host system, which hide those steps from the user.

Here, we will unravel one small step of this long and complex process: the entry point of execution. When the source code is preprocessed and compiled, object files are generated. After this, the standard and user-built object files are linked to generate the executable. The linking is done by the linker, following a linker script that directs how to create the binary executable. This is where the entry point of the executable is defined; by default it is set up so that execution eventually reaches main().

A few things are configured before main() runs. One of them is the dynamic loader, which resolves the required symbols at run time. Usually in Linux it is /lib/, which in turn links to the corresponding loader provided by the C library. In my case, it is as shown below.

rajkumar.r@17:59:50:~/workspace/raj/workouts/ex_without_main$ ls -l /lib/ 
lrwxrwxrwx 1 root root 25 Jan 28  2013 /lib/ -> i386-linux-gnu/

GCC is a very powerful and sophisticated compiler, and it offers developers a lot of control. One of its many useful options is -Wl,option. According to the gcc man page,

	   Pass option as an option to the linker.  If option contains commas, it is split
	   into multiple options at the commas.  You can use this syntax to pass an argument
	   to the option.  For example, -Wl,-Map, passes -Map to the
	   linker.  When using the GNU linker, you can also get the same effect with 

From the ld manual,

The linker command language includes a command specifically for defining the first executable instruction in an output file (its entry point). Its argument is a symbol name:

ENTRY(symbol)

Like symbol assignments, the ENTRY command may be placed either as an independent command in the command file, or among the section definitions within the SECTIONS command--whatever makes the most sense for your layout.

ENTRY is only one of several ways of choosing the entry point. You may indicate it in any of the following ways (shown in descending order of priority: methods higher in the list override methods lower down).

	the `-e' entry command-line option;
	the ENTRY(symbol) command in a linker control script;
	the value of the symbol start, if present;
	the address of the first byte of the .text section, if present;
	The address 0. 

For example, you can use these rules to generate an entry point with an assignment statement: if no symbol start is defined within your input files, you can simply define it, assigning it an appropriate value---

start = 0x2020;

The example shows an absolute address, but you can use any expression. For example, if your input object files use some other symbol-name convention for the entry point, you can just assign the value of whatever symbol contains the start address to start:

start = other_symbol ;

We are going to combine all of the above features to run a program without a main() function. The steps are:

1) Create the program without main() function.

rajkumar.r@17:56:58:~/workspace/raj/workouts/ex_without_main$ cat exe_wo_main.c 
#include <stdio.h>
#include <stdlib.h>

const char my_interp[] __attribute__((section(".interp"))) = "/lib/";
int fn_wo_main() {
	printf("This is a function without main\n");
	exit(0);	/* the entry point has no caller to return to */
}

Here, you can see a strange-looking declaration, my_interp, at the top. It compensates for the missing main() by adding the dynamic loader (.interp) section. 🙂

2) Compile it as a Shared object

gcc -shared -Wl,-e,fn_wo_main exe_wo_main.c -o

3) Run it!!

rajkumar.r@17:57:36:~/workspace/raj/workouts/ex_without_main$ ./ 
This is a function without main



Building and Booting Linux using Qemu

Previously, we have Built and Booted U-Boot through Qemu.
Now, let us build and boot Linux using Qemu.
Get the latest kernel source from
I used stable 3.9.3, the latest at the time of writing.

mkdir original
mkdir src
cd original
wget -c
cd ../src
tar -xf ../original/linux-3.9.3.tar.xz

Let us define the environment variables that the kernel build uses..

export ARCH=arm
export CROSS_COMPILE=arm-none-eabi-

Now, Let us configure this kernel build for Versatile Express.
This config is available at


For a list of available configs, you can explore the arch/ directory further (for ARM, arch/arm/configs).

make vexpress_defconfig;

Now, we need to make a few changes to make this kernel usable for our
needs later on. For this, we can remove module support (for simplicity)
and enable EABI support as a binary format (also allowing the old ABI).
This is necessary to run software compiled with the CodeSourcery toolchain.

Kernel Features ---> Use the ARM EABI to compile the kernel
Kernel Features ---> Allow old ABI binaries to run with this kernel

make menuconfig

We are all set to build the kernel. Now run

make -j4 all

Here, -j4 tells make to run 4 build jobs in parallel; a good value is the number of cores in your machine.
It will take some time to build.
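
If you are not sure how many cores your machine has, the count can be queried from a program. The following is only a minimal sketch, assuming a Linux/glibc host where sysconf() supports _SC_NPROCESSORS_ONLN; the helper name suggested_make_jobs is made up for this illustration and is not part of the kernel build.

```c
#include <unistd.h>

/* Hypothetical helper: returns a job count suitable for "make -jN",
 * based on the number of CPUs that are currently online. */
long suggested_make_jobs(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    /* sysconf() returns -1 on error; fall back to a single job. */
    return (n < 1) ? 1 : n;
}
```

On a machine with 4 online cores this would typically return 4, which corresponds to the -j4 used above.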

Meanwhile, let us see what a root file system is, why we need one, and
how to boot the kernel with a simple program.

Here is a definition of Root File System from Linux Information project

The root filesystem is the filesystem that is contained on the same 
partition on which the root directory is located, and it is the 
filesystem on which all the other filesystems are mounted (i.e.,
logically attached to the system) as the system is booted up 
(i.e., started up). 

The exact contents of the root filesystem will vary according to the 
computer, but they will include the files that are necessary for 
booting the system and for bringing it up to such a state that
the other filesystems can be mounted as well as tools for fixing a 
broken system and for recovering lost files from backups. The 
contents will include the root directory together with a minimal set 
of subdirectories and files including /boot, /dev, /etc, /bin, /sbin 
and sometimes /tmp (for temporary files).

Hopefully the kernel has been built and is ready by now.
The build should have completed with a message like,

  OBJCOPY arch/arm/boot/zImage
  Kernel: arch/arm/boot/zImage is ready

The kernel will be available at


Now, we can try to boot the kernel using qemu as below.

qemu-system-arm -M vexpress-a9 -kernel arch/arm/boot/zImage -append \
"console=tty1"

-M vexpress-a9 : Emulate the Versatile Express board
-kernel arch/arm/boot/zImage : Use this file as the kernel
-append "console=tty1" : Kernel command line; the console is attached to tty1.
  Linux generally uses the tty interface to display console messages.

Here, you can read what is tty

But now the kernel will end up in a panic, saying something like,

VFS: Cannot open root device "(null)" or unknown block(0,0): error -6
Please append a correct "root=" boot option;
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown 

Now, what we discussed about the root file system becomes useful.
As the kernel message says, we are missing a root file system. Building
a complete root file system is a complex task [ Relatively 😛 ], so we
are going to generate a simple file system.

Creating a Simple Filesystem :

Create the file test.c in the src directory with the below content.


#include <stdio.h>

void main() {
		printf("Hello World!\n");
		while (1);	/* init must never return */
}

As the code shows, the program loops forever, because the kernel expects the
first program to run forever. Compile this program using a cross compiler for
ARM running Linux.
[ This is not the same as the bare metal toolchain. A bare metal toolchain,
such as arm-none-eabi- which we exported while building the kernel, targets
ARM with no OS. ]

arm-none-linux-gnueabi-gcc -static test.c -o test

This compiles test.c and creates an ELF binary, statically linked to all the
required code. We need a filesystem, but all we have now is a binary file,
so we need to generate a filesystem using some tool. Before that, we should
know: what is initramfs?

initramfs, as the name suggests, is the Initial RAM File System. It was
introduced with the Linux 2.6 kernel; before that, initrd was used.

From ubuntu Wiki

The key parts of initramfs are:

1) CPIO archive, so no filesystems at all are needed in kernel. 
   The archive is simply unpacked into a ram disk.
2) This unpacking happens before do_basic_setup is called. This means 
   that firmware files are available before in-kernel drivers load.
3) The userspace init is called instead of prepare_namespace. All 
   finding of the root device, and md setup happens in userspace.
4) An initramfs can be built into the kernel directly by adding it to 
   the  ELF archive under the section name .init.ramfs initramfs' can be
   stacked. Providing an initramfs to the kernel using the traditional
   initrd mechanisms causes it to be unpacked along side the initramfs'
   that are built into the kernel.
5) All magic naming of the root device goes away. Integrating udev into 
   the initramfs means that the exact same view of the /dev tree can be 
   used throughout the boot sequence. This should solve the majority of 
   the SATA failures that are seen where an install can succeed, but the
   initrd cannot boot.

This initramfs uses a cpio format called newc. To build the cpio archive
(the initramfs) from the binary, run the below command.

echo test | cpio -o --format=newc > rootfs
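
To get a feel for what cpio writes in the newc format, here is a rough sketch of the per-file header. It assumes the usual newc layout: the ASCII magic 070701 followed by 13 fields of 8 hexadecimal digits each, 110 characters in total. The function name newc_format_header is hypothetical, and fields this sketch does not care about (inode, owner, times, device numbers, checksum) are simply set to zero. It is not a replacement for cpio.

```c
#include <stdio.h>
#include <string.h>

#define NEWC_HEADER_LEN 110  /* 6-character magic + 13 fields of 8 hex digits */

/* Hypothetical helper: formats a newc header for one file into out and
 * returns the number of characters written (110 if the buffer is large
 * enough). The file name and data would follow the header in a real
 * archive; they are not written here. */
int newc_format_header(char *out, size_t out_size,
                       unsigned int mode, unsigned int file_size,
                       const char *name)
{
    unsigned int name_size = (unsigned int)strlen(name) + 1; /* includes the NUL */

    return snprintf(out, out_size,
                    "070701"                 /* magic of the newc format */
                    "%08X%08X%08X%08X%08X"   /* ino, mode, uid, gid, nlink */
                    "%08X%08X"               /* mtime, filesize */
                    "%08X%08X%08X%08X"       /* devmajor, devminor, rdevmajor, rdevminor */
                    "%08X%08X",              /* namesize, check */
                    0u, mode, 0u, 0u, 1u,
                    0u, file_size,
                    0u, 0u, 0u, 0u,
                    name_size, 0u);
}
```

For our archive the name would be "test", so the namesize field comes out as 00000005 (four characters plus the terminating NUL).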

Now, we have the zImage kernel and rootfs – Initramfs. Let us load the kernel

qemu-system-arm -M vexpress-a9 -kernel linux-3.9.3/arch/arm/boot/zImage \
-initrd rootfs -append "root=/dev/ram rdinit=/test"


-initrd rootfs : Qemu option which tells qemu that rootfs is the filesystem
image to load as the initial RAM disk.

root=/dev/ram and 
rdinit=/test are kernel options passed to the kernel we load.

rdinit=/test tells the kernel to run "test" executable we built as init.

Now, we can see the “hello world” being printed.

Voila!! Done!!

Booting Uboot in QEMU

U-Boot is a universal bootloader which is very widely used. For me, it is a starting point for learning low-level hardware interactions. Qemu supports the Versatile PB board, so we can try to run U-Boot on an emulated Versatile PB using QEMU.

Creating workbench:

$ mkdir uboot
$ cd uboot;
$ mkdir original
$ mkdir src
$ cd original

Download U-Boot source code from U-Boot FTP.

$ wget -c
$ cd ../src/
$ tar -xf ../original/u-boot-latest.tar.bz2
$ cd u-boot-2010.03

Compile the u-Boot code.

$ export ARCH=arm;
$ export CROSS_COMPILE=arm-none-eabi-;

$ make versatilepb_config 
$ make all

Now, if the build went right, you should have u-boot.bin in the folder. To boot it, run,

$ qemu-system-arm -M versatilepb -m 128M -nographic -kernel u-boot.bin;

Now you will be at the U-Boot prompt!!

U-Boot 2010.03-dirty (May 17 2013 - 15:52:24)

DRAM:   0 kB
## Unknown FLASH on Bank 1 - Size = 0x00000000 = 0 MB
Flash:  0 kB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   SMC91111-0
VersatilePB #

Try commands like ‘?’, ‘help’ etc..

Bare Metal Programming for ARM VersatilePB II : Hello world

Reference :

As we are working on bare silicon, we don't have any windowing system to display our prints. On real silicon, we could verify some signal flow by driving LEDs high and low.

These LEDs get their signals through peripherals, and we decided to use the UART as the peripheral to experiment with, as it is available in U-Boot. Communication with a peripheral goes through memory-mapped I/O, i.e., every peripheral is given an address range, and we talk to it by reading and writing those addresses as if they were memory locations.

A few more details on memory-mapped I/O for this experiment..

1) QEMU supports the VersatilePB board. UART0 of the VersatilePB is mapped to the terminal when using the -nographic or -serial stdio option in qemu.

2) The memory map of the VersatilePB board is implemented in QEMU in a board-specific C source file, from which we got the address of UART0 as 0x101f1000.

Okay.. Lets start to write the code..

Contents of test.c
volatile unsigned int * const UART0DR = (unsigned int *)0x101f1000;

void print_uart0(const char *s) {
 while(*s != '\0') { /* Loop until end of string */
 *UART0DR = (unsigned int)(*s); /* Transmit char */
 s++; /* Next char */
 }
}

void c_entry() {
 print_uart0("Hello world!\n");
}

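The UART data register belongs to a device, so its contents can change outside the program's control. We therefore declare the pointer as volatile, which stops the compiler from optimizing the accesses away, and we assign it the mapped address we found earlier.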

Hope the code is self explanatory.
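
If you want to try the transmit loop on your PC before running it in QEMU, one option is to pass the register address in as a parameter, so that an ordinary variable can stand in for the UART data register. This is only a sketch; the name uart_puts and the idea of a fake register are my own assumptions for illustration and are not part of the bare metal build above.

```c
#include <stddef.h>

/* Hypothetical variant of print_uart0: writes each character of s to the
 * memory-mapped data register and returns how many characters were sent.
 * The volatile qualifier forces one real store per character. */
size_t uart_puts(volatile unsigned int *data_reg, const char *s)
{
    size_t sent = 0;

    while (*s != '\0') {
        *data_reg = (unsigned int)(unsigned char)*s; /* transmit one char */
        s++;
        sent++;
    }
    return sent;
}
```

On the VersatilePB you would pass (volatile unsigned int *)0x101f1000 as data_reg. On a PC you can pass the address of a local volatile unsigned int and check that it holds the last character written.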

Contents of test.ld

ENTRY(_Reset)
SECTIONS
{
 . = 0x10000;
 .startup . : { startup.o(.text) }
 .text : { *(.text) }
 .data : { *(.data) }
 .bss : { *(.bss COMMON) }
 . = ALIGN(8);
 . = . + 0x1000; /* 4kB of stack memory */
 stack_top = .;
}

We start at address 0x10000, as qemu loads the binary at this address.

Contents of startup.s
.global _Reset
_Reset:
 LDR sp, =stack_top
 BL c_entry
 B .

Next, we can compile this code..

$ arm-none-eabi-as -mcpu=arm926ej-s -g startup.s -o startup.o
$ arm-none-eabi-gcc -c -mcpu=arm926ej-s -g test.c -o test.o
$ arm-none-eabi-ld -T test.ld test.o startup.o -o test.elf
$ arm-none-eabi-objcopy -O binary test.elf test.bin

Here, you can notice an addition compared to the last build: the creation of a binary image.

That is because U-Boot loads raw binary files..

And now, we can run this using the command,

$ qemu-system-arm -M versatilepb -m 128M -nographic -kernel test.bin

-M versatilepb : specifies qemu to emulate Versatile PB.
-m 128M tells qemu to use 128MB Ram for this system
-nographic : Don't use graphics; provide terminal output instead
-kernel test.bin : Load this binary as the system image.

You can see Hello world! printed on the screen. Now, to close this, hit

'Ctrl + a' and then 'x'

Bare Metal Programming for ARM VersatilePB I : Getting Started

Goal :

  1. To understand Bare Metal Programming.
  2. To create, compile, and run a bare metal program, for versatilePB board using Qemu.

Requirements :

  1. qemu,
  2. Ubuntu,
  3. Cross compilation toolchain

What is Bare metal Programming??

From :

bare metal: n.

1. [common] New computer hardware, unadorned with such snares and 
    delusions as an operating system, an HLL, or even assembler. 
    Commonly used in the phrase programming on the bare metal, which
    refers to the arduous work of bit bashing needed to create 
    these basic tools for a new machine. Real bare-metal programming
    involves things like building boot proms and BIOS chips, 
    implementing basic monitors used to test device drivers, 
    and writing the assemblers that will be used to write the 
    compiler back ends that will give the new machine a real 
    development environment.

 2. “Programming on the bare metal” is also used to describe a style
 of hand-hacking that relies on bit-level peculiarities of a particular
 hardware design, esp. tricks for speed and space optimization that 
 rely on crocks such as overlapping instructions (or, as in the famous
 case described in The Story of Mel' (in Appendix A), interleaving of
 opcodes on a magnetic drum to minimize fetch delays due to the device's 
 rotational latency). This sort of thing has become rare as the relative
 costs of programming time and machine resources have changed, but is 
 still found in heavily constrained environments such as industrial
 embedded systems. See Real Programmer.

So BMP is a kind of programming that interacts directly with low-level hardware, with only minimal support from external tools customized to launch the application on the silicon.

Bare Metal == Bare silicon

From :

Bare metal programs run without an operating system beneath;

coding on bare metal is useful to

deeply understand how a hardware architecture works and

what happens in the lowest levels of an operating system.

Why C For Baremetal Programming??

From :

 Bare Metal Programming

Like assembly, in C you are forced to work directly with memory, 
pointers, and the underlying arrangement of the bits and bytes 
that make up your data structures and their layout (packed 
structures, byte ordering, data types and conversions, etc.) 
No garbage collection, no forgiveness for buffer overruns, no 
regular expressions. C is for bare metal programming — which 
typically require safe coding practices, tight program design, 
and a good understanding of the toolchain and the target environment.

But when you want the underlying elements of the data structures 
to be visible and your algorithms to be working with these bare
metal details, C is a great choice for turning out programs that
interact with hardware, networks, sensors, and peripherals, and 
that are small (compiled size), fast (execution time), and have 
a small footprint (few dependencies, low loading overhead).

With the aid of bare bones compilation modes like
gcc -S %1.c -fno-exceptions -s -Os, and of working syntax 
converters like A2I, whenever you wish to see the kind the
assembly code that C is generating for you, you can.

Embrace C for systems level development, and you’ll retain 
the power of assembly but from a more efficient point of view.

So, Let me know what does a Executable file contain? [ Example a.out]

Okay. Enough of this boredom. Let me get my dirty hand on code….

Simplest Bare metal code..

Contents of test.c

int c_entry() {
return 0;
}

Contents of startup.s

.section INTERRUPT_VECTOR, "x"
.global _Reset
_Reset:
B Reset_Handler /* Reset */
B . /* Undefined */
B . /* SWI */
B . /* Prefetch Abort */
B . /* Data Abort */
B . /* reserved */
B . /* IRQ */
B . /* FIQ */

Reset_Handler:
LDR sp, =stack_top
BL c_entry
B .

Contents of test.ld

ENTRY(_Reset)
SECTIONS
{
. = 0x0;
.text : {
startup.o (INTERRUPT_VECTOR)
*(.text)
}
.data : { *(.data) }
.bss : { *(.bss COMMON) }
. = ALIGN(8);
. = . + 0x1000; /* 4kB of stack memory */
stack_top = .;
}

Explanation of these contents :

test.c :

1 : function named c_entry, taking no arguments and returning int
2 : returns 0
3 : function ends

startup.s :

1: generates a section named INTERRUPT_VECTOR containing executable 
   (“x”) code. Later we can see we use this in the linker script.
2: exports the name _Reset to the linker in order to set the program 
   entry point.
3 to 11: the interrupt vector table, which contains a series of branches. 
   The notation “B .” means that the code branches to itself and stays 
   there forever, like an endless for(;;);
14: initializes the stack pointer, which is necessary when calling C 
   functions. The top of the stack (stack_top) will be defined during
   linking.
15: calls the c_entry function, and saves the return address in the link 
   register (lr).

test.ld :

1 : Enter / start execution at this symbol inside the binary
2 - 14 : Defines the section layout
  3 : Beginning of the section definition
  4 : Start the address at 0x0
  5 : Place the .text segment next
     6 : Place startup.o in the text segment.
	     It puts INTERRUPT_VECTOR at 0x0
	 7,8 : End of the text segment
  9 : Start the .data segment here.
 10 : Start the .bss segment following the data segment
 11 : Align the location counter to 8 bytes
 12 : Allocate 4 kB of stack memory starting from here
 13 : Point the top of the stack at this location
14 : Section layout definition ends here

 .text : Code segment
 .data : Initialized data
 .bss  : Uninitialized data
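
The last three statements of test.ld are plain address arithmetic: round the current address up to a multiple of 8, add 0x1000 for the stack, and call the result stack_top. The sketch below redoes that arithmetic in C so you can predict the value nm will report. The function names are made up for illustration, and the end address 0x4c used in the note afterwards is only an assumed example, not a value taken from the build.

```c
#include <stdint.h>

/* Rounds addr up to the next multiple of alignment (a power of two),
 * like ". = ALIGN(8);" does for the location counter. */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/* Hypothetical helper: given the address where .bss ends, returns the
 * value the linker script would assign to stack_top. */
uint32_t stack_top_after(uint32_t end_of_bss)
{
    return align_up(end_of_bss, 8u) + 0x1000u; /* 4 kB of stack memory */
}
```

If the last section ended at 0x4c, for instance, stack_top_after(0x4c) gives 0x1050, and an end address of 0x60 gives 0x1060. Both results have the same shape as the stack_top values in the nm listings further down.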

Compile them ..

raj@19:31:46:$ ls
startup.s  test.c  test.ld
raj@19:31:47:$ arm-none-eabi-as -mcpu=arm926ej-s -g startup.s -o startup.o
raj@19:32:43:$ ls
startup.o  startup.s  test.c  test.ld
raj@19:32:44:$ arm-none-eabi-gcc -c -mcpu=arm926ej-s -g test.c -o test.o
raj@19:32:53:$ ls
startup.o  startup.s  test.c  test.ld  test.o

Main Function??

Ohh.. okay.. but the simplest program I know is Hello world which has a Main function.. Where is the main here?? And what is this startup.s??

Yeah.. Yeah.. You are right.. Before pre judging something, please have a look at here to understand what is a Main function??

So now you know that the main function is not the real starting point; the C runtime library comes into the picture before it..

But we don't have any support for crt0 on our ‘bare silicon’. So we don't have main, and we replace the runtime startup with startup.s

Lets Link them..

raj@19:32:54:$ arm-none-eabi-ld -T test.ld test.o startup.o -o test.elf
raj@19:32:54:$  ls
startup.o  startup.s  test.c  test.elf  test.ld  test.o

Now, we got an ELF file, which is the executable

Okay. How do I run this now??

As you can see in the code, we do nothing but return 0, so we cannot verify any output if we run this.. Sob..Sob..

So all of what I did so far is utter waste??

No!! You can still verify your work, and this is the foundation you have laid..

Let us examine your binaries and try to find out what it has..

We can use the nm command to see the symbols in object files..

From man nm :
nm - list symbols from object files

Lets start with test.o which is our compiled c code

raj@20:05:18:$ ls
startup.o  startup.s  test.c  test.elf  test.ld  test.o
raj@20:05:55:$ nm test.o 
00000000 T c_entry

Here you can see the c_entry symbol listed against ‘00000000’ and ‘T’

‘T’ means its in Text segment.

Next, lets see our startup.o

 raj@20:15:54:$nm startup.o
00000020 t Reset_Handler
00000000 T _Reset
         U c_entry
         U stack_top

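Here, we can see our Reset_Handler as ‘t’ at location 0x20.

The 0x20 (32) bytes before Reset_Handler are occupied by the vector table at _Reset, which we put first: 8 branch instructions of 4 bytes each.

Here you can see we have two cases, ‘T’ and ‘t’.

According to man nm, an uppercase letter means the symbol is global (external), and a lowercase letter means it is local. _Reset was exported with .global, so it shows as ‘T’; Reset_Handler was not, so it shows as ‘t’.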
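
The same upper/lower case distinction shows up in plain C. As a small sketch (the file and function names here are invented for the example), a static function has internal linkage and a non-static one has external linkage:

```c
/* Internal linkage: visible only inside this file. */
static int local_helper(void)
{
    return 1;
}

/* External linkage: visible to the linker from other object files. */
int global_entry(void)
{
    return local_helper() + 1;
}
```

If you compile this without optimization (for example with gcc -c) and run nm on the object file, local_helper would typically be listed with ‘t’ and global_entry with ‘T’. With optimization enabled the static function may be inlined and disappear from the symbol table altogether.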

Okay. We linked it by our linker script test.ld..

Lets see the result of that..

raj@20:15:54:$nm test.elf 
00000020 t Reset_Handler
00000000 T _Reset
00000030 T c_entry
00001050 A stack_top

Good.. We have our symbols laid out as we defined.

_Reset at the start, followed by Reset_Handler, followed by c_entry.

All of these are in the text segment..

Then we can see our stack_top symbol, marked ‘A’ (absolute)..

Yeah.. fine.. But where did my .data and .bss segments go??

We can't see them above because we have not declared any global data. If you want to see them, declare some globals, recompile, and check.

Buddha said, don't believe until you have checked it.. So I decided to check it.. I declared three variables as below.

 Contents of test.c 
int global_init_var = 10;
int global_unint_var;
int c_entry() {
    int local_uinit;
    return 0;
}

And here is the corresponding output

00000020 t Reset_Handler
00000000 T _Reset
00000030 T c_entry
0000005c B global_unint_var
00000058 D global_init_var
00001060 A stack_top

0000005c B global_unint_var : uninitialized global variable, which is put into the bss segment..

00000058 D global_init_var : initialized global variable, which is put into the data segment..

The local variable local_uinit lives on the stack, so it has no symbol at all.

Want to explore some more??

Balducci has explained something about using gdb to debug symbols.. Please have the look at them too..

Cross Compiling – Reblog

Original Blog :

Introduction to Cross Compilation, Part 1

This post is the first in a series on cross compilation. In this series I’ll introduce the concept of cross compilation, and how to use it. Although there are many different uses for cross compilation, I’ll focus in this series on its use for embedded Linux systems development.

What is Cross Compilation?

When you develop a desktop or server application, almost always the development platform (the machine that runs your compiler) and the target platform (the machine that runs your application) are the same. By “platform” I mean the combination of CPU architecture and Operating System. The process of building executable binaries on one machine, and running them on another machine whose CPU architecture or Operating System is different, is called “cross compilation”. A special compiler is needed for doing cross compilation; it is called a “cross compiler”, and sometimes just a “toolchain”.

For example, desktop PC application developers for Windows or Linux can build and run their binaries on the very same machine. Even developers of server applications generally have the same basic architecture and Operating System on both their development machine and server machine. The compiler used in these cases is called “native compiler”.

On the other hand, developers of an embedded Linux application that runs on a non PC architecture (like ARM, PowerPC, MIPS, etc.) tend to use a cross compiler to generate executable binaries from source code. The cross compiler must be specifically tailored for doing cross compilation from the development machine’s architecture (sometimes called “host”), to the embedded machine’s architecture (called “target”).

Note: cross compilation is only needed when generating binary executables from source code written in a compiled language, like C or C++. Programs written in an interpreted language, like Perl, Python, PHP, or JavaScript, do not need a cross compiler. In most cases interpreted programs should be able to run unchanged on any target. You do need, however, a suitable interpreter running on the target machine.

What is Cross Compilation Good for?

I have covered above one reason for doing cross compilation, that is, the target machine has a different CPU architecture than the development host. In this case cross compilation is necessary because the binaries that the native compiler generates won’t run on the target embedded machine.

Sometimes cross compilation is not strictly necessary, but native compilation is not practical, or is inconvenient. Consider, for example, a slow ARM9 based target machine running Linux. Having the compiler run on this target will make the build process painfully slow. In many cases the target machine is just under-powered, in terms of storage and RAM, for the task of running a modern compiler.

Practically speaking, almost all embedded Linux development is being done with cross compilers. Strong PC workstation machines are used as development hosts to run the development environment (text editor, IDE), and the cross compiler.

Obtaining a Cross Compiler

The easiest way to obtain a cross compiler is to download a ready made pre-built one. Besides being easy to obtain, a pre-built binary toolchain is the most useful for the general case of building a kernel and a userspace filesystem. Some special cases require a specially tailored toolchain built from source. I’ll show how to build a toolchain from source in the next post.

A short terminology note: in the following text I use the terms “cross compiler” and “toolchain” interchangeably. They have the same meaning in this context. The term “toolchain” seems to be more popular, however.


The most well known source of pre-built cross compilers is the embedded software division of Mentor Graphics, formerly known as CodeSourcery, an independent company that Mentor acquired in 2010. They release the “Sourcery CodeBench Lite Edition” free of charge. Sourcery CodeBench is a collection of cross compilers for several CPU architectures, including ARM, PowerPC, MIPS, and Intel x86 among others. For each architecture there are a number of target options. The one you need for embedded Linux work is the “GNU/Linux release”. Always select the latest version, unless you have a very good reason to avoid it. Then, there are a few packaging formats to choose from. I prefer the “IA32 GNU/Linux TAR” format. Installing it is just a matter of extracting the tar file in the /opt directory. For example, to install the latest MIPS toolchain do as root

tar xjf mips-2011.09-75-mips-linux-gnu-i686-pc-linux-gnu.tar.bz2 -C /opt

One big advantage of Sourcery’s toolchains is that those making them, former CodeSourcery employees, are deeply involved in upstream development of the GCC compiler.


The Linaro organization also releases pre-built cross compilers for their target platform, newer ARM processor based on Cortex-A. Download the latest version from here. You need the “Linux binary” one.

Using the Cross Compiler

This is just a quick peek at cross compiling for the impatient new embedded Linux coder. I’ll come back to this issue later in some greater depth.

First, put your newly installed toolchain in your path. For example, the Sourcery MIPS toolchain mentioned above needs the following command:

export PATH=$PATH:/opt/mips-2011.09/bin

Create a simple “Hello World” program, and save it in hello.c:

#include <stdio.h>

int main (void) {
    printf ("Hello World!\n");
    return 0;
}

Compile your program using the MIPS toolchain as follows:

mips-linux-gnu-gcc -Wall -o hello hello.c

Copy the resulting hello binary file to your target machine and run it there. If all goes well you should see the expected output.
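
One way to convince yourself that a binary is tied to its target is to let the program report which architecture it was compiled for. The sketch below relies on macros that GCC predefines for each target (such as __mips__ or __arm__); the function name compiled_for_arch is hypothetical, and the list of architectures is deliberately incomplete.

```c
/* Hypothetical helper: names the CPU architecture this code was compiled
 * for, based on macros that GCC predefines for the selected target. */
const char *compiled_for_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#elif defined(__mips__)
    return "mips";
#elif defined(__powerpc__)
    return "powerpc";
#else
    return "unknown";
#endif
}
```

Printed from hello.c, the same source would be expected to report a different name depending on whether it was built with the native gcc or with mips-linux-gnu-gcc, because the choice is made at compile time and not on the machine that runs the binary.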

There are many details to get wrong here, ranging from ABI issues, to C library and kernel version compatibility. I’ll cover some of these issues in future posts.

GCC Function Instrumentation

Original Post :

One of gcc’s more obscure features is -finstrument-functions. It was implemented by Cygnus Solutions, presumably as part of a contract for somebody-or-other to deliver something-or-other. When enabled, the compiler will emit calls to __cyg_profile_func_enter() and __cyg_profile_func_exit() at the top and bottom of every function.

Let’s examine a simple example which prints the function addresses at entry and exit.


#include <stdio.h>

void __cyg_profile_func_enter(void *this_fn, void *call_site)
                              __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
  printf("ENTER: %p, from %p\n", this_fn, call_site);
} /* __cyg_profile_func_enter */

void __cyg_profile_func_exit(void *this_fn, void *call_site)
                             __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
  printf("EXIT:  %p, from %p\n", this_fn, call_site);
} /* __cyg_profile_func_exit */

int foo() {
  return 2;
}

int bar() {
  return 1;
}

int main(int argc, char** argv) {
  printf("foo=%d bar=%d\n", foo(), bar());
  return 0;
}

The __cyg_profile_func_enter and exit functions are passed two parameters: the address of the function being entered/exited, and the address from which it was called. Note the use of the no_instrument_function attribute. If not present, then __cyg_profile_func_enter would be instrumented like any other function. Every call would result in calling the instrumentation again, which results in another call, etc etc until it blows the stack. Previously I’ve used -finstrument-functions to construct a profiler for a CPU whose interrupt structure was not suitable for a sample-based profiler. All of the routines implementing the profiler were labelled no_instrument_function.
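
As a small step from printing addresses towards something like a call graph, the hooks can also keep track of the current call depth. The following is only a sketch of that idea and not the profiler mentioned above; the call_depth counter and the current_call_depth() accessor are names invented for this example. Every function in it is marked no_instrument_function so the hooks never instrument themselves.

```c
#include <stdio.h>

static int call_depth = 0; /* number of instrumented functions currently active */

void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
int current_call_depth(void) __attribute__((no_instrument_function));

/* Called on entry to every instrumented function: print the function
 * address indented by the current depth, then go one level deeper. */
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
  (void)call_site;
  fprintf(stderr, "%*s> %p\n", call_depth * 2, "", this_fn);
  call_depth++;
}

/* Called on exit from every instrumented function: come back up one
 * level, then print the function address at that indentation. */
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
  (void)call_site;
  call_depth--;
  fprintf(stderr, "%*s< %p\n", call_depth * 2, "", this_fn);
}

/* Hypothetical accessor so other code can inspect the depth. */
int current_call_depth(void) {
  return call_depth;
}
```

Linked into a program built with -finstrument-functions, this would print an indented trace to stderr, with callees nested under their callers.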

Next we’ll examine the output, with just enough of the disassembled binary to make sense of it.

$ cc t.c -finstrument-functions
$ ./a.out
ENTER: 0x4005d0 @ 0x2b59e0d471c4 (calling main)
ENTER: 0x40059d @ 0x40060c       (calling foo)
EXIT:  0x40059d @ 0x40060c       (returning from foo)
ENTER: 0x40056a @ 0x400618       (calling bar)
EXIT:  0x40056a @ 0x400618       (returning from bar)
foo=2 bar=1
EXIT:  0x4005d0 @ 0x2b59e0d471c4 (returning from main)

000000000040056a <bar>:
  40056a: push   %rbp

000000000040059d <foo>:
  40059d: push   %rbp

00000000004005d0 <main>:
  4005d0: push   %rbp
  400607: callq  40059d <foo>
  40060c: mov    %eax,%ebx
  400613: callq  40056a <bar>
  400618: mov    %eax,%esi

There are a few interesting things to note in the output.

Though main calls printf, we don’t see a call to printf in the output. Function instrumentation is implemented during compilation, and we didn’t compile printf; we linked to an existing library. We’ll only see the instrumentation for functions compiled with -finstrument-functions.
The call_site is the instruction after the one which vectors over to run the function.
The call_site which called main() looks strange. It is not in the TEXT segment, it is way up at some weird address. This is address space layout randomization in action, every run of this binary has a different address calling main. I don’t know exactly what that routine is, but presumably it is part of the trampoline when the kernel begins executing a new process.

This instrumentation facility is not often used. The aforementioned call graph profiler is the only time I’ve used it. Nonetheless I hope you find it interesting.