
How a C program is compiled into an executable


  • Hey there guys. It’s been a while since my last blog post. Sorry about that 😥. But starting today, I’m going to post on my blog every Friday. So that’s going to be exciting 😃.

  • Today we’re going to be learning how a C program gets compiled into an executable ⚙️.

Things we’re going to be discussing 💬

  • A brief history of the C programming language.
  • What is a compiler?
    • What is compiling?
  • Stages of compiling
    • Preprocessing
    • Compiler
    • Assembler
    • Linker
  • What happens when the user clicks the executable

🔵 A brief history of the C programming language

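  • C was created by Dennis Ritchie at Bell Labs in the early 1970s. Pretty old, you would say. But C has shaped many of the languages we use today. Ex: the reference implementations of Python, PHP, and Perl are written in C, and Java and JavaScript borrow much of their syntax from it. The list goes on 😄.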

  • C is a general-purpose, high-level programming language. Although C is considered high-level, it lets you work close to the hardware, for example by reading and writing memory directly through pointers 👾.

  • With C, you can create almost anything, although it’s rarely the first choice for building websites. This includes:
    • Operating Systems
    • GUIs
    • Embedded Systems
    • Database Management Systems (Ex: MySQL)
    • Programming languages
    • Games

🔵 What is a compiler?

  • Source code is the code that you write in a high-level language such as C. It’s meant to be read and understood by humans, a bit like English, and the computer can’t run it directly.

  • Machine code is the only language that the computer’s processor understands. So we need a translator to convert high-level code into machine code. This is where the compiler comes into play.

  • A compiler is simply a program that converts your C program (source code) into machine code.

🔵 Stages of compiling

  • Your source code goes through 4 stages on its way to becoming machine code: preprocessing, compiling, assembling, and linking.

  • Before diving in, let’s take some example C code and see what happens as we progress through each stage.

Figure 1.0

🟢 1. Preprocessing

  • Preprocessing is the first stage of compiling, and it’s done by a program called the preprocessor. As the name suggests, this stage makes some adjustments to your source code before it gets compiled.

  • The preprocessor takes your source code as input and looks for preprocessor directives (lines starting with #, such as #include).

  • The preprocessor processes the directives, pastes in the contents of the included header files, expands any macros, and removes all the comments from your source code.

  • After doing all of the above, the preprocessor outputs the result as what is called the intermediate file. This version of the source code is ready for the compiler to start compiling. Note: gcc -E prints the result to the terminal by default.

  • Intermediate file extension: .i

  • Let’s see how the intermediate file looks after the preprocessing stage.

$ gcc -E program.c
...
...
... Code from stdio.h header file
...
...
# 4 "program.c"
int main()
{

    printf("Hello World :). Favorite number is %d", 10);
    return 0;
}

🟢 2. Compiler

  • The compiler takes in the intermediate file and converts it to assembly instructions. These assembly instructions are specific to the target processor architecture (for example x86 or x64).

  • Assembly extension: .s

  • Let’s see how the assembly instructions look after the intermediate file is compiled. Note: the following assembly code is for the 32-bit x86 architecture. Command: gcc -S program.c

	.file	"program.c"
	.text
	.def	_printf;	.scl	3;	.type	32;	.endef
_printf:
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ebx
	subl	$36, %esp
	leal	12(%ebp), %eax
	movl	%eax, -16(%ebp)
	movl	-16(%ebp), %ebx
	movl	$1, (%esp)
	movl	__imp____acrt_iob_func, %eax
	call	*%eax
	movl	%ebx, 8(%esp)
	movl	8(%ebp), %edx
	movl	%edx, 4(%esp)
	movl	%eax, (%esp)
	call	___mingw_vfprintf
	movl	%eax, -12(%ebp)
	movl	-12(%ebp), %eax
	movl	-4(%ebp), %ebx
	leave
	ret
	.def	___main;	.scl	2;	.type	32;	.endef
	.section .rdata,"dr"
	.align 4
LC0:
	.ascii "Hello World :). Favorite number is %d\0"
	.text
	.globl	_main
	.def	_main;	.scl	2;	.type	32;	.endef
_main:
	pushl	%ebp
	movl	%esp, %ebp
	andl	$-16, %esp
	subl	$16, %esp
	call	___main
	movl	$10, 4(%esp)
	movl	$LC0, (%esp)
	call	_printf
	movl	$0, %eax
	leave
	ret
	.ident	"GCC: (GNU) 11.2.0"
	.def	___mingw_vfprintf;	.scl	2;	.type	32;	.endef

🟢 3. Assembler

  • In this stage, our assembly instructions are converted to machine code (AKA object code, stored in an object file) by a program called the assembler.

  • Object code extension: .o , .obj

  • Let’s see how the machine instructions look after the assembly instructions have been translated by the assembler. Machine code is binary data, so a text editor can’t display it in a readable form. Command: gcc -c program.c

Figure 1.1: The text editor only displays printable and readable characters.

🟢 4. Linker

  • In C, a library is a collection of functions, which are declared in the header files. Example: printf() is a standard library function that is declared in the stdio.h header file.

  • The standard libraries that you use in your source code aren’t compiled along with it. They come already compiled into object code, packaged as library files that ship with your compiler or operating system.

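  • The linker program links a bunch of object files together into a single executable. In other words, it resolves the references that one file makes to functions defined in another, either by copying in the code your program needs or by recording a reference to the library that contains it. Here, the linker links the object file made from your source code with the standard library functions it uses (such as printf()), then creates the final executable.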

🔵 What happens when the user clicks the executable?

  • Now assume that you have generated the executable successfully. What happens when the user clicks on that executable, and how will it execute?

  • When the user clicks on the executable, a part of the operating system called the loader loads your executable into memory (RAM).

  • The processor then executes the program instructions that have been loaded into memory, and the execution of the program begins.

Alright guys that’s it for today’s post. I hope you guys have learned something valuable from reading this post. If you like this post please share it and if you have any questions & suggestions please feel free to post them down in the comments or contact me. If I made any mistake somewhere in this post please DM me. I’d love to hear and learn from you.

Have a great day guys. I’ll see you in the next post 👋.