The Glaux Operating System Project

Compiler Design

This page describes the design of the compiler for the Glaux operating system.

Project File

The project file acts as the input file for the compiler.

The project file describes the names of, and the relationships between, the source files, the first-level and second-level intermediate representation files, the library files and the executable files. It also keeps track of which source files have been modified since the last compilation, which source files belong to each compilation target (processor architecture and operating system), and which language each source file is written in.
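As an illustration only, the bookkeeping described above could be modelled roughly as follows. This is a minimal sketch: the class names, field names, file names and the in-memory layout are assumptions made for the example, not the actual project file format.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


# Hypothetical model of a project file; every name here is illustrative.
@dataclass(frozen=True)
class Target:
    architecture: str       # processor architecture, e.g. "x86_64"
    operating_system: str   # operating system, e.g. "glaux"


@dataclass
class SourceFile:
    path: str
    language: str                # language the file is written in
    targets: FrozenSet[Target]   # targets this file is compiled for
    modified: bool = False       # changed since the last compilation?


@dataclass
class Project:
    sources: List[SourceFile] = field(default_factory=list)

    def sources_for(self, target: Target) -> List[SourceFile]:
        """Source files that belong to the given compilation target."""
        return [s for s in self.sources if target in s.targets]

    def modified_sources(self, target: Target) -> List[SourceFile]:
        """Source files of the target that changed since the last compilation."""
        return [s for s in self.sources_for(target) if s.modified]
```

A query such as `project.modified_sources(target)` then yields the files that the first compilation stage would have to look at for that target.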

Compilation Stages

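For executable files, there are three compilation stages: source to first-level intermediate representation, first-level intermediate representation to second-level intermediate representation, and second-level intermediate representation to executable.

For library files, there are also three compilation stages: source to first-level intermediate representation, first-level intermediate representation to second-level intermediate representation, and second-level intermediate representation to library.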

A source file contains the code written by the programmer.

A first-level intermediate representation file contains the pre-optimised abstract syntax tree that is derived from the code provided in the source file.

A second-level intermediate representation file contains the linked and further optimised abstract syntax tree that is derived from the first-level intermediate representation files.

An executable file contains the output machine code that is derived from the second-level intermediate representation.

A library file contains the abstract syntax tree that is derived from the second-level intermediate representation files.

Compilation is only done for the specified compilation target.
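The relationship between output kinds and stages can be sketched as a small lookup. This assumes that the stages are exactly the ones named by the section headings of this page; the `Stage` enumeration and the `stages_for` function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    SOURCE_TO_IR1 = "source to first-level intermediate representation"
    IR1_TO_IR2 = "first-level to second-level intermediate representation"
    IR2_TO_EXECUTABLE = "second-level intermediate representation to executable"
    IR2_TO_LIBRARY = "second-level intermediate representation to library"


def stages_for(output_kind: str) -> List[Stage]:
    """Return the compilation stages, in order, for an output kind."""
    # The first two stages are shared; only the last one differs.
    common = [Stage.SOURCE_TO_IR1, Stage.IR1_TO_IR2]
    if output_kind == "executable":
        return common + [Stage.IR2_TO_EXECUTABLE]
    if output_kind == "library":
        return common + [Stage.IR2_TO_LIBRARY]
    raise ValueError(f"unknown output kind: {output_kind!r}")
```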

Source to First-Level Intermediate Representation

The "source to first-level intermediate representation" stage of compilation is responsible for converting source files to first-level intermediate representation files.

For each source file that has been modified, or whose derived files are missing, the procedure described below is applied.
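The selection rule in the sentence above (which files the stage processes, not the procedure itself) might be sketched like this. The function name, its parameters and the injectable `exists` check are assumptions made for the example.

```python
import os
from typing import Callable, Iterable


def needs_processing(
    modified: bool,
    derived_files: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Decide whether a stage has to process an input file.

    An input is processed when it was modified since the last
    compilation, or when any file derived from it is missing.
    """
    if modified:
        return True
    return any(not exists(path) for path in derived_files)
```

The same rule appears to apply to the later stages as well, with the intermediate representation file taking the place of the source file.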

First-Level Intermediate Representation to Second-Level Intermediate Representation

The "first-level intermediate representation to second-level intermediate representation" stage of compilation is responsible for linking the first-level intermediate representation files together and optimising them.

For each first-level intermediate representation file that has been modified, or whose derived files are missing, the procedure described below is applied.

Second-Level Intermediate Representation to Executable

The "second-level intermediate representation to executable" stage of compilation is responsible for generating and installing the final output file from the second-level intermediate representation file. It usually runs on the end-user's machine.

For each second-level intermediate representation file that has been modified, or whose executable file is missing, the procedure described below is applied.

Second-Level Intermediate Representation to Library

The "second-level intermediate representation to library" stage of compilation is responsible for installing the second-level intermediate representation. It usually runs on the end-user's machine.

In this case, the compiler only makes minor changes to the second-level intermediate representation file and copies it into /lib.

Error Detection

Error detection is the main aspect considered in the design of the compiler (and of the Glaux High Level programming language). Good error detection catches many bugs that would otherwise go unnoticed and contributes to a better user experience (less unexpected behaviour at runtime).

When incorrect code is detected, the compiler emits a warning. This makes many bugs visible to the developer at once, which makes finding and fixing them more efficient.

Optimisations

Optimisation is one of the most heavily weighted aspects in the design of the compiler. This means that the design leaves room for optimisation opportunities, even if they are not necessarily implemented.

Thanks to whole-program analysis, the compiler can determine:

At the same time, because intermediate representation is the most common form in which software is distributed for the Glaux operating system, the resulting code can be optimised for the particular model of the target architecture. This is achieved by:

Assembly Integration

Calling Conventions

There are no standard calling conventions. Instead, the assembly programmer and the compiler are able to specify calling conventions themselves in order to generate the most efficient possible machine code.

Using specific registers for argument passing causes trouble when those registers are already used for something else, which unfortunately cannot be fully prevented. However, the problem can be minimised by letting the compiler decide automatically on the optimal calling conventions for high-level functions and by letting the assembly programmer decide on the optimal calling conventions for assembly functions.

Calling High-Level Functions from Assembly Code

The assembler supports a special construct that takes the place of the assembly instructions used to call a function. This construct describes where the arguments to the callee function are passed and where the return values from the callee function are expected. The compiler then adapts the calling conventions of the callee high-level functions to match the code of the caller assembly functions.

However, a problem arises when different assembly stubs call the same high-level function in different ways. In this case, the compiler has to generate multiple equivalent functions that differ only in their calling conventions. It is up to the assembly programmer to ensure this does not happen, although the compiler might also emit a warning in this case.
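To make the conflict concrete, the following sketch groups assembly call sites by callee and reports every high-level function that would need more than one variant. The data model (`Convention`, `AssemblyCall`), the register names and the warning text are all assumptions made for illustration; the page does not specify how the compiler represents this information.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Convention:
    argument_locations: Tuple[str, ...]   # where the caller puts arguments
    return_locations: Tuple[str, ...]     # where the caller expects results


@dataclass(frozen=True)
class AssemblyCall:
    caller: str             # assembly stub containing the call construct
    callee: str             # high-level function being called
    convention: Convention


def variants_needed(calls: List[AssemblyCall]) -> Dict[str, Set[Convention]]:
    """Map each callee to the distinct conventions its callers require."""
    variants: Dict[str, Set[Convention]] = defaultdict(set)
    for call in calls:
        variants[call.callee].add(call.convention)
    return dict(variants)


def conflict_warnings(calls: List[AssemblyCall]) -> List[str]:
    """One warning per callee that would need several generated copies."""
    warnings = []
    for callee, conventions in sorted(variants_needed(calls).items()):
        if len(conventions) > 1:
            warnings.append(
                f"warning: '{callee}' is called from assembly with "
                f"{len(conventions)} different calling conventions"
            )
    return warnings
```

Each distinct convention in `variants_needed` corresponds to one copy of the function that the compiler would have to generate.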

Calling Assembly Functions from High-Level Code

The assembler supports a special construct similar to a function prototype. This construct describes where the assembly function expects its arguments and where it puts its return values. The compiler then adapts the calling conventions of the caller high-level functions to match the code of the callee assembly functions.

Sections

Section Definition

For each executable and library file, the project file keeps track of the sections that are used. A section definition contains the name of the section and, optionally, the target architecture, the start address and the end address of the section.

By specifying the target architecture, the programmer can override the architecture that applies to the machine code of the executable file. Such an option is needed for code that switches processor modes (e.g. an x86 BIOS bootloader that has to switch processor operation modes).

By specifying the start address, the programmer can override the default behaviour of starting at the end of the previous section. Such an option is needed for code that has to be loaded to a specific address (e.g. an x86 BIOS bootsector that has to be loaded to address 0x00007C00). Note that the first section must have a specified start address.

By specifying the end address, the programmer can add padding after the end of code and/or data. Such an option is needed for code that has to have a specific length (e.g. an x86 BIOS bootsector that has to be exactly 512 bytes, with the last two bytes being 0x55 and 0xAA).
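The address rules above can be illustrated with a small layout routine. This is a sketch under stated assumptions: the `Section` fields, the `layout` function, and the treatment of the end address as exclusive are choices made for the example, not details taken from the project file format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Section:
    name: str
    size: int                           # bytes of code and data emitted
    start: Optional[int] = None         # optional start address
    end: Optional[int] = None           # optional end address (assumed exclusive)
    architecture: Optional[str] = None  # optional architecture override


def layout(sections: List[Section]) -> List[Tuple[str, int, int, int]]:
    """Place sections; returns (name, start, end, padding) for each one."""
    placed = []
    previous_end = None
    for section in sections:
        if section.start is not None:
            start = section.start        # explicit start address wins
        elif previous_end is None:
            raise ValueError("the first section must have a start address")
        else:
            start = previous_end         # follow the previous section
        content_end = start + section.size
        if section.end is None:
            end = content_end            # no padding requested
        elif section.end < content_end:
            raise ValueError(f"section '{section.name}' exceeds its end address")
        else:
            end = section.end            # pad up to the end address
        placed.append((section.name, start, end, end - content_end))
        previous_end = end
    return placed
```

For a 512-byte bootsector, a section with `start=0x7C00` and `end=0x7E00` would be padded up to 0x7E00, and a following section without a start address would begin there.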

Reordering Within Sections

Functions within a section may be reordered (in order to generate machine code with smaller immediates for some architectures) and/or inlined (in order to eliminate call and return instructions).

Data structures within a section may be reordered (in order to generate machine code with smaller immediates for some architectures).

Differences Between Initial and Final Compiler

The initial compiler is being written in C and will (most likely) use LLVM for code generation (which allows the Glaux developers to start writing the other components faster). In other words, it will only provide an interface that is externally compatible with the design of the final compiler.

The final compiler will be written in Glaux-HLL and will not make use of other projects. The above-proposed design will be implemented.