Thursday, January 12, 2012
My first kernel which booted - Part 1
1. 'How to write a simple OS' provides a very basic implementation of the boot script (in x86 assembly). Using the NASM assembler and the Qemu emulator, we can build a kernel which writes 'Hello World!!!' on the screen. A small step for a programmer, a big one for me.
Next, the same page also describes a mechanism for writing the image to a floppy and also to a CD. You can boot from this CD and print 'Hello World' on the screen. The only change was that I had to add the options '-no-emul-boot -boot-load-size 4' when converting it to a CD.
However, the catch here is that since the output is a flat binary, we cannot call a function from another file. And just changing the output format of NASM to a.out/ELF does not work, because the boot loader cannot understand those formats (it expects certain things at certain locations in the file, which does not hold for the ELF/a.out formats).
2. In order to use multi-file code, the 'Bare Bones' kernel is a resource which I succeeded in replicating on my VM. This page gives an excellent summary of what goes wrong, along with example code. I could boot this kernel by just copy-pasting the code.
Again, the boot loader is written in assembly which is compiled by NASM. The main difference, however, is that this assembly code then calls a function which can be written in C in a different file. A linker script is also provided, which tells the linker to prepare the binary with the different sections at particular locations in the file.
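The C side of such a bare-bones kernel can be sketched as below (a minimal sketch; the function names and the colour attribute are my own assumptions, not necessarily those of the tutorial). In VGA text mode the screen is a memory-mapped buffer at 0xB8000, where each cell is a 16-bit word: low byte character, high byte colour.

```c
#include <stdint.h>

/* Each cell of the 80x25 VGA text buffer is a 16-bit word:
   low byte = ASCII character, high byte = colour attribute. */
void vga_write(volatile uint16_t *buf, const char *s, uint8_t attr)
{
    for (int i = 0; s[i] != '\0'; i++)
        buf[i] = (uint16_t)((attr << 8) | (uint8_t)s[i]);
}

/* Entry point called from the NASM boot stub; the real VGA text
   buffer lives at physical address 0xB8000. */
void kmain(void)
{
    vga_write((volatile uint16_t *)0xB8000, "A", 0x07); /* grey on black */
}
```

The helper is split out from kmain() so the cell-encoding logic can also be exercised against an ordinary array in a hosted environment.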
The instructions required to prepare a GRUB floppy image are also provided; this image can be written to a floppy and booted using Qemu.
To make a bootable CD, some further work is required as given here.
As an output, I could make a bootable CD which prints 'A' on the screen. :)
Further work:
To write the functions for interrupts and memory management as explained here.
Tuesday, April 21, 2009
The Address Space

The traditional organization of the virtual address space (as seen from user space, on x86 systems) is as shown in the diagram to the right. The very bottom part of the address space is unused; it is there to catch NULL pointers and such. Starting at 0x08048000 is the program text - the read-only, executable code. The text is followed by the heap region, being the memory obtainable via the brk() system call. Typically functions like malloc() obtain their memory from this area; non-automatic program data is also stored there.
The heap differs from the first two regions in that it grows in response to program needs. A program like cat will not make a lot of demands on the heap (one hopes), while running a yum update can grow the heap in a truly disturbing way. The heap can expand up to 1GB (0x40000000), at which point it runs into the mmap area; this is where shared libraries and other regions created by the mmap() system call live. The mmap area, too, grows upward to accommodate new mappings.
Meanwhile, the kernel owns the last 1GB of address space, up at 0xc0000000. The kernel is inaccessible to user space, but it occupies that portion of the address space regardless. Immediately below the kernel is the stack region, where things like automatic variables live. The stack grows downward. On a really bad day, the stack and the mmap area can run into each other, at which point things start to fail.
(Thanks to http://lwn.net/Articles/91829/).
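The downward growth of the stack described above can be observed from an ordinary user-space program. This is a sketch under the assumption of x86/Linux with a typical compiler, where a deeper call frame lies at a lower address; the function names are my own:

```c
#include <stdint.h>

/* Compare the address of a local variable in a deeper call frame
   with one in the caller's frame.  noinline keeps the compiler from
   merging the two frames.  Returns 1 if the deeper frame sits at a
   lower address, i.e. the stack grows downward. */
__attribute__((noinline))
int frame_is_lower(char *caller_local)
{
    char callee_local;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

int stack_grows_down(void)
{
    char caller_local;
    return frame_is_lower(&caller_local);
}
```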
Monday, April 20, 2009
Process Memory Handling
The memory regions of a process include:
1) A statically allocated data region, which is never released.
2) The user-mode stack, stored at one end of the address space. This memory region grows downward, so that vm_end never changes while vm_start is shifted down when the stack overflows its current area.
3) A dynamically sized area called the heap. This is where all the dynamically allocated memory used by the program is stored. This is the area that is reserved using the malloc glibc call.
There are two ways for a process to obtain memory from the kernel: the brk() and mmap() system calls, which correspond to kernel functions. brk() maps to the kernel function sys_brk, which simply expands the dynamically allocated heap. The mmap system call corresponds to the do_mmap function, which allocates a specific area for dynamic use. This area is returned to the malloc library call, which then manages the area for specific data structures.
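Both paths can be exercised directly from user space. Below is a minimal sketch using the glibc wrappers sbrk() (for brk) and mmap(); the helper names are my own:

```c
#define _DEFAULT_SOURCE   /* expose sbrk() in glibc */
#include <unistd.h>       /* sbrk() */
#include <sys/mman.h>     /* mmap() */
#include <stdint.h>
#include <stddef.h>

/* Grow the heap by n bytes through the brk mechanism (sbrk is the
   libc wrapper).  Returns the old program break, or NULL on failure;
   calling it with n == 0 just reports the current break. */
void *grow_heap(size_t n)
{
    void *old = sbrk((intptr_t)n);
    return old == (void *)-1 ? NULL : old;
}

/* Obtain an independent anonymous mapping of n bytes via mmap(),
   the way large dynamic areas are set up. */
void *map_region(size_t n)
{
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```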
Malloc, Calloc, etc
The simplest implementation of malloc() would be to invoke a system call every time a process requests memory, and another system call when the process releases the memory. However, this would be very expensive because:
(1) system calls are computationally expensive.
(2) too many memory regions mean too much overhead.
Thus the malloc functions implement some more intelligence. When the process makes its first malloc request, malloc translates it into a bigger memory request to the kernel. After this, no system call is required for further memory requests as long as the memory already allocated by the kernel is enough to satisfy them. The malloc function manages this memory internally for the process.
The malloc functions make an internal decision about when to invoke a system call for requesting or freeing memory. They also decide internally whether to request memory using the brk() or mmap() system call.
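As an illustration of this idea (not glibc's actual implementation, which uses free lists, bins, and an mmap threshold), here is a toy allocator that grabs one big chunk from the kernel on the first request and then serves later requests without further system calls; the names and sizes are made up:

```c
#define _DEFAULT_SOURCE   /* expose sbrk() in glibc */
#include <unistd.h>       /* sbrk() */
#include <stddef.h>

#define ARENA_SIZE 65536          /* one big request to the kernel */

static char *arena_base;          /* start of the chunk from sbrk() */
static char *arena_next;          /* next free byte inside it */

/* Serve an allocation from the arena; only the very first call
   issues a system call. */
void *toy_malloc(size_t n)
{
    if (arena_base == NULL) {
        void *p = sbrk(ARENA_SIZE);       /* the one system call */
        if (p == (void *)-1)
            return NULL;
        arena_base = arena_next = p;
    }
    n = (n + 15) & ~(size_t)15;           /* keep 16-byte alignment */
    if (arena_next + n > arena_base + ARENA_SIZE)
        return NULL;                      /* arena exhausted */
    void *out = arena_next;
    arena_next += n;
    return out;
}
```

A real allocator would also track freed blocks for reuse and decide when to return memory to the kernel; this sketch only shows the "one syscall, many allocations" idea.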
Friday, April 17, 2009
Memory Management in the Process Descriptor
The mm_struct contains the information about all the virtual memory that has been allocated for the process. It holds a pointer (mmap, of type vm_area_struct *) to the list of all virtual memory areas. The areas are also kept in a red-black tree for easy searches; the root of the tree is likewise stored in the mm_struct.
The vm_area_struct contains the start address (vm_start) and end address (vm_end) of the virtual memory area. Each vm_area_struct is also associated with a node in the red-black tree; the node is embedded in its vm_rb field.
These are the core elements of the node structures used to manage the virtual memory.
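A simplified sketch of these structures follows, with a linear-list lookup in the spirit of the kernel's find_vma() (the real kernel uses the red-black tree for this search; the field names vm_start/vm_end/vm_next are kept close to the kernel's, the rest is made up):

```c
#include <stddef.h>

/* Simplified mirror of the kernel structures: each area records its
   [vm_start, vm_end) range and the list link; the kernel keeps the
   same areas in a red-black tree (via vm_rb) for fast searches. */
struct vm_area {
    unsigned long vm_start;       /* first address of the region */
    unsigned long vm_end;         /* first address past the region */
    struct vm_area *vm_next;      /* sorted list of areas */
};

/* In the spirit of find_vma(): return the first area whose vm_end
   lies above addr (so addr may fall inside it, or in the gap just
   below it), or NULL if there is none. */
struct vm_area *find_area(struct vm_area *head, unsigned long addr)
{
    for (struct vm_area *v = head; v != NULL; v = v->vm_next)
        if (addr < v->vm_end)
            return v;
    return NULL;
}
```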
Wednesday, April 15, 2009
Creating a Process
A new process is created when an old process executes a fork(). The new process is an exact copy of the old process. (Who creates the first process? I am tempted to say god, but I would rather speak the truth. The first process is the init process, which is literally created from scratch during booting.)
There are many library functions which are used for forking (e.g., clone(), vfork(), etc.). These calls end up invoking a system call (sys_fork), which in turn calls a function within the kernel, do_fork. do_fork is called with a host of flags to indicate the characteristics required of the child process.
do_fork starts off by getting a new PID. This is done using alloc_pidmap(), which allocates a new PID after checking the pidmap_array (which has a bit set for each allocated PID). The search starts from last+1, and each candidate PID is checked against the pidmap_array. The first free PID is returned to do_fork, and last is set to this value.
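The bitmap search can be sketched as follows (a toy version: PID_MAX, the flat array, and the function name are my simplifications of the real pidmap_array code):

```c
#include <limits.h>

#define PID_MAX 4096                       /* toy upper bound */

static unsigned char pidmap[PID_MAX / CHAR_BIT]; /* one bit per PID */
static int last_pid;                       /* last PID handed out */

/* Scan circularly starting from last_pid + 1 and claim the first
   free bit; returns -1 when every PID is in use. */
int alloc_pid(void)
{
    for (int i = 0; i < PID_MAX; i++) {
        int pid = (last_pid + 1 + i) % PID_MAX;
        if (pid == 0)
            continue;                      /* PID 0 is reserved */
        unsigned mask = 1u << (pid % CHAR_BIT);
        if (!(pidmap[pid / CHAR_BIT] & mask)) {
            pidmap[pid / CHAR_BIT] |= mask;   /* mark it allocated */
            last_pid = pid;
            return pid;
        }
    }
    return -1;
}
```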
Under normal conditions, do_fork copies the task_struct of the calling process to be used for the child process. The process is copied by the function copy_process(), and the task_struct is created using dup_task_struct().
dup_task_struct() gets memory for both a task_struct and a thread_info, and copies the parent's thread_info and task_struct into the newly created structures. Then the thread_info field in the new task_struct is pointed at the new thread_info, and the task field in the new thread_info is pointed at the new task_struct.
copy_process() is more interesting:
- It first creates the task_struct for use by the new process. (This is done using dup_task_struct() ).
- Various fields like flags are copied from the old process structure.
- Scheduler related operations are done using sched_fork().
- copy_mm() copies the virtual memory for the new process. copy_mm() just copies oldmm (the mm of the current task) into the new mm and returns.
The task is now ready for running.
The newly allocated task_struct is then placed on the runqueue but not yet activated: wake_up_new_task() is called, which adds the process descriptor to the runqueue so that the task can be scheduled. The function then returns to the caller, which decides when to call the scheduler.
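From user space, all of this machinery is reached through a single fork() call; the sketch below (the helper name is my own) shows the child and parent taking different paths after the fork:

```c
#include <sys/wait.h>
#include <unistd.h>

/* fork() returns twice: 0 in the child, the child's PID in the
   parent.  The child exits with a known code and the parent reaps
   it with waitpid(), returning that code. */
int fork_and_reap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0)
        _exit(42);                 /* child: a copy of the parent */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```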
Thursday, March 20, 2008
Process
Each process in Linux has a unique Process Identifier (PID). The PID is allocated to the process at creation time and never changes. This allows the kernel and the user to control the process using its PID. For example, the command for terminating a process takes the PID as input.
Each process has two data structures allocated for it: the thread_info and the task_struct. The thread_info contains a pointer to the task_struct data structure. Also, the thread_info is allocated at the bottom of the kernel stack, which is pointed to by the stack pointer register (esp). This arrangement of thread_info and task_struct makes it very easy for the kernel to refer to the task_struct of the current process; the lookup is provided by the current macro.
The thread_info is part of a union containing the kernel stack and the thread_info structure. Thus the kernel stack and the thread_info can share the same memory pages.
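The trick behind the current macro can be imitated in user space: since thread_info sits at the bottom of a fixed-size, aligned kernel stack (8 KB here, an assumption; the actual size is configuration-dependent), masking the low bits of any address inside the stack yields the thread_info:

```c
#include <stdint.h>

#define THREAD_SIZE 8192           /* assumed kernel stack size */

struct task_struct;                /* the process descriptor */

struct thread_info {
    struct task_struct *task;      /* back-pointer to the descriptor */
};

/* Given any address inside the kernel stack (e.g. the value of esp),
   round down to the THREAD_SIZE boundary to find the thread_info
   that sits at the bottom of the stack. */
struct thread_info *current_thread_info(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(uintptr_t)(THREAD_SIZE - 1));
}
```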
The task_struct contains the data related to the process. This structure is called the process descriptor. The information stored in it helps the kernel access the correct memory, files, scheduling information, etc.