THIS TEXT IS THE MAIN README FILE OF AN OPERATING SYSTEM KERNEL AND ITANIUM (IA64) ARCHITECTURE TUTORIAL SOFTWARE, MAINLY INTENDED TO BE USED WITH THE SOURCE. ALSO THIS TEXT CAN GIVE A GENERAL LEVEL IDEA ABOUT THE TUTORIAL SW FOR AN EXPERIENCED (KERNEL, ITANIUM) READER WITHOUT THE SOURCE.
Itanium is dead? Why then this software? Maybe dead, maybe not. So what?
Description: Itanium (Intel tm) tutorial "nano kernel". The name WBS stands for WildBoar Scafa. Redistribution with or without modification is allowed without the author's written permission. Everybody can use the software freely for self-learning purposes. All comments about usage, experiences and bugs are appreciated.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR `AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, LOSS OF USE, DATA, OR PROFITS, OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
I wrote this system as a very compact tutorial kernel for the Itanium (Intel tm) architecture. My original idea was to do it in less than 1000 source lines altogether (.c, .s, .h). (Now it is about 50% more, because I expanded the original idea and implemented more system calls, especially fork, and demand paging. In fact, I did reach the original goal of less than 1000 lines, but with an even more limited version.)
This compact kernel helps to understand mainly the HW/SW interface, but also the basic core functions found in every kernel: scheduling, context switching and inter-process communication. The following short documentation is based on my own experiences of getting familiar with similar SW. I wrote the kind of short, introductory, general-level documentation I have always hoped for. See (8): it explains in a very compact manner how the parts of the SW work together. To my mind, this kind of description helps to get the basic idea of the whole system.
- Linux for Itanium has "a little bit more", about 4,000,000 source lines, which is certainly difficult as a tutorial
- Current implementations do not seem to me "special HW friendly"; this could show up in the future as bad benchmarking results
- Some example would be useful to show the throughput of the Itanium architecture as motivation (even if it is not a full-scale operating system)
2 To be able to understand the material you need (in most cases):
- C programming skills
- Basic understanding of assembly programming (and some Itanium instruction set information, such as a book or a manual)
- Good understanding of operating system kernel internals (see the next title below: are the terms familiar?)
- A book or manual about Itanium architecture concepts: "region register", "protection key", "register bank switch", "register stack engine" ...... must be familiar to every user of this SW.
- Tested with the "HP Itanium Ski simulator", which is freely distributed SW
- Has priority-based scheduling
- Is pre-emptive
- Uses protection and virtual memory (TLB translations)
- Has an epc gate for all "never blocking" system calls
- Has a shared library page for tasks (system call stubs)
- Synchronous message passing for IPC (send, receive)
- Has a timeout for synchronous message passing
- Shared memory for IPC (shm_create, shm_open, .....)
- Binary semaphores for IPC (sem_create, sem_post, sem_wait, ...)
- Has a total of 17 system calls
- Tries to imitate a "memory resident" system (such as an embedded system)
- Written to be first fast, secondly compact
- Is, of course, missing a lot of things needed for a real commercial kernel
- Currently has a maximum of 64 tasks; however, it would not be a big thing to change this (no major architectural changes)
- The newest version contains extremely simple demand paging and a fork system call. All virtual memory accesses refer to empty physical memory at first. Disc reading is replaced by a simulated interface to the downloaded elf binary.
Of course, bugs are still hiding, but the kernel is at least "tried". For example, a set of tasks circulates a message using both message passing and shared memory every "round", and one system call causes a "context switch" at the bottom of a 100-level-deep recursive subprogram (as an RSE implementation test). The classical producer/consumer/shared memory semaphore test has also been done.
Examples of the usage of these calls are in the application tasks in the task directories
./taskXY, where X is the test set number and Y the task number within set X
The epc- gate page calls (Itanium fast system call interface):
- get_pid_call
- get_time_tic_call
- sem_create_call
- shm_create_call
- sem_open_call
- sem_close_call
- shm_close_call
- sem_init_call
- get_waiting_sender_call
- get_waiting_receiver_call
- fork call to initialize the system
The break calls with parameters:
- send_call(long task_nr, long message)
- receive_call(long task_nr)
- shm_open_call(long shm_nr) //This is, of course, a "non blocking call", the others are "blocking"
- sem_post_call(long sem_nr)
- sem_wait_call(long sem_nr)
- delay_time
- wake_up
- fork_Hannu
...........................
#
# The system comes with 6 test "task sets", testing different things
# Use "make TESTX=1", where X=1,2,3,4,5,6
#
# So, for example:
#
# *********************************************************
# make TEST5=1
# make TEST3=1 clean
# *********************************************************
#
# Above first make compiles as: kernel + test set 5
#
.......................................
#
# "The HP Itanium ski- simulator ready" elf- binary file is after the make in "./systemdir/build/wbs"
#
# You can see (make_defs) "odd register restrictions" (FIXED variable) for the kernel: the kernel uses only
# registers r16-r31
# and regs r0, gp, b0, b1 (as well as ar.ks)
# The test sets are in numbered directories, for example test 5:
# ../task51 ../task52 ../task53 ../task54
#
# This system uses cross compiler binaries, as seen in make_defs:
# LD = /usr/local/bin/ia64-elf-ld
# AS = /usr/local/bin/ia64-elf-as
# CC = /usr/local/bin/ia64-elf-gcc
# STRIP = /usr/local/bin/ia64-elf-strip
#
# Currently there is no automatic tool to set up the environment by a configuring script
# You can edit make_defs for your own system
# Also you must edit linker scripts -for a new test set -if you need one (./systemdir/build/Xlds.lds)
#---------------------------------------------------------------------
Line counts before adding texts and more comments
c code:
41 ./lib/break_call_stubs.c
77 ./startup/k_kernel.c
205 ./kernel/e_epc.c
279 ./kernel/k_run_time_kernel.c
700 total
assembly code:
72 ./lib/epc_call_stubs.s
14 ./int_handlers/epc.s
55 ./int_handlers/epc_entry.s
73 ./int_handlers/wbs_int.s
83 ./int_handlers/kernel_break.s
./int_handlers/tlb_fault_d
./int_handlers/tlb_fault_i
./int_handlers/k_tlb_fault_sub.c
90 ./startup/k_start.s
208 ./kernel_asse/k_context_switch.s
37 ./kernel_asse/k_wbs_timer.s
25 ./kernel_asse/k_flash_find.s
700 total
include files:
37 ./include/asse_inline.h
138 ./include/system.h
14 ./include/user.h
200 total
- The system boots and sets up the HW, such as kernel regions and the interrupt vector table .... (./startup/k_start.s)
- k_start.s calls the C initialization routine to initialize the kernel main data structures: TCB (Task Control Block), register saving area (tcb), TLB translations for tasks, semaphores and shared memory structures (./startup/k_kernel.c)
- The timer is started (TC)
- A timer interrupt happens (by TC) (./int_handlers/wbs_int.s)
- The timer handler wakes up init_tasks (handler: ./kernel/k_run_time_kernel.c)
- The timer C handler calls the scheduler (flash())
- The scheduler finds the highest priority task (./kernel/k_run_time_kernel.c)
- The scheduler calls the assembly context switch routine (./kernel_asse/k_context_switch.s)
- Registers are saved and the new task's registers restored (TLB translations are also taken care of)
- rfi (Return From Interruption) to the new task
- The task makes a receive_call: the stub in the library page (./lib/break_call_stubs.c) is used and (./int_handlers/kernel_break.s) runs
- It jumps to the receive handler (./kernel/k_run_time_kernel.c)
- The handler finds out that no sender is waiting for the receiver (the task blocks, because of the synchronous message passing)
- The handler calls the scheduler, which finds the next task to run (this one is blocked)
- The scheduler calls (./kernel_asse/k_context_switch.s)
- HW registers are saved and the new task's registers restored (TLB translations are also taken care of)
- rfi to the next task (return from interrupt)
- The next task finds out whether the first task is waiting for a message from it (get_waiting_sender_call)
- We go to the library stub and branch to the epc gate page
- service routine in (./kernel/e_epc.c)
- return back to the task
- The task makes a break to send a message (it can send also if the other task is not waiting)
- .....................
---------------------------- the ball is rolling on --------------------------------------------------------------
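The receive/send part of the sequence above can be modelled on an ordinary host in a few lines of C. This is only a sketch of the rendezvous idea, not the kernel's code: the names model_init, model_receive, model_send and the struct layout are invented for the illustration, and it assumes that a sender also blocks until its receiver arrives.

```c
#include <stdbool.h>

/* Hypothetical hosted model of synchronous message passing.
 * None of these names come from the WBS source. */
enum task_state { READY, WAITING_RECEIVE, WAITING_SEND };

struct task {
    enum task_state state;
    int  peer;      /* task we are blocked on, -1 if none   */
    long message;   /* message delivered to us, or pending  */
};

#define NTASKS 4
static struct task tasks[NTASKS];

void model_init(void)
{
    for (int i = 0; i < NTASKS; i++) {
        tasks[i].state = READY;
        tasks[i].peer = -1;
        tasks[i].message = 0;
    }
}

/* Returns true if the caller continues, false if it must block
 * (in the kernel this is where the scheduler would be called). */
bool model_receive(int self, int from)
{
    struct task *s = &tasks[from];
    if (s->state == WAITING_SEND && s->peer == self) {
        tasks[self].message = s->message;  /* take the pending message */
        s->state = READY;                  /* the sender may run again */
        s->peer = -1;
        return true;
    }
    tasks[self].state = WAITING_RECEIVE;   /* no sender yet: block */
    tasks[self].peer = from;
    return false;
}

/* Same contract as model_receive, seen from the sending side. */
bool model_send(int self, int to, long msg)
{
    struct task *r = &tasks[to];
    if (r->state == WAITING_RECEIVE && r->peer == self) {
        r->message = msg;                  /* deliver directly */
        r->state = READY;                  /* the receiver is ready again */
        r->peer = -1;
        return true;
    }
    tasks[self].state = WAITING_SEND;      /* receiver not there: block */
    tasks[self].peer = to;
    tasks[self].message = msg;
    return false;
}
```

In this model, whichever side arrives second completes the transfer and makes the blocked side ready again, which is the point where the real kernel would let the scheduler decide who runs next.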
The TLB translations at run time are at this moment:
- kernel (privilege 0)
- epc gate page (executing in kernel privilege, "callable" by tasks)
- shared user-privileged library page (system call stubs.....)
- shared memory, if a task has opened one for IPC
- task
- RSE backing storage translation (must always exist)
Fault handling
A very simple solution is implemented: the fault handler just loads the fault number into a register, marks the faulting process as "crashed" and continues.
New: TLB- fault
- The whole scheduling takes a few clocks even for a big number of tasks, with no looping (see more in 14.1)
- The kernel does not use the RSE; it uses only the bank 0 registers and r0, gp, b0, b1, pr, ar.kx
- A big problem: the kernel and tasks must be compiled AND linked together, because we are not loading tasks one by one from the disc
In my previous ia32 kernel I used the Unix/Linux cat tool to build up the core image from separately compiled parts:
cat kernel task1 task2 task3 ....taskN > coreimage
and read the image with my own boot code, transferring control to the kernel
The C code is intentionally written without parameters and local variables (we do not use the RSE in the kernel!). From this it also follows that a few calls are really "behind a #define" and are actually assembly jumps. For example: #define _return_same_() __asm__("...br.sptk.many b1 "); See ./include/asse_inline.h.
- Very simple, clean architecture
- Short service routines
- The kernel does not bother the RSE at all for itself (the context switch is another thing: the USER RSE state is saved)
- Flash scheduling based on bit masks, never any sorting or queuing needed (looping can also be eliminated totally if need be)
- This approach allows easily increasing the number of tasks without a penalty in speed; it is not dependent on the 64 bit register size, as it might seem to be. We can have more masks or a hierarchy of masks.
- Extremely minimal saving of the state before we really know that the context switch is necessary (also, potentially blocking calls can often return to the same context!)
- All "never blocking calls" are in the epc gate page (faster than using the break interrupt)
..............................
It is quite obvious that a kernel of about 600-800 C source lines cannot be compared with, for example, the 4,000,000-line Linux kernel. Only basic core functions exist in a simple form in this SW.
- No device HW drivers currently (the timer interrupt is implemented and is a kind of driver), because of simulator testing.
- Only a few examples of fault handlers.
- NaT bit handling is not tested (it is implemented), because tasks run correctly in any case; no problem has been encountered so far, and gcc does not seem to use "speculative loads".
- Floating point is lightly supported. There is NO math library. Floating point regs are tested with inline instructions (spill/fill works).
13 Mind expanding exercises:
A separate link now.
14 Appendix, some illustrative source examples:
The following source examples will give a general level idea of the system architecture and functions.
The ready-to-run task mask is loaded into a register, and the flash find finds the first set bit in the mask.
If the number of processes were 120, it would slow down the function by only a very small amount: first we check whether the first mask register is different from zero and then take the correct one, that is all (priority-based scheduling). "Round robin" could also be implemented the same way, by zeroing part of the mask.
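As a rough illustration of the two-register case described above, here is a hosted C sketch. It is not the kernel's flash find (that one is in assembly); the names flash_find, set_ready, clear_ready and MASK_WORDS are invented here, and the sketch assumes that the lowest set bit marks the highest priority task.

```c
#include <stdint.h>

#define MASK_WORDS 2   /* assumption: 2 x 64 bits, enough for 120 tasks */

/* Index of the lowest set bit of a nonzero 64 bit word.
 * Six fixed halving steps, no loop over the tasks. */
static int lowest_bit(uint64_t w)
{
    int n = 0;
    if ((w & 0xFFFFFFFFull) == 0) { n += 32; w >>= 32; }
    if ((w & 0xFFFFull) == 0)     { n += 16; w >>= 16; }
    if ((w & 0xFFull) == 0)       { n += 8;  w >>= 8;  }
    if ((w & 0xFull) == 0)        { n += 4;  w >>= 4;  }
    if ((w & 0x3ull) == 0)        { n += 2;  w >>= 2;  }
    if ((w & 0x1ull) == 0)        { n += 1; }
    return n;
}

/* Returns the number of the first ready task, or -1 if none is ready.
 * The only extra cost of the second mask is one compare with zero. */
int flash_find(const uint64_t ready[MASK_WORDS])
{
    if (ready[0] != 0) return lowest_bit(ready[0]);
    if (ready[1] != 0) return 64 + lowest_bit(ready[1]);
    return -1;
}

/* Mark a task ready to run. */
void set_ready(uint64_t ready[MASK_WORDS], int task)
{
    ready[task / 64] |= (uint64_t)1 << (task % 64);
}

/* Mark a task not ready (blocked, crashed, ...). */
void clear_ready(uint64_t ready[MASK_WORDS], int task)
{
    ready[task / 64] &= ~((uint64_t)1 << (task % 64));
}
```

A round robin variant in the same spirit would mask off the bits at and below the task that just ran before calling flash_find, and fall back to the full mask when nothing is left.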
(./kernel/k_run_time_kernel.c)
void sem_wait(void) {                              /* Pretty self explanatory */
  register long _sem_index __asm__("r30");         /* Sys call input parameter in the register r30 from r32 */
  register struct TCB_type *ptr_current;
  register struct sem_type *ptr_sem;
  if((_sem_index < 0) || (_sem_index >= MAXSEM)) {
    message = SEM_INDEX_ERROR_ACK;                 /* Wrong sem index, can't wait */
    _return_same_();                               /* Assembly routine to return to the same context */
  }
  ptr_current = TCB_ptr[current_task_nr];          /* Task control block pointer */
  ptr_sem = sem + _sem_index;                      /* Semaphore to be waited */
  if((ptr_sem->open_mask & (1UL << current_task_nr)) == 0) {
    message = SEM_NOT_ALLOWED_ACK;                 /* Sem not opened, can't wait */
    _return_same_();
  }
  if(ptr_sem->STATE == SEM_FREE) {
    ptr_sem->STATE = SEM_OCCUPIED;
    message = SEM_GRANTED_ACK;
    _return_same_();                               /* We got the semaphore, no waiting */
  }
  ptr_sem->wait_mask = ptr_sem->wait_mask | (1UL << current_task_nr);
  ptr_current->STATE = WAITING_SEM;                /* We must wait for sem_post */
  _ready_ = (unsigned long) _ready_ & (~(1UL << current_task_nr));
                                                   /* We are not running or ready to run any more */
  ptr_current->timeout = MAX_TIMEOUT;
  _schedule_();                                    /* We must wait, the scheduler will seek another task to run */
}
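The other half, sem_post, is not shown in this text, so the following is only a guess at how the counterpart could look, written as a hosted model. The names model_sem_wait and model_sem_post and the struct sem_model are invented; the model assumes that a post hands the semaphore directly to one waiter (the lowest task number in the wait mask) instead of freeing it. The small search loop is for readability only; the kernel would use its flash find here.

```c
#include <stdint.h>

enum { SEM_FREE, SEM_OCCUPIED };

/* Hypothetical model of one binary semaphore. */
struct sem_model {
    int      state;      /* SEM_FREE or SEM_OCCUPIED */
    uint64_t wait_mask;  /* one bit per waiting task */
};

/* Returns 1 if the semaphore was granted at once,
 * 0 if the task was put to wait (removed from the ready mask). */
int model_sem_wait(struct sem_model *s, uint64_t *ready, int task)
{
    if (s->state == SEM_FREE) {
        s->state = SEM_OCCUPIED;
        return 1;
    }
    s->wait_mask |= (uint64_t)1 << task;
    *ready &= ~((uint64_t)1 << task);
    return 0;
}

/* Returns the number of the task that was woken up, or -1 if nobody
 * was waiting (then the semaphore simply becomes free). */
int model_sem_post(struct sem_model *s, uint64_t *ready)
{
    if (s->wait_mask == 0) {
        s->state = SEM_FREE;
        return -1;
    }
    int task = 0;
    while (((s->wait_mask >> task) & 1) == 0)
        task++;                              /* first waiter in the mask */
    s->wait_mask &= ~((uint64_t)1 << task);
    *ready |= (uint64_t)1 << task;           /* waiter is ready again */
    return task;                             /* semaphore stays occupied */
}
```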
epc;
.....
cmp.eq (p7),(p0) = 7 , r32; /* The call number in r32 */
cmp.eq (p8),(p0) = 8 , r32;
cmp.eq (p9),(p0) = 9 , r32;
cmp.eq (p10),(p0) = 10, r32;
cmp.eq (p11),(p0) = 11,r32;;
(p1)add r16 = @gprel(get_pid),gp;
(p2)add r16 = @gprel(get_time_tics),gp
(p3)add r16 = @gprel(get_time_tics),gp
(p4)add r16 = @gprel(get_waiting_sender),gp;
(p5)add r16 = @gprel(get_waiting_receiver),gp;
(p6)add r16 = @gprel(sem_create),gp;;
..........
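For readers more at home in C than in predicated assembly, the selection above can be imitated as a series of independent compares, each of which may set the target. This is only an analogue written for this text: the *_model functions and their return values are placeholders, and the call numbers are an assumption read off the predicate numbers (p1 for call 1, and so on).

```c
#include <stddef.h>

typedef long (*service_fn)(void);

/* Placeholder services with made-up return values. */
static long get_pid_model(void)       { return 7; }
static long get_time_tics_model(void) { return 1234; }
static long sem_create_model(void)    { return 0; }

/* Assumed call numbers, not taken from the source. */
enum { CALL_GET_PID = 1, CALL_GET_TIME_TICS = 2, CALL_SEM_CREATE = 6 };

/* Every compare is independent of the others, like the cmp.eq group;
 * at most one of them matches and selects the service address. */
service_fn select_service(long call_nr)
{
    service_fn target = NULL;
    if (call_nr == CALL_GET_PID)       target = get_pid_model;
    if (call_nr == CALL_GET_TIME_TICS) target = get_time_tics_model;
    if (call_nr == CALL_SEM_CREATE)    target = sem_create_model;
    return target;
}

/* Runs the selected service, or returns -1 for an unknown call number. */
long epc_dispatch(long call_nr)
{
    service_fn target = select_service(call_nr);
    return target ? target() : -1;
}
```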
(./kernel_asse/k_context_switch.s)
...........
.macro spill PTR1 PTR2 REGLIST
.irp REG, \REGLIST
st8.spill [\PTR1] = \REG,16
st8.spill [\PTR2] = \REG+1,16;;
.endr
.endm
spill PTR1 = r26 PTR2 = r27 REGLIST = "r1,r3,r5,r7,r9,r11,r13,r15" /* Also even numbers saved: r2,r4,r6,....(\REG+1) */
.......
fill PTR1 = r26 PTR2 = r27 REGLIST = "r1,r3,r5,r7,r9,r11,r13,r15" /* Also even numbers restored: r2,r4,r6,.. */
.....
(./kernel_asse/k_context_switch.s)
Shared memory and task TLB translations are always present when needed, no TLB miss ever.
/* Insert new translation in TLB, task or shm (context switch) */
.macro set_TLB_tr PTR INDEX_REG INDEX ITIR IFA IDTR TMP TMP1 LAST_VIRT TRANSL
add \PTR = @gprel(\TRANSL),gp;;        /* Pointer to pointer of itir ifa ... save area */
ld8 \PTR = [\PTR];;
ld8 \ITIR = [\PTR],8;;
ld8 \IFA = [\PTR],8;;                  /* IFA is a general register keeping cr.ifa contents */
ld8 \IDTR = [\PTR];;
mov \INDEX_REG = \INDEX;;
add \TMP = @gprel(\LAST_VIRT),gp;;
ld8 \TMP1 = [\TMP];;
cmp.ne (p1),(p0) = \IFA,\TMP1;;        /* If the same translation exists, we do not make it again (purge not needed) */
(p1) st8 [\TMP] = \IFA;;               /* For example with shared memory this is often the case */
(p1) mov cr.itir = \ITIR;;             /* pkr key register, set elsewhere, takes care of protection */
(p1) mov cr.ifa = \IFA;;               /* even if a translation exists with a task not allowed to access shm */
(p1) add \TMP = @gprel(\LAST_VIRT),gp;;
(p1) itr.i itr[\INDEX_REG] = \IDTR;;
(p1) srlz.i;;
(p1) itr.d dtr[\INDEX_REG] = \IDTR;;
(p1) srlz.d;
.endm
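The "do not make it again" test in the macro is just a one-entry cache on the last installed virtual address. A hosted C sketch of that idea follows; the names set_translation, last_virt and inserts are made up, the counter only stands in for the itr.i/itr.d instructions, and the all-ones start value for last_virt is an assumption.

```c
#include <stdint.h>
#include <stdbool.h>

/* Saved values for one translation (what the macro loads through PTR). */
struct translation {
    uint64_t itir;   /* page size and key    */
    uint64_t ifa;    /* virtual address      */
    uint64_t pte;    /* the entry to insert  */
};

static uint64_t last_virt = UINT64_MAX;  /* assumed "nothing installed" value */
static unsigned inserts;                 /* counts simulated TLB inserts */

/* Returns true if a new translation was installed, false if the same
 * virtual address was already in place and the work was skipped. */
bool set_translation(const struct translation *t)
{
    if (t->ifa == last_virt)
        return false;          /* same translation: no purge, no insert */
    last_virt = t->ifa;        /* remember what is installed now */
    inserts++;                 /* here the kernel would do itr.i / itr.d */
    return true;
}
```

With shared memory, consecutive tasks often map the same virtual address, so most context switches would take the cheap path.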
We create a shared memory and build up the producer/consumer "modification test case".
We use semaphores and the shared memory but do not really "produce and consume";
the counters can be checked for correct functionality with the Ski debugger.
Each task increments a common counter by 1 and its own counter by 2, so all 3 counters should be the same
(or almost the same, depending a little on the moment of interruption).
........
........
void T1_task (void) {
  if(shm_create_call(SHM_KEY,SHM_OPEN_ALLOW_MASK_2) < 0) error(0xc);
  if(shm_open_call(SHM_KEY) < 0) error(0xd);
  T1_ptr = SHM_PTR;                  /* Global id, gcc is not able to accept 64 bit pointer */
  T1_ptr_own = T1_ptr;               /* No problem, shared memory has always the same VIRTUAL address */
  T1_ptr_own++;                      /* One shm at a time */
  if(sem_create_call(PRODUCER_KEY, SEM_OPEN_ALLOW_MASK_2) < 0) error(0x10);
  if(sem_open_call(PRODUCER_KEY) < 0) error(0x12);
  if(sem_init_call(PRODUCER_KEY,SEM_FREE) < 0) error(0x11);
  if(sem_create_call(CONSUMER_KEY, SEM_OPEN_ALLOW_MASK_2) < 0) error(0x13);
  if(sem_open_call(CONSUMER_KEY) < 0) error(0x14);
  if(sem_init_call(CONSUMER_KEY,SEM_OCCUPIED) < 0) error(0x15);
  while(1) {
    if(sem_wait_call(CONSUMER_KEY) < 0) error(0x16);   /* Post waited from producer */
    *T1_ptr = *T1_ptr + 1;
    *T1_ptr_own = *T1_ptr_own + 2;
    if(sem_post_call(PRODUCER_KEY) < 0) error(0x17);   /* Post provided to producer to continue */
  }
}
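Why the three counters end up equal can be checked with a few lines of ordinary C. This is just the arithmetic of the test, with the strict alternation that the two semaphores enforce written out as a loop; the names struct counters and run_rounds are invented for the illustration.

```c
/* Hypothetical stand-in for the three counters in the shared memory. */
struct counters {
    long common;   /* incremented by 1 by both tasks   */
    long own1;     /* incremented by 2 by the producer */
    long own2;     /* incremented by 2 by the consumer */
};

/* One round = one producer turn followed by one consumer turn. */
void run_rounds(struct counters *c, int rounds)
{
    for (int i = 0; i < rounds; i++) {
        c->common += 1;   /* producer turn */
        c->own1   += 2;
        c->common += 1;   /* consumer turn */
        c->own2   += 2;
    }
}
```

After N complete rounds all three counters are 2N. If the system is stopped in the middle of a round, the common counter and one own counter can be one step ahead, which is the "almost the same" case.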
(./int_handlers/wbs_int.s, and break int)
If we come to an interrupt handler from a HW interrupt, we save cr.iip and restore it later as it is.
If we come from a break call we must adjust cr.iip: cr.iip points to the break instruction, not
to the next one after it, which would be sensible.
..............
mov r30 = cr.iip;;
add r30 = 16,r30;;    /* Incrementing is needed, otherwise we return to the break and not to the */
mov cr.iip = r30;;    /* instruction after it, timer interrupt is a different case */
mov r16 = pr;         /* Saves user predicates */
mov r17 = cr.iim;     /* Break number, sys call number */
mov r22 = 0xb;        /* flag 0xa interrupted by HW, 0xb by break (system call) */
mov r18 = b0;         /* save user b0 */
...........
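The rule can be written down as a tiny C function. It is a sketch with invented names (resume_address and the CAUSE_* constants); only the flag values 0xa and 0xb and the increment of 16 come from the handler above. The 16 is the size of one IA-64 instruction bundle, so the sketch assumes, like the handler, that execution should continue at the bundle following the break.

```c
#include <stdint.h>

#define CAUSE_HW_INTERRUPT 0xa   /* flag value used by the handler */
#define CAUSE_BREAK        0xb   /* flag value used by the handler */
#define BUNDLE_SIZE        16    /* bytes in one IA-64 instruction bundle */

/* Address to put back into cr.iip before the rfi. */
uint64_t resume_address(uint64_t iip, int cause)
{
    if (cause == CAUSE_BREAK)
        return iip + BUNDLE_SIZE;   /* skip the bundle holding the break */
    return iip;                     /* HW interrupt: resume where we were */
}
```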
14.7 A fault handling example
.............................................................
void tlb_fault_h(void) {                           /* C language fault handler */
  register long r16 __asm__("r16");
  register long r17 __asm__("r17");
  fault_address = r16;
  fault_reason = r17;                              /* Reason of fault saved */
  _ready_ = (unsigned long) _ready_ & (~(1UL << current_task_nr)); /* We are not ready any more */
  ptr_current = TCB_ptr[current_task_nr];
  ptr_current->timeout = MAX_TIMEOUT;              /* Means for ever */
  ptr_current->STATE = CRASHED;                    /* This task is not able to run any more */
  _schedule_();                                    /* Find a new one */
}
Linker script:
..................................
. = wbs_ivt + 0x5300;
tlb_fault.o
......................................
SEE MORE FROM THE CODE, GOOD LUCK!