CLWP 1.3 Documentation

Kees Moerman, January 2003

CLWP: Co-operative Light-Weight Processes (actually, threads)

Disclaimer: THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY, IMPLIED OR OTHERWISE. I make no promises that this software will even do what it is intended to do or claims to do. USE AT YOUR OWN RISK.

This software is provided to the public domain. The only restriction is that anything that is created using this software has to include a list of names (in the included "thanks" file) crediting the contributors to this software, as they have also contributed to your software.

CLWP implements a simple co-operative user-space multi-threading library. I made it to be used together with the Allegro game library (but it is in no way dependent on it), to animate multiple independent objects (players, computer-controlled figures) with an easy programming model. Actually, I would appreciate this kind of functionality in Allegro itself.

CLWP's main attraction is its ease of use. Only 3 functions are vital (clwpInit, clwpSpawn, clwpYield), so even multi-threading newbies can get started quickly.

CLWP is a C/C++ library intended for the GCC/Intel x86 platform. As a co-operative implementation, it circumvents the problems of pre-emptive multi-threading approaches, such as the fact that the GNU standard libraries are not re-entrant. There are also no hardware or BIOS dependencies (such as timer interrupts), so I expect it to be portable over multiple x86 platforms (some assembly is used).

Tested on      Compiled with       Run on Operating System     IDE
Windows        MingW/GCC 2.95.3    Windows 95, ME, NT, 2000    Dev_C++
               MingW/GCC 3.2       Linux/Wine                  Comm.line
DOS            DJGPP/GCC 2.95.3    DOS                         comm.line
                                   Windows 95, ME
Linux on i86   GCC 2.95.3          Linux                       comm.line
               GCC 3.2.1

This library and its documentation are based on the pre-emptive multithreading LWP library made by Josh Turpen in 1997 for the DJGPP platform. Josh, thanks a lot!

General remarks

- As this library implements co-operative multithreading, threads must explicitly release control, e.g. using clwpYield, in order for other threads to get CPU cycles as well.

- As thread switching is always done from the actual program flow (in contrast with a pre-emptive scheme, where e.g. a timer interrupt can come at any moment), no provisions are needed for critical sections. Just don't give control away! However, semaphores are provided for data which may not be accessed by other threads even though control is given up using clwpYield. Giving up control inside such a section can be needed, for example, because otherwise the system would not react quickly enough while no other threads are serviced. Using semaphores you can still safeguard your data structures.
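To make the co-operative model concrete, here is a minimal sketch of two flows of control that hand the CPU to each other explicitly. It does not use CLWP: it assumes a POSIX system that provides the <ucontext.h> functions (getcontext, makecontext, swapcontext) as a stand-in for CLWP's own assembly-level switch, and the names coop_demo, worker and record are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];   /* private stack for the worker */
static char trace[16];                 /* records who ran, in order */
static size_t trace_len;

static void record(char c)
{
    if (trace_len + 1 < sizeof trace)  /* keep room for the final '\0' */
        trace[trace_len++] = c;
}

/* The worker runs only when main switches to it, and it hands
   control back explicitly; nothing can interrupt it in between. */
static void worker(void)
{
    int i;
    for (i = 0; i < 3; i++) {
        record('w');
        swapcontext(&worker_ctx, &main_ctx);   /* "yield" to main */
    }
}

/* Alternates three times between main and the worker and returns
   the recorded order of execution. */
const char *coop_demo(void)
{
    int i;

    memset(trace, 0, sizeof trace);
    trace_len = 0;

    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;    /* where to go if worker returns */
    makecontext(&worker_ctx, worker, 0);

    for (i = 0; i < 3; i++) {
        record('m');
        swapcontext(&main_ctx, &worker_ctx);   /* "yield" to the worker */
    }
    return trace;
}
```

Calling coop_demo() returns the string "mwmwmw": each side runs exactly until its own explicit switch, which is why no critical sections are needed as long as control is not given away.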

Index

General remarks

Installation

Basic Functions
clwpInit( )
clwpSpawn( )
clwpYield( )
clwpKill( )
clwpExit( )
clwpThrow( )
clwpCatch( )

Auxiliary routines
clwpDelay( )
clwpGetpid( )
clwpThreadCount( )
clwpThreadSuspend( )
clwpThreadResume( )
clwpGetThreadPriority( )
clwpAdjustThreadPriority( )

Debug and Configuration
clwpDump( )
clwpCheck( )
int clwpOperatingMode
clwpStackUsed( )

Semaphore Handling
clwpCreateSemaphore( )
clwpDeleteSemaphore( )
clwpGetSemaphoreCount( )
clwpAdjustSemaphoreCount()
clwpLockSemaphore( )
clwpReleaseSemaphore( )
clwpCreateMutex( )
clwpDeleteMutex( )
clwpLockMutex( )
clwpReleaseMutex( )

Example code

Warnings and known bugs

History

- Each thread needs some memory, e.g. for its stack, which normally is allocated automatically using malloc. However (in contrast with the original LWP library), if a thread is killed, the allocated memory is kept by clwp for later use by new threads, to prevent the memory pool from becoming fragmented and to increase speed. So create and kill as many threads as you like; don't artificially keep them alive (however, memory is only reused if the stack size of the killed thread and that of the requested thread are equal). The same is still to be implemented for semaphores (their memory is currently just freed).

- Each thread runs in its own machine state. However, in contrast with LWP, this state DOES NOT include the ds, es, fs, and gs registers: they are constant over the application (yes?), and I got a segfault that was probably due to saving and restoring them. Functions that modify these registers (as apparently can be done in DJGPP, but which I never do within the MingW environment) must restore them before yielding in order to work properly. For example, the DJGPP extension _farsetsel(int selector) loads the fs register with a selector.

- See the very simple programming example at the end of this file, or the supplied example and test files.

- Overhead/performance: see example4.c. The same program running either via a procedure call or via the thread mechanism gives the following timing for 10,000,000 iterations: 22 versus 27 seconds. In this example, 10 million task switches/procedure calls are performed, each time counting to 1000 within the procedure/task. So the average task switching overhead is about (27 - 22) s / (2 * 10M) = 0.25 microseconds (2 task switches per iteration: one to switch to the task, one to switch back to the main program loop). By the way, doing it all in the main program takes 21 seconds, so a procedure call is faster than a task switch, but I think 0.25 microseconds is more than good enough for most applications. Actually, I think it is even less, as my system is also doing other things in the meantime (we're talking about a 1 GHz Pentium-III Windows ME environment, 20% processor load when 'idle').
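The overhead estimate above can be written out as a small helper, shown here only to make the arithmetic explicit. The function name and its parameters are made up for this sketch; the numbers in the usage note are the ones quoted above.

```c
/* Average cost of one task switch, in microseconds.
   threaded_s:   run time of the threaded version, in seconds
   call_s:       run time of the procedure-call version, in seconds
   iterations:   number of loop iterations
   switches_per_iteration: task switches needed per iteration */
double switch_overhead_us(double threaded_s, double call_s,
                          double iterations, double switches_per_iteration)
{
    double extra_seconds = threaded_s - call_s;
    double total_switches = iterations * switches_per_iteration;

    return extra_seconds / total_switches * 1e6;
}
```

With the measurements above, switch_overhead_us(27.0, 22.0, 1e7, 2.0) gives 0.25.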

OPEN QUESTION: who knows which registers actually have to be saved at function boundaries? At this moment, I save all registers except for the segment registers, plus the FPU processor state. Is the latter needed? For specific platforms? Which registers do I not have to save, because the compiler does not expect them to survive a function call anyway? As I am not a compiler/GCC expert, I just took the safe way, but this might cost a lot of unneeded cycles.

- Things to do: Semaphore memory free list.
- Things done: see the history list at the end of this file
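The reuse of thread memory described in the remarks above can be pictured as a free list that is searched for a block with exactly the requested stack size. The sketch below is only an illustration of that idea; struct thread_block, block_acquire and block_release are hypothetical names and not CLWP's internal data structures.

```c
#include <stddef.h>
#include <stdlib.h>

struct thread_block {
    size_t stack_size;
    unsigned char *stack;
    struct thread_block *next;      /* link in the free list */
};

static struct thread_block *free_list = NULL;

/* Return a block with the given stack size, reusing a released block
   if one with exactly the same stack size is available. */
struct thread_block *block_acquire(size_t stack_size)
{
    struct thread_block **link;
    struct thread_block *b;

    for (link = &free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->stack_size == stack_size) {
            b = *link;
            *link = b->next;        /* unlink from the free list */
            b->next = NULL;
            return b;
        }
    }

    /* Nothing suitable on the free list: allocate a fresh block. */
    b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->stack = malloc(stack_size);
    if (b->stack == NULL) {
        free(b);
        return NULL;
    }
    b->stack_size = stack_size;
    b->next = NULL;
    return b;
}

/* "Kill": keep the memory for later reuse instead of freeing it. */
void block_release(struct thread_block *b)
{
    b->next = free_list;
    free_list = b;
}
```

Releasing a 4096-byte block and then acquiring another 4096-byte block hands back the same memory without calling malloc, while a request for a different size falls through to a fresh allocation.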


Installation

This depends on your compiler configuration. Extract clwp into its own directory for testing (using the supplied make file, adapted for your system), and then copy the clwp.h file to the include directory of your compiler. Build the related object files and link them to your program. Compilation can be done e.g. with:

gcc -Wall -c clwp.c -o clwp.o
gcc -Wall -c clwpasm.s -o clwpasm.o

Or, just compile clwp together with your program, like in:

gcc -o myprog.exe myprog.c clwp.c clwpasm.s

LINUX: first copy clwpasm_linux.s to clwpasm.s due to a difference in function naming syntax between my default GCC/MingW platform and the Linux GCC platform.


Basic Functions

int clwpInit(int priority)

This function initialises the multithreading engine. Once called, the main program begins executing as a thread itself.

The priority argument is the priority of the main( ) thread. The highest priority is one (in contrast with the original lwp library). A thread with priority 2 gets half as much CPU time as a thread with priority 1. A thread with priority 3 gets one-third as much CPU time as a thread with priority 1, and so on. Threads are serviced in a round-robin fashion, occasionally skipping threads that have a low priority (e.g. threads with priorities 1, 2 and 3 are scheduled as 1,2,3;1;1,2;1,3;1,2;1;1,2,3;....).
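One simple rule that reproduces the sequence quoted above is "a thread with priority p takes part in every p-th round". The sketch below models only that rule. It is an assumption made for illustration, not necessarily how the CLWP scheduler is implemented, and the names runs_in_round and format_schedule are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed rule: priority p takes part in rounds 0, p, 2p, ... */
static int runs_in_round(int priority, int round)
{
    return priority > 0 && round % priority == 0;
}

/* Write the schedule for the given number of rounds into out, with
   threads separated by ',' and rounds by ';'.
   Returns 0 on success, -1 if out is too small. */
int format_schedule(const int *prio, int nthreads, int rounds,
                    char *out, size_t outsz)
{
    size_t used = 0;
    int round, t;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (round = 0; round < rounds; round++) {
        const char *sep = (round == 0) ? "" : ";";

        for (t = 0; t < nthreads; t++) {
            int n;

            if (!runs_in_round(prio[t], round))
                continue;
            n = snprintf(out + used, outsz - used, "%s%d", sep, prio[t]);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;
            used += (size_t)n;
            sep = ",";
        }
    }
    return 0;
}
```

For priorities {1, 2, 3} and 7 rounds this produces "1,2,3;1;1,2;1,3;1,2;1;1,2,3". Under this rule the priority-2 thread runs in half of the rounds and the priority-3 thread in a third of them, which matches the CPU shares described above.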

clwpInit returns TRUE if it was successful, otherwise FALSE (with the error number retrievable with clwpCatch( )). Note: the process id of the main thread is always -1.

int clwpSpawn(void (*proc)(void *), void *param, int stackSize, int priority, int active)

This function creates new threads. New threads are inserted in front of the thread list, from which they will be executed in round-robin fashion (see clwpInit for some more details).

The first argument is the address of a function of type "void function(void *)". A thread terminates when its 'main' function returns.

The second argument is the parameter to the function, i.e. a pointer to a struct with the data for that particular instance of 'proc'. If your function doesn't use a parameter, simply pass NULL.

The third argument is the size of the stack for the thread. The stack size must be at least 1024 bytes, for sanity reasons. The amount of stack you need depends on the number of local variables your function uses. Remember that any functions that you call in your thread also use this stack, and recursive functions may use hefty amounts of stack. You can use clwpStackUsed to find the required size (but for a first run, take a liberal amount!).

The fourth argument is the priority of the thread. 1 is the highest priority. See clwpInit for a more extensive discussion of priorities.

The fifth argument determines whether the thread is active or not. Passing TRUE makes the thread active, while passing FALSE makes the thread suspended. If you make it suspended, you can start it later using clwpThreadResume(thread_id).

This function returns the process id of the thread, or 0 if it failed (with the error number retrievable with clwpCatch( ) ).

void clwpYield(void)

This function causes the current thread to give up the CPU and jump to the next active scheduled thread according to priorities.

This function also checks for stack overflow and damaged data structures; in case of an error it raises a clwpThrow directly to the main thread. The cause of the error can be retrieved using clwpCatch( ).

int clwpKill(int clwpid)

This function kills a thread. The argument "clwpid" is the process id of the thread to kill. The thread's memory is kept in a list, so new threads can reuse it quickly, at least if the stack size matches.

Warning: killing a thread does not return any memory allocated explicitly by the thread, such as memory you obtained yourself using malloc (or C++ constructors). Local data structures are recovered, as they are placed on the stack, which is returned.

This function returns TRUE if the kill was successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).

void clwpExit(void)

Explicit termination of CLWP. This function does not have to be called, as it is registered during initialisation as an atexit routine in the C runtime. However, it might be useful if you want to 'reset' your program.

This function does not return a value (as an atexit function needs to be of type void); however, any errors can be found via clwpCatch( ).

void clwpThrow(int errcode)

Sometimes, you have a fatal error situation on which you can not continue in the thread, but for which you need to go back to the main program to abort. For example, debuggers probably don't understand you working in some wild stack location. By using clwpThrow( ), control is transferred to the main thread, just after the clwpYield( ).

The error number can be picked up by clwpCatch( ) and further processed, normally aborting the program. This command is not intended for communication purposes to the main thread. Note: the errcode range 0-255 is reserved for clwp internal purposes.

int clwpCatch(void)

Can be used to retrieve the last error. The returned int contains packed data: the upper 16 bits contain the (lower 16 bits of the) id of the thread causing the error, while the lower 16 bits contain the error number. Note: errors with number <16 are fatal errors; don't try to continue as if nothing happened. Errors from 16 up to 255 are CLWP errors, such as those due to incorrect user input to CLWP or running out of memory.

To reset the error number to zero, do a clwpThrow(0).

Error Name (clwp.h)        Code  Description

Fatal errors (these will actually be clwpThrow( )'n when needed):
CLWPerr_cantinstallatexit  1     (clwpInit) Couldn't install atexit routine
CLWPerr_listheadzero       2     (clwpKill) clwp list head is NULL
CLWPerr_stackoverflow      3     (clwp......) Stack damaged, probably overflowed
CLWPerr_null_ptr           4     (clwpCheck) Pointer to main task zero
CLWPerr_damaged            5     (clwp...) Internal data structure damaged
CLWPerr_circular           6     (clwpCheck) Circular list structure broken
CLWPerr_cantkill           7     (clwpKill) Did not succeed in clwpKill

User errors (these will just be set for retrieval via clwpCatch( )):
CLWPerr_initcantmalloc     16    (clwpInit) Couldn't malloc clwp structure
CLWPerr_pidnotfound        17    (clwp......) PID not found
CLWPerr_stacktoosmall      18    (clwpSpawn) Attempt to spawn with a stack < 256 bytes
CLWPerr_spawncantmalloc    19    (clwpSpawn) Couldn't allocate memory
CLWPerr_spawncantstack     20    (clwpSpawn) Couldn't allocate stack
CLWPerr_priozero           21    (clwpSpawn) Attempted to spawn with a priority of zero
CLWPerr_cantkillmain       22    (clwpKill) Attempted to kill main( )
CLWPerr_cantkillzero       23    (clwpKill) Attempted to kill pid 0
CLWPerr_suspendmain        24    (clwpThreadSuspend) Tried to suspend MAIN
CLWPerr_semacantmalloc     25    (clwp....Semaphore) Couldn't allocate memory
CLWPerr_semaexists         26    (clwp....Semaphore) Semaphore already created
CLWPerr_semalistzero       27    (clwp....Semaphore) Semaphore List is NULL
CLWPerr_semanotexist       28    (clwp....Semaphore) Semaphore does not exist
CLWPerr_semanotowner       29    (clwp....Semaphore) Thread deleting a semaphore it didn't create
CLWPerr_semalocklimit      30    (clwpLockSemaphore) Lock limit exceeded
CLWPerr_clwpnoinit         31    (clwpExit) clwpInit not called
CLWPerr_clwpexit           31    (same as CLWPerr_clwpnoinit, for compatibility)
CLWPerr_stacksize          32    (clwpStackUsed) Depth analysis not enabled
CLWPerr_stackmain          33    (clwpStackUsed) can't analyse MAIN stack
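The packed value returned by clwpCatch( ) can be taken apart with a few shifts and masks. The helper below is a sketch and not part of CLWP: struct catch_info and decode_catch are hypothetical names, an int of at least 32 bits is assumed, and treating error number 0 as "no error" is my reading of the clwpThrow(0) reset described above.

```c
/* Hypothetical helper, not part of clwp.h. */
struct catch_info {
    unsigned thread_bits;   /* lower 16 bits of the failing thread's id */
    unsigned number;        /* error number, see the table above */
    int fatal;              /* non-zero for error numbers 1..15 */
};

/* Split a value as returned by clwpCatch( ) into its two fields. */
struct catch_info decode_catch(int packed)
{
    unsigned u = (unsigned)packed;   /* avoid shifting a negative int */
    struct catch_info info;

    info.thread_bits = (u >> 16) & 0xFFFFu;
    info.number = u & 0xFFFFu;
    info.fatal = (info.number > 0 && info.number < 16);
    return info;
}
```

In a program this would be used as decode_catch(clwpCatch( )). Note that only 16 bits of the thread id survive the packing, so the main thread (process id -1) shows up as 0xFFFF in thread_bits.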

int clwpLastError [ this is a global variable ]

In previous versions this used to hold the last error encountered. It is still supported, but its use is deprecated: replace it by clwpCatch() & 0xFFFF.


Auxiliary routines

void clwpDelay(int seconds)

This function causes the current thread to delay for the indicated number of seconds. It does so by yielding until (at least) the correct amount of time has elapsed. Note: for long delays that do not need ultimate precision, you might consider lowering the priority so that less time is spent checking the elapsed time.
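A delay by yielding can be sketched as a loop that keeps handing over the CPU until the clock has advanced far enough. This is not the CLWP source: delay_by_yielding, now_fn and yield_fn are hypothetical names, and the clock and yield functions are passed in so the loop can be tried without the library. fake_now and fake_yield are stand-ins for wrappers around time(NULL) and clwpYield( ).

```c
#include <time.h>

typedef time_t (*now_fn)(void);    /* returns the current time in seconds */
typedef void (*yield_fn)(void);    /* gives the CPU to other threads */

/* Yield until more than `seconds` whole clock ticks have passed, so that
   at least the requested time has really elapsed even if the clock was
   about to tick when we started. Returns the number of yields done. */
long delay_by_yielding(int seconds, now_fn now, yield_fn yield)
{
    time_t start = now();
    long yields = 0;

    while (difftime(now(), start) <= (double)seconds) {
        yield();
        yields++;
    }
    return yields;
}

/* Stand-ins for trying the loop without CLWP or a real clock:
   the fake clock advances one second every time it is read. */
static time_t fake_clock;

time_t fake_now(void)
{
    return fake_clock++;
}

void fake_yield(void)
{
}
```

Every pass through the loop costs one clock read plus one yield, which is why a lower priority (fewer turns for the waiting thread) reduces the time spent on checking.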

int clwpGetpid(void)

This function returns the current thread's process id if successful, or 0 (with the error number retrievable with clwpCatch( ) ) otherwise.

int clwpThreadCount(void)

This function returns the number of threads that are running and/or suspended.

This function always succeeds.

int clwpThreadSuspend(int clwpid)

This function suspends a thread's execution, causing it to sleep until it has been resumed with "clwpThreadResume". A sleeping thread receives no CPU time. The argument clwpid is the process id of the thread to suspend.

This function returns TRUE if successful, FALSE (with the error number retrievable with clwpCatch()) otherwise.

int clwpThreadResume(int clwpid)

This function resumes a thread's execution, causing it to 'wake up' after being suspended with clwpThreadSuspend. The argument clwpid is the process id of the thread to resume.

This function returns TRUE if successful, FALSE (with the error number retrievable with clwpCatch( ) ) otherwise.

int clwpGetThreadPriority(int clwpid)

Gets the priority of the thread with process id clwpid. For a discussion of thread priority, see the clwpInit( ) description.

This function returns the thread's priority if successful, 0 (with the error number retrievable with clwpCatch( ) ) otherwise.

int clwpAdjustThreadPriority(int clwpid, int priority)

Sets the priority of the thread with process id clwpid. For a discussion of thread priority, see the clwpInit( ) description.

This function returns the thread's previous priority if successful, 0 (with the error number retrievable with clwpCatch( ) ) otherwise.


Debug and Configuration

void clwpDump()

Only included when compiled with -DDIAG (otherwise empty body).

This function dumps some internal information: first the list of freed thread blocks is listed (indicating previous id number and stack info), next the currently existing threads are listed (some key elements plus the stack frame). Only important for debugging purposes.

int clwpCheck()

Perform consistency check, assert error if internal state damaged. Don't expect this to return in a thread other than main, as it will clwpThrow an exception. However, the result of clwpCatch( ) is also returned (if called from the main thread), or 0 if everything checks OK.

int clwpOperatingMode [ global variable ]

Global variable containing bits indicating the internal operating mode, used especially for setting modes for debugging and optimisation. Preferably set before initialisation by clwpInit( ). Currently defined bits:

CLWPMODE_STACK Enable stack depth measurement (see clwpStackUsed). Imposes some cycle overhead in thread creation.

int clwpStackUsed(int clwpid)

Returns the maximum amount of stack used (in bytes) by the indicated thread so far. Can be used to optimise the required stack space. Note this can only be used if clwpOperatingMode has the CLWPMODE_STACKSIZE bit set. Note: this function takes a number of cycles proportional to the stack size. Note 2: the main thread does not have this possibility (returns 0).

Returns 0 if an error occurred (with the error number retrievable with clwpCatch( ) ). Returns 0 silently if CLWPMODE_STACKSIZE is not set.
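This manual does not say how the stack depth is measured. A common technique that fits the costs described above (extra work at thread creation, and a query time proportional to the stack size) is "stack painting": fill the stack with a known byte when the thread is created, and later count how much of the pattern is still intact. The sketch below shows that technique under the assumption of a stack that grows downwards from the end of the buffer, as on x86. stack_paint, stack_used and STACK_FILL are hypothetical names, not CLWP internals.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xA5     /* arbitrary pattern, unlikely to occur in long runs */

/* At thread creation: fill the whole stack with the pattern. */
void stack_paint(unsigned char *stack, size_t size)
{
    memset(stack, STACK_FILL, size);
}

/* Later: the stack grows down from stack + size, so the bytes that
   still hold the pattern at the low end were never touched. */
size_t stack_used(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The result is a high-water mark: it can slightly underestimate the real usage if the deepest bytes written happen to equal the fill pattern, so some margin on top of the reported value is sensible.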


Semaphore Handling

int clwpCreateSemaphore(void *lockaddr, int count)

This function creates a semaphore for the variable or function passed in by 'lockaddr'. The parameter 'count' is the number of threads that are allowed access to the function/variable at the same time.

This function returns TRUE when successful, FALSE (with the error number retrievable with clwpCatch( ) ) otherwise.

int clwpDeleteSemaphore(void *lockaddr)

This function deletes the semaphore specified by 'lockaddr'. Only the thread that created the semaphore may delete it. If the thread that created the semaphore has terminated, then another thread may delete it.

This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).

int clwpGetSemaphoreCount(void *lockaddr)

This function returns the number of threads allowed to access a semaphore specified by lockaddr, or 0 (with the error number retrievable with clwpCatch( ) ) if the semaphore does not exist.

Note: Calling clwpGetSemaphoreCount on an address that has been mutex'd returns 1 (One thread allowed to access semaphore).

int clwpAdjustSemaphoreCount(void *lockaddr, int count)

This function is used to change the count value for a semaphore. This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).

Calling clwpAdjustSemaphoreCount with a value other than 1 for count on a mutex'd address changes the mutex into a semaphore.

int clwpLockSemaphore(void *lockaddr)

This function assumes control over the semaphore so that your thread may access it. This function blocks, i.e. it will not return until it gains control of the semaphore. Even though this function blocks, it safely (hopefully) avoids deadlocks.

This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).

int clwpReleaseSemaphore(void *lockaddr)

This function releases control of the semaphore so that other threads may access it.

This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
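In a co-operative system a counting semaphore needs no atomic instructions, because nothing can interrupt a thread between testing and updating the count. "Blocking" then simply means yielding until a slot is free. The sketch below illustrates that idea only. The coop_sem names are hypothetical, the semaphore is a plain struct rather than something keyed on a lock address as in CLWP, and the yield function is passed in so that the code stands alone.

```c
/* Hypothetical stand-alone counting semaphore for co-operative threads. */
struct coop_sem {
    int limit;      /* number of threads allowed in at the same time */
    int holders;    /* number of threads currently holding a slot */
};

void coop_sem_init(struct coop_sem *s, int count)
{
    s->limit = count;
    s->holders = 0;
}

/* Take a slot if one is free. Returns 1 on success, 0 if the semaphore is full. */
int coop_sem_try_lock(struct coop_sem *s)
{
    if (s->holders >= s->limit)
        return 0;
    s->holders++;
    return 1;
}

/* Blocking lock: keep yielding to other threads until a slot is free. */
void coop_sem_lock(struct coop_sem *s, void (*yield)(void))
{
    while (!coop_sem_try_lock(s))
        yield();
}

/* Give a slot back. Returns 1 on success, 0 if nothing was held. */
int coop_sem_release(struct coop_sem *s)
{
    if (s->holders == 0)
        return 0;
    s->holders--;
    return 1;
}
```

A mutex is the special case coop_sem_init(&s, 1). With clwpYield passed as the yield function, coop_sem_lock would let other threads run while waiting. If the holder never releases, the waiting thread keeps yielding forever, so this sketch does nothing to prevent deadlock.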

int clwpCreateMutex(void *lockaddr)

Same as clwpCreateSemaphore(lockaddr, 1)

int clwpDeleteMutex(void *lockaddr)

Same as clwpDeleteSemaphore(lockaddr)

int clwpLockMutex(void *lockaddr)

Same as clwpLockSemaphore(lockaddr)

int clwpReleaseMutex(void *lockaddr)

Same as clwpReleaseSemaphore(lockaddr)

According to Josh Turpen, the semaphore/mutex code is very alpha, and probably not the most stable code in the world. Actually I'm not even sure his idea of a semaphore/mutex is what the rest of the world agrees with.


Example code

#include "clwp.h"
#include <stdio.h>

#define MAX_PROC 3

void proc(void *param)         /* this proc will be started as thread */
{   while(1)                   /* multiple times in parallel */
    {   printf("PROC %d\n", *((int *) param) );    
        clwpYield();           /* switch to next thread */
    }
}

int main()
{
    volatile int i;
    int ids[MAX_PROC];         /* to store thread IDs */

    int a=1,b=2,c=3;           /* 'data' for each thread */

    printf("\nCLWP Example.\n");
    printf("This program spawns 3 threads that each print messages.\n");
    fflush(stdout);

    if(clwpInit(1))            /* initialise the threading system */
    {                          /* and create 3 extra threads */
        ids[0] = clwpSpawn(proc, &a, 4096, 1, TRUE);
        ids[1] = clwpSpawn(proc, &b, 4096, 1, TRUE);
        ids[2] = clwpSpawn(proc, &c, 4096, 1, TRUE);

        for(i=0;i<5;i++)       /* go looping, activating all threads */
        {
            printf("MAIN\n"); clwpYield();
        }

        clwpKill(ids[0]);      /* OK, work is done, let's go home */
        clwpKill(ids[1]);
        clwpKill(ids[2]);
    }
    printf("Ready\n");
    return(0);                 /* clwpExit call not needed (auto) */
}

/* end of file */

Warnings and known bugs

Compile with -DDIAG to include diagnostic warnings to stdout

Update make_clwp with correct path etc for your compiler!

History

Version 1.0: June 26, 2002 C.M. Moerman
- changed to co-operative, added clwpDelay, free list, diagnostics routines, test suite

Version 1.1: July 3, 2002 C.M. Moerman
- error messages enhanced, clwpLastError, clwpExit,
- bug in stack handling in lwpKill (free stack???),
- documentation update, readme.txt

Version 1.2: December 2002, C.M. Moerman (c)
- feedback by Alexander Mironenko on error in benchmarks, updated example4.

Version 1.3: January 2003, C.M. Moerman (c)
- Added clwpThrow( ) and clwpCatch( )
- clwpLastError is now deprecated.
- More extensive security stacks (overflow/damaged structure)
- All references to assert() removed
- Error messages renumbered, <16 are fatal errors
- Bug of original LWP in clwpDeleteSemaphore solved
- HTML manual added
- clwpGetThreadPriority, clwpAdjustThreadPriority ifx changed


OK, have fun with it!

eMail address: Kees Moerman
Web address: my home page