CLWP: Co-operative Light-Weight Processes (actually, threads)
Kees Moerman, January 2003

Disclaimer: THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY, IMPLIED OR OTHERWISE. I make no promises that this software will even do what it is intended to do or claims to do. USE AT YOUR OWN RISK. This software is provided to the public domain. The only restriction is that anything that is created using this software has to include a list of names (in the included "thanks" file) crediting the contributors to this software, as they have also contributed to your software.

CLWP implements a simple cooperative user-space multi-threading library. I made it to be used together with the Allegro game library (but it is in no way dependent on it), to animate multiple independent objects (players, computer-controlled figures) in an easy programming model. Actually, I would appreciate this kind of functionality in Allegro itself.

CLWP's main attraction is its ease of use. There are only 3 functions that are vital (clwpInit, clwpSpawn, clwpYield), so even multi-threading newbies can get started quickly.

CLWP is a C/C++ library intended for the GCC/Intel x86 platform. As a co-operative implementation, it circumvents the problems of pre-emptive multi-threading approaches, such as the fact that the GNU standard libraries are not re-entrant. Also, there are no hardware or BIOS dependencies (such as timer interrupts), so I expect it to be portable over multiple x86 platforms (some assembly is used).
This library and its documentation are based on the pre-emptive multithreading LWP library as made by Josh Turpen in 1997 for the DJGPP platform. Josh, thanks a lot!

Index
- Basic Functions
- Auxiliary routines
- Debug and Configuration
- Semaphore Handling

General remarks
- As this library implements co-operative multithreading, threads must explicitly release control, e.g. using clwpYield, in order for other threads to get CPU cycles as well.
- As the thread task switching is always done from the actual program flow (in contrast with a pre-emptive scheme where e.g. a timer interrupt can come at any moment), no provisions are needed for critical sections. Just don't give control away! However, semaphores are provided for the case where you have data which may not be accessed even though control is given up using clwpYield. Giving up control can be needed, for example, because otherwise the system would not react quickly enough due to the fact that no other threads are serviced. Using semaphores, you can still safeguard your data structures.
- Each thread needs some memory, e.g. for the stack, which normally will be allocated automatically using malloc. However (in contrast with the original LWP library), if a thread is killed, the allocated memory is kept by clwp for later use by new tasks, to prevent the memory pool from becoming fragmented and to increase speed. So make and kill as many processes as you like; don't artificially keep them alive (however, the stack size of the killed task and of the requested task must be equal for the memory to be reused). The same is still to be implemented for semaphores (for now, their memory is simply freed).
- Each thread runs in its own machine state. However, in contrast with LWP, this state DOES NOT include the ds, es, fs, and gs registers: they are constant over the application (yes?), and I got a segfault that was probably related to them. Functions that modify these registers (as apparently can be done in DJGPP, but which I never use within the MingW environment) must restore them before yielding in order to work properly. For example, the DJGPP extension _farsetsel(int selector) loads the fs register with a selector.
- See the very simple programming example at the end of this file, or the supplied example and test files.
- Overhead/performance: see example4.c: the same program running either via a procedure call or via a thread mechanism gives the following timing (10,000,000 iterations): 22 versus 27 seconds to run. In this example, 10 million task switches/procedure calls are performed, each time counting to 1000 within the procedure/task. So the average task switching overhead is about (27-22) / (2*10M) = 0.25 microsecond (2 task switches per iteration: one to switch to the task, one to switch back to the main program loop). By the way, doing it all in the main program gives 21 seconds, so a procedure call is faster than a task switch, but I think 0.25 microseconds is more than good enough for most applications. Actually, I think it is even less, as my system is also doing other tasks in the meantime (we're talking about a 1 GHz Pentium-III Windows ME environment, 20% processor load when 'idle').
OPEN QUESTION: who knows which registers actually have to be saved at function boundaries? At this moment, I save all registers except for the segment registers, plus I save the FPU processor state. Is the latter needed? For specific platforms? Which registers do I not have to save, as they are made state-less by the compiler due to the function call? As I am not a compiler/GCC expert, I just took the safe way, but this might cost a lot of unneeded cycles.
- Things to do: Semaphore memory free list.
- Things done: see the history list at the end of this file
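The memory reuse described in the general remarks above (blocks of killed threads are kept and handed to new threads with an equal stack size) can be sketched as a small free list. This is only an illustration of the idea, not CLWP's actual code: the `Block` structure and the `block_acquire` / `block_release` names are hypothetical and do not appear in clwp.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block descriptor; CLWP's real structure is not shown here. */
typedef struct Block {
    struct Block  *next;        /* next entry in the free list */
    size_t         stack_size;  /* size of the stack buffer in bytes */
    unsigned char *stack;       /* the stack memory itself */
} Block;

static Block *free_list = NULL; /* blocks of killed threads, kept for reuse */

/* Get a block for a new thread: reuse a kept block with an equal stack
   size if there is one, otherwise fall back to malloc.
   Returns NULL if allocation fails. */
Block *block_acquire(size_t stack_size)
{
    Block **link = &free_list;
    while (*link != NULL) {
        if ((*link)->stack_size == stack_size) {
            Block *hit = *link;
            *link = hit->next;          /* unlink from the free list */
            hit->next = NULL;
            return hit;
        }
        link = &(*link)->next;
    }

    Block *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    fresh->stack = malloc(stack_size);
    if (fresh->stack == NULL) {
        free(fresh);
        return NULL;
    }
    fresh->stack_size = stack_size;
    fresh->next = NULL;
    return fresh;
}

/* Called when a thread is killed: keep the block instead of freeing it. */
void block_release(Block *b)
{
    assert(b != NULL);
    b->next = free_list;
    free_list = b;
}
```

With this scheme, spawning and killing threads of one stack size repeatedly touches malloc only once per simultaneously living thread, which is why equal stack sizes pay off.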
Installation depends on your compiler configuration. Extract clwp into its own directory for testing (using the supplied make file, adapted for your system), and next copy the clwp.h file to the include directory of your compiler. Make the related object files and link them to your program. Compilation can be done e.g. with:
gcc -Wall -c clwp.c -o clwp.o
gcc -Wall -c clwpasm.s -o clwpasm.o
Or, just compile clwp together with your program, like in:
gcc -o myprog.exe myprog.c clwp.c clwpasm.s
LINUX: first copy clwpasm_linux.s to clwpasm.s, due to a difference in function naming syntax between my default GCC/MingW platform and the Linux GCC platform.
This function (clwpInit) initialises the multithreading engine. Once called, the main program begins executing as a thread itself.
The priority argument is the priority of the main( ) thread. The highest priority is one (in contrast with the original lwp library). A thread with priority 2 gets half as much CPU time as a thread with priority 1. A thread with priority 3 gets one-third as much CPU time as a thread with priority 1, and so on. Threads are serviced in a round-robin fashion, occasionally skipping threads if they have low priority (e.g. priorities 1,2,3 are scheduled as 1,2,3; 1; 1,2; 1,3; 1,2; 1; 1,2,3; ....).
clwpInit returns TRUE if it was successful, otherwise FALSE (with the error number retrievable with clwpCatch( )). Note: the process id of the main thread is always -1.
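The schedule quoted in the priority discussion above is consistent with a simple rule: a thread of priority p is serviced in every p-th round of the round-robin. The sketch below reproduces the sequence under that assumption. It is not taken from the CLWP sources, and `runs_in_round` and `format_schedule` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed rule: a thread of priority p is serviced in every p-th round. */
int runs_in_round(int priority, int round)
{
    assert(priority >= 1);
    return round % priority == 0;
}

/* Append text to the string in out; returns 0 if it does not fit. */
static int append(char *out, size_t outsize, const char *text)
{
    size_t used = strlen(out);
    size_t n = strlen(text);
    if (used + n + 1 > outsize)
        return 0;
    memcpy(out + used, text, n + 1);    /* copies the terminator too */
    return 1;
}

/* Write the priorities serviced in each round into out, with ","
   between threads and ";" after each round.
   Returns 1 on success, 0 if the buffer is too small. */
int format_schedule(const int *prio, int nthreads, int rounds,
                    char *out, size_t outsize)
{
    assert(outsize > 0);
    out[0] = '\0';
    for (int r = 0; r < rounds; r++) {
        int first = 1;
        for (int t = 0; t < nthreads; t++) {
            char num[16];
            if (!runs_in_round(prio[t], r))
                continue;               /* low priority: skipped this round */
            snprintf(num, sizeof num, "%d", prio[t]);
            if (!first && !append(out, outsize, ","))
                return 0;
            if (!append(out, outsize, num))
                return 0;
            first = 0;
        }
        if (!append(out, outsize, ";"))
            return 0;
    }
    return 1;
}
```

For priorities {1, 2, 3} and 7 rounds, this yields the same sequence as in the text: over 6 rounds the priority-1 thread runs 6 times, the priority-2 thread 3 times and the priority-3 thread 2 times, matching the "half" and "one-third" shares.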
This function creates new threads. New threads are inserted in front of the thread list, from which they will be executed in round-robin fashion (see clwpInit for some more details).
The first argument is the address of a function of type "void function(void *)". A thread terminates when its 'main' function returns.
The second argument is the parameter to the function, e.g. a pointer to a struct with the data for that particular instance of 'proc'. If your function doesn't use a parameter, simply pass NULL.
The third argument is the size of the stack for the thread. The stack size must be at least 1024 bytes, for sanity reasons. The amount of stack you need depends on the number of local variables your function uses. Remember that any functions that you call in your thread also use this stack, and recursive functions may use hefty amounts of stack. You can use clwpStackUsed to find the required size (but for a first run, take a liberal amount!).
The fourth argument is the priority of the thread. 1 is the highest priority. See clwpInit for a more extensive discussion on priorities.
The fifth argument determines whether the thread is active or not. Passing TRUE makes the thread active, while passing FALSE makes the thread suspended. If you make it suspended, you can start it later using clwpThreadResume(thread_id).
This function returns the process id of the thread, or 0 if it failed (with the error number retrievable with clwpCatch( ) ).
This function (clwpYield) causes the current thread to give up the CPU and jump to the next active scheduled thread according to priorities.
This function checks for stack overflow and damaged data structures; in case of an error it raises a clwpThrow directly to the main thread. The error cause can be retrieved using clwpCatch( ).
This function (clwpKill) kills a thread. The argument "clwpid" is the process id of the thread to kill. The thread's memory is kept in a list, so new threads can reuse it quickly, at least if the stack size matches.
Warning: killing a thread does not return any memory allocated explicitly by the thread, for example by using malloc yourself (or C++ constructors). Local data structures are recovered, as they are placed on the stack, which is returned.
This function returns TRUE if the kill was successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
Explicit termination of CLWP (clwpExit). This function does not have to be called, because during initialisation it is registered as an atexit routine in the C runtime. However, if you want to 'reset' your program this might be useful.
This function does not give a return value (as an atexit function needs to be of type void); however, any errors can be found via clwpCatch( ).
Sometimes, you have a fatal error situation on which you can not continue in the thread, but for which you need to go back to the main program to abort. For example, debuggers probably don't understand you working in some wild stack location. By using clwpThrow( ), control is transferred to the main thread, just after the clwpYield( ).
The error number can be picked up by clwpCatch( ) and further processed, normally aborting the program. This command is not intended for communication purposes to the main thread. Note: the errcode range 0-255 is reserved for clwp internal purposes.
clwpCatch( ) can be used to retrieve the last error. The returned int has packed data: the upper 16 bits contain the (lower 16 bits of the) id of the thread causing the error, while the lower 16 bits contain the error number. Note: errors with number <16 are fatal errors; don't try to continue as if nothing happened. Errors from 16 up to 255 are CLWP errors, for example due to incorrect user input to CLWP or running out of memory.
To reset the error number to zero, do a clwpThrow(0).
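The packed value described above can be taken apart with a few masks and shifts. The helpers below are a minimal sketch and are not part of clwp.h: all `catch_*` names are hypothetical, a 32-bit int is assumed (as on GCC/x86), and treating error number 0 as "no error" is my reading of the reset remark above.

```c
#include <assert.h>

/* Hypothetical helpers, not part of clwp.h. Assumes a 32-bit int. */

/* Lower 16 bits: the error number (same as clwpCatch() & 0xFFFF). */
unsigned catch_error_number(int packed)
{
    return (unsigned)packed & 0xFFFFu;
}

/* Upper 16 bits: the lower 16 bits of the id of the failing thread.
   The cast to unsigned avoids shifting a negative value. */
unsigned catch_thread_bits(int packed)
{
    return ((unsigned)packed >> 16) & 0xFFFFu;
}

/* Error numbers below 16 are fatal; 0 is taken to mean "no error". */
int catch_is_fatal(int packed)
{
    unsigned err = catch_error_number(packed);
    return err != 0 && err < 16;
}

/* Build a packed value the way the text describes it, for experiments. */
int catch_pack(int thread_id, int err)
{
    assert(err >= 0 && err <= 0xFFFF);
    return (int)((((unsigned)thread_id & 0xFFFFu) << 16) | (unsigned)err);
}
```

Note that the main thread, whose process id is -1, shows up as 0xFFFF in the thread bits, because only the lower 16 bits of the id are stored.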
Error Name (clwp.h) | Code | Description |
Fatal Errors | will actually be clwpThrow( )'n when needed | |
CLWPerr_cantinstallatexit | 1 | (clwpInit) Couldn't install atexit routine |
CLWPerr_listheadzero | 2 | (clwpKill) clwp list head is NULL |
CLWPerr_stackoverflow | 3 | (clwp......) Stack damaged, probably overflowed |
CLWPerr_null_ptr | 4 | (clwpCheck) Pointer to main task zero |
CLWPerr_damaged | 5 | (clwp...) Internal data structure damaged |
CLWPerr_circular | 6 | (clwpCheck) Circular list structure broken |
CLWPerr_cantkill | 7 | (clwpKill) Did not succeed in clwpKill |
User Errors | will just be set for retrieval via clwpCatch( ) | |
CLWPerr_initcantmalloc | 16 | (clwpInit) Couldn't malloc clwp structure |
CLWPerr_pidnotfound | 17 | (clwp......) PID not found |
CLWPerr_stacktoosmall | 18 | (clwpSpawn) Attempt to spawn with a stack < 256 bytes |
CLWPerr_spawncantmalloc | 19 | (clwpSpawn) Couldn't allocate memory |
CLWPerr_spawncantstack | 20 | (clwpSpawn) Couldn't allocate stack |
CLWPerr_priozero | 21 | (clwpSpawn) Attempted to spawn with a priority of zero |
CLWPerr_cantkillmain | 22 | (clwpKill) Attempted to kill main( ) |
CLWPerr_cantkillzero | 23 | (clwpKill) Attempted to kill pid 0 |
CLWPerr_suspendmain | 24 | (clwpThreadSuspend) Tried to suspend MAIN |
CLWPerr_semacantmalloc | 25 | (clwp....Semaphore) Couldn't allocate memory |
CLWPerr_semaexists | 26 | (clwp....Semaphore) Semaphore already created |
CLWPerr_semalistzero | 27 | (clwp....Semaphore) Semaphore List is NULL |
CLWPerr_semanotexist | 28 | (clwp....Semaphore) Semaphore does not exist |
CLWPerr_semanotowner | 29 | (clwp....Semaphore) Thread deleting a semaphore it didn't create |
CLWPerr_semalocklimit | 30 | (clwpLockSemaphore) Lock limit exceeded |
CLWPerr_clwpnoinit | 31 | (clwpExit) clwpInit not called |
CLWPerr_clwpexit | 31 | (same as CLWPerr_clwpnoinit, for compatibility) |
CLWPerr_stacksize | 32 | (clwpStackUsed) Depth analysis not enabled |
CLWPerr_stackmain | 33 | (clwpStackUsed) can't analyse MAIN stack |
In previous versions clwpLastError used to hold the last error encountered. It is still supported, but its use is deprecated: replace it by clwpCatch() & 0xFFFF.
This function (clwpDelay) causes the current thread to delay for the indicated number of seconds. It does so by yielding until (at least) the requested amount of time has elapsed. Note: for long delays that do not need ultimate precision, you might consider lowering the priority so less time is spent checking the elapsed time.
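The "yield until enough time has elapsed" idea can be sketched as below. This is an illustration under assumptions, not the CLWP implementation: the clock and the yield operation are passed in as function pointers so the loop can be tried without the library, and `delay_by_yielding`, `NowFn`, `YieldFn` and the `fake_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*NowFn)(void);    /* returns the current time in seconds */
typedef void   (*YieldFn)(void);  /* gives the CPU to the other threads */

/* Yield repeatedly until at least `seconds` have elapsed.
   Returns the number of yields that were needed. */
unsigned long delay_by_yielding(double seconds, NowFn now, YieldFn yield)
{
    assert(now != NULL && yield != NULL);
    double start = now();
    unsigned long yields = 0;
    while (now() - start < seconds) {
        yield();                  /* other threads run here */
        yields++;
    }
    return yields;
}

/* A fake clock for experiments: every yield "costs" 0.25 seconds. */
static double fake_time = 0.0;
static double fake_now(void)   { return fake_time; }
static void   fake_yield(void) { fake_time += 0.25; }
```

In a real program the clock would come from the C runtime and the yield callback would call clwpYield. The loop also shows why a lower priority helps for long delays: the thread is scheduled less often, so the elapsed-time check runs less often.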
This function returns the current thread's process id if successful, 0 (with the error number retrievable with clwpCatch( ) ) otherwise.
This function returns the number of threads that are running and/or suspended.
This function always succeeds.
This function suspends a thread's execution, causing it to sleep until it has been resumed with "clwpThreadResume". A sleeping thread receives no CPU time. The argument clwpid is the process id of the thread to suspend.
This function returns TRUE if successful, FALSE (with the error number retrievable with clwpCatch()) otherwise.
This function resumes a thread's execution, causing it to 'wake up' after being suspended with clwpThreadSuspend. The argument clwpid is the process id of the thread to resume.
This function returns TRUE if successful, FALSE (with the error number retrievable with clwpCatch( ) ) otherwise.
Get the priority of the current thread. For a discussion on thread priority, see the clwpInit( ) description.
This function returns the thread's priority if successful, 0 (with the error number retrievable with clwpCatch( ) ) otherwise.
Sets the priority of the current thread. For a discussion on thread priority, see the clwpInit( ) description.
This function returns the thread's previous priority if successful, 0 (with the error number retrievable with clwpCatch( ) ) otherwise.
Only included when compiled with -DDIAG (otherwise empty body).
This function dumps some internal information: first the list of freed thread blocks is listed (indicating previous id number and stack info), next the currently existing threads are listed (some key elements plus the stack frame). Only important for debugging purposes.
Perform consistency check, assert error if internal state damaged. Don't expect this to return in a thread other than main, as it will clwpThrow an exception. However, the result of clwpCatch( ) is also returned (if called from the main thread), or 0 if everything checks OK.
Global variable containing bits indicating the internal operating mode, used especially for setting modes for debugging and optimisation. Preferably set before initialisation by clwpInit( ). Currently defined bits:
CLWPMODE_STACK Enable stack depth measurement (see clwpStackUsed). Imposes some cycle overhead in thread creation.
Returns the maximum amount of stack used (in bytes) by the indicated thread so far. Can be used to optimise the required stack space. Note this can only be used if clwpOperatingMode has the CLWPMODE_STACKSIZE bit set. Note: this function takes a number of cycles proportional to the stack size. Note 2: the main thread does not have this possibility (returns 0).
Returns 0 if an error occurred (with the error number retrievable with clwpCatch( ) ). Returns 0 silently if CLWPMODE_STACKSIZE is not set.
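A common way to measure stack depth, and one that would explain both the extra cycles at thread creation and a cost proportional to the stack size, is to fill the stack with a marker byte and later count how many marker bytes are still intact. Whether CLWP does exactly this is an assumption on my part; the sketch below only shows the technique, with hypothetical names (`stack_paint`, `stack_used`, `STACK_FILL`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xA5   /* arbitrary marker byte (an assumption) */

/* At thread creation: fill the whole stack with the marker byte. */
void stack_paint(unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL, size);
}

/* Later: count how much of the stack was ever written to.
   On x86 the stack grows downward from stack + size, so bytes at the
   low end that still hold the marker have never been used. */
size_t stack_used(const unsigned char *stack, size_t size)
{
    size_t untouched = 0;
    assert(stack != NULL);
    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The result is a high-water mark, not the current depth, and it can be slightly too low if the deepest bytes the thread wrote happen to equal the marker value.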
This function creates a semaphore for the variable or function passed in by 'lockaddr'. The parameter 'count' is the number of threads that are allowed access to the function/variable at the same time.
This function returns TRUE when successful, FALSE (with the error number retrievable with clwpCatch( ) ) otherwise.
This function deletes the semaphore specified by 'lockaddr'. Only the thread that created the semaphore may delete it. If the thread that created the semaphore has terminated, then another thread may delete it.
This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
This function returns the number of threads allowed to access a semaphore specified by lockaddr, or 0 (with the error number retrievable with clwpCatch( ) ) if the semaphore does not exist.
Note: Calling clwpGetSemaphoreCount on an address that has been mutex'd returns 1 (One thread allowed to access semaphore).
This function is used to change the count value for a semaphore. This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
Calling clwpAdjustSemaphoreCount with a value other than 1 for count on a mutex'd address changes the mutex into a semaphore.
This function assumes control over the semaphore so that your thread may access it. This function blocks, i.e. it will not return until it gains control of the semaphore. Even though this function blocks, it safely (hopefully) avoids deadlocks.
This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
This function releases control of the semaphore so that other threads may access it.
This function returns TRUE when successful, otherwise FALSE (with the error number retrievable with clwpCatch( ) ).
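The create/lock/release behaviour described above amounts to a counting semaphore. In a co-operative system this needs no atomic operations, since a thread switch can only happen at a yield. The sketch below illustrates that idea only; it is not the CLWP code, it ignores ownership and the lockaddr lookup, and the `Sema` type and `sema_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int count;    /* number of threads allowed in at the same time */
    int holders;  /* number of threads currently holding the semaphore */
} Sema;

void sema_init(Sema *s, int count)
{
    assert(s != NULL && count >= 1);
    s->count = count;
    s->holders = 0;
}

/* Take the semaphore if a slot is free; returns 1 on success, 0 if full. */
int sema_try_lock(Sema *s)
{
    if (s->holders >= s->count)
        return 0;
    s->holders++;
    return 1;
}

/* Blocking lock: keep yielding to the other threads until a slot is free.
   `yield` stands in for the library's yield call. */
void sema_lock(Sema *s, void (*yield)(void))
{
    while (!sema_try_lock(s))
        yield();
}

/* Give the slot back; returns 0 if the semaphore was not held at all. */
int sema_release(Sema *s)
{
    if (s->holders == 0)
        return 0;
    s->holders--;
    return 1;
}
```

A mutex is then simply the case count = 1.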
Same as clwpCreateSemaphore(lockaddr, 1)
Same as clwpDeleteSemaphore(lockaddr)
Same as clwpLockSemaphore(lockaddr)
Same as clwpReleaseSemaphore(lockaddr)
According to Josh Turpen, the semaphore/mutex code is very alpha, and probably not the most stable code in the world. Actually I'm not even sure his idea of a semaphore/mutex is what the rest of the world agrees with.
#include "clwp.h"
#include <stdio.h>

#define MAX_PROC 3

void proc(void *param)              /* this proc will be started as thread */
{
    while (1)                       /* multiple times in parallel */
    {
        printf("PROC %d\n", *((int *) param));
        clwpYield();                /* switch to next thread */
    }
}

int main()
{
    volatile int i;
    int ids[MAX_PROC];              /* to store thread IDs */
    int a=1, b=2, c=3;              /* 'data' for each thread */

    printf("\nCLWP Example.\n");
    printf("This program spawns 3 threads that each print messages.\n");
    fflush(stdout);

    if (clwpInit(1))                /* initialise the threading system */
    {                               /* and create 3 extra threads */
        ids[0] = clwpSpawn(proc, &a, 4096, 1, TRUE);
        ids[1] = clwpSpawn(proc, &b, 4096, 1, TRUE);
        ids[2] = clwpSpawn(proc, &c, 4096, 1, TRUE);

        for (i=0; i<5; i++)         /* go looping, activating all threads */
        {
            printf("MAIN\n");
            clwpYield();
        }

        clwpKill(ids[0]);           /* OK, work is done, let's go home */
        clwpKill(ids[1]);
        clwpKill(ids[2]);
    }
    printf("Ready\n");
    return (0);                     /* clwpExit call not needed (auto) */
}
/* end of file */
Compile with -DDIAG to include diagnostic warnings to stdout.
Update make_clwp with the correct path etc. for your compiler!
Version 1.0: June 26, 2002 C.M. Moerman
- changed to co-operative, added clwpDelay, free list, diagnostics routines, test suite
Version 1.1: July 3, 2002 C.M. Moerman
- error messages enhanced, clwpLastError, clwpExit,
- bug in stack handling in lwpKill (free stack???),
- documentation update, readme.txt
Version 1.2: December 2002, C.M. Moerman (c)
- feedback by Alexander Mironenko on error in benchmarks, updated example4.
Version 1.3: January 2003, C.M. Moerman (c)
- Added clwpThrow( ) and clwpCatch( )
- clwpLastError is now deprecated.
- More extensive security stacks (overflow/damaged structure)
- All references to assert() removed
- Error messages renumbered, <16 are fatal errors
- Bug of original LWP in clwpDeleteSemaphore solved
- HTML manual added
- clwpGetThreadPriority, clwpAdjustThreadPriority ifx changed
OK, have fun with it!
eMail address: Kees Moerman
Web address: my home page