There are 14 queues and one thread state for each queue. The queues are
shown below with an example of what the queue looks like. The states are
listed later. After the list of states, the event flag is described and
examples of what is done to put a thread on a queue and take it off.
Thread scheduling is basically just moving threads among these queues.
When a thread has to block to wait for something, it is put on the
queue for that type of event. When the thread is not waiting for an event,
but is just waiting to run, it is on the pthread_current_prio_queue. From
this queue, it will become a running thread in the order of it's priority.
Otherwise, it will be waiting around on one of the other queues. A thread
will run until it blocks for something, or until the timer goes off and
reschedules it in the priority queue to be run again.
QUEUES and what they look like
-------------------------------
Threads are put on the queues in the order thread1, thread2, thread3, etc.
The examples show what the queues look like after the last thread is
put on the queue. The queue heads look like this:
struct pthread_queue {
struct pthread *q_next;
struct pthread *q_last;
void *q_data;
};
Queue functions are:
--------------------
pthread_queue_init(queue);
- queue = {NULL, NULL, NULL}
pthread_queue_enq(queue, thread);
- puts thread on queue
pthread_queue_get();
- returns pointer to next thread on queue, thread
stays on the queue
pthread_queue_deq(queue);
- returns pointer to the next thread on the queue
and removes the thread from the queue
pthread_queue_remove(queue, thread);
- removes the specified thread from the queue
in the pthread_t structure, there are 5 queue related fields:
-------------------------------------------------
struct pthread_queue join_queue
- queue of threads that are waiting to join with this one
- the queue looks like this:
thread.join_queue->q_next = thread1;
thread.join_queue->q_last = thread2;
thread1->queue = &thread.join_queue;
thread1->next = thread2;
thread2->queue = &thread.join_queue;
thread2->next = NULL;
struct pthread_queue *queue
- pointer to the head of the queue that the thread is on
- used for dead, alloc, join, r_queue, w_queue, c_queue, m_queue
- queue head is described below
struct pthread *next
- next thread in queue
- used for dead, alloc, join, prio, r_queue, w_queue, c_queue, m_queue
struct pthread *pll
- used for the pthread linked list, has all threads in any state
struct pthread *sll
- used for the linked list of sleeping threads
in globals.c:
-----------------------------------
struct pthread *pthread_link_list - head of list of all threads
- looks like this:
pthread_link_list = thread3;
thread3->pll = thread2;
thread2->pll = thread1;
thread1->pll = NULL;
struct pthread_queue pthread_dead_queue
- queue of dead threads
struct pthread_queue pthread_alloc_queue
- queue of available, allocated threads
- looks like this:
queue->q_next = thread1;
queue->q_last = thread2;
thread1->queue = queue;
thread1->next = thread2;
thread2->queue = queue;
thread2->next = NULL;
struct pthread_prio_queue pthread_current_prio_queue
- current priority queue
- thread is in state PS_RUNNING
- looks like this:
pp is priority level > p
prio_queue->level[pp].first = thread1;
prio_queue->level[pp].last = thread2;
prio_queue->next = thread1;
thread1->next = thread2;
thread2->next = thread3;
thread3->next = NULL;
prio_queue->level[p].first = thread3;
prio_queue->level[p].last = thread3;
in sleep.c:
------------------
struct pthread *pthread_sleep
- list of sleeping threads
- looks like this:
pthread_sleep = thread1;
thread1->sll = thread2;
thread2->sll = thread3;
thread3->sll = NULL;
in sig.c:
----------------
pthread_sigwait linked list of threads waiting for a signal
- looks like this:
pthread_sigwait = thread2;
thread2->next = thread1;
thread1->next = NULL;
in fd.c:
--------
each fd_table entry has:
r_queue list of threads waiting for a read lock on this fd
w_queue list of threads waiting for a write lock on this fd
in fd_kern.c:
-------------
fd_wait_read list of threads waiting for kernel read
fd_wait_write list of threads waiting for kernel write
fd_wait_select list of threads waiting for kernel select
in wait.c:
----------
wait_queue
- is used for each of these queues:
fd.r_queue fd.w_queue fd_wait_read fd_wait_write
fd_wait_select wait_queue
- looks like this:
queue->q_next = thread1;
queue->q_last = thread2;
thread1->queue = queue;
thread1->next = thread2;
thread2->queue = queue;
thread2->next = NULL;
each conditional has
--------------------
cond.c_queue
- list of threads waiting on a conditional
- looks like this:
cond.c_queue->q_next = thread1;
cond.c_queue->q_last = thread2;
thread1->queue = cond.c_queue;
thread1->next = thread2;
thread2->queue = cond.c_queue;
thread2->next = NULL;
each mutex has
--------------
mutex.m_queue
- list of threads waiting on a mutex
- looks like this:
mutex.m_queue->q_next = thread1;
mutex.m_queue->q_last = thread2;
thread1->queue = mutex.m_queue;
thread1->next = thread2;
thread2->queue = mutex.m_queue;
thread2->next = NULL;
THREAD STATES (from state.def) and the corresponding queue:
-----------------------------------------------------------
Queue State Description
Functions used in
pthread_current_prio_queue PS_RUNNING running
mutex.m_queue PS_MUTEX_WAIT waiting for a mutex
pthread_mutex_lock()
cond.c_queue PS_COND_WAIT waiting on a condition variable
pthread_cond_wait()
pthread_cond_timedwait()
r_queue PS_FDLR_WAIT waiting on a fd read lock
fd_lock()
w_queue PS_FDLW_WAIT waiting on a fd write lock
fd_lock()
fd_wait_read PS_FDR_WAIT waiting on a kernel fd to have data to read
read() readv() recv()
recvfrom() recvmsg()
fd_wait_write PS_FDW_WAIT waiting on a kernel fd to write data
write() writev()
send() sendto() sendmsg()
fd_wait_select PS_SELECT_WAIT waiting for several fds in a select
select()
pthread_sleep PS_SLEEP_WAIT waiting on a sleep
sleep() usleep() nanosleep()
wait_queue PS_WAIT_WAIT waiting for a child to die
wait() waitpid() [wait3() wait4()]
pthread_sigwait PS_SIGWAIT waiting on a set of signals
sigwait()
join_queue PS_JOIN waiting for a thread to die
pthread_join()
pthread_dead_queue PS_DEAD waiting for a thread to join with it or detach it
pthread_alloc_queue PS_UNALLOCED available to use for a new thread
EVENT FLAGS
-----------
Event flags are used to monitor the event state of a thread. A thread is
either waiting for an event, or the event is done. The flags are mutually
exclusive - they cannot both be set at the same time. Macros are defined
to manipulate and test the flags.
flags (in pthread.h):
---------------------
PF_WAIT_EVENT 0x01
PF_DONE_EVENT 0x02
macros (in pthread.h):
----------------------
SET_PF_DONE_EVENT(thread)
This wants to set the flag to indicate an event is done.
If the flag is already set to PF_DONE_EVENT, then the flag
cannot be set now. If the flag is set to PF_WAIT_EVENT, then
it's OK to set the flag to PF_DONE_EVENT. If the flag is
not set to PF_DONE_EVENT or PF_WAIT_EVENT, then PANIC
if the PF_DONE_EVENT flag is not set
then
if the PF_WAIT_EVENT flag is set
then set flags to PF_DONE_EVENT
return OK
else PANIC
else return NOTOK
SET_PF_WAIT_EVENT(thread)
This sets the flag to indicate it's waiting for an event.
Set the flag only if it is clear, otherwise PANIC
if flags is set to PF_WAIT_EVENT or PF_DONE_EVENT
then PANIC
else set flags to PF_WAIT_EVENT
CLEAR_PF_DONE_EVENT(pthread)
Clear the flag if it is set to PF_DONE_EVENT.
if flags is not set to PF_DONE_EVENT
then PANIC
else set flags to 0
PUTTING A THREAD ON A QUEUE:
----------------------------
This is a general sequence of steps to put a thread on a queue. There
are variations, but this covers what is basically needed.
pthread_sched_prevent(); - locks kernel
SET_PF_WAIT_EVENT(pthread_run);
pthread_queue_enq(queue, pthread_run);
pthread_resched_resume(state); - sets state, processes signals, changes context,
processes signals, unlocks kernel
if event has occurred successfully CLEAR_PF_DONE_EVENT(pthread_run);
else pthread_resched_resume(state) - reschedule the thread to keep waiting
REMOVING A THREAD FROM A QUEUE:
-------------------------------
This is the general sequence of steps to take a thread off of a queue. There
are variations, but this covers what is basically needed.
pthread = pthread_queue_deq(queue); - take next thread off of queue
pthread_sched_prevent();
if SET_PF_DONE_EVENT(pthread) - check if the event it was waiting for is done
pthread_sched_other_resume(pthread); - schedule thread
else pthread_sched_resume(); - continue with current thread
DESCRIPTION OF FUNCTIONS
------------------------
This section describes very tersely what some of the key functions
do. This is what helped me to understand what is going on. It may or
may not help you much.
pthread.c
pthread_create
create thread structure (or take it from alloc queue)
initialize it
pthread_sched_prevent
pthread_sched_other_resume(new_thread)
signal.c
pthread_sched_other_resume(thread)
set state to PS_RUNNING
put on prio_queue
if thread has higher priority than pthread_run
and the kernel is locked
sig_handler(SIGVTALRM)
- this eventually causes context switch
pthread_sched_resume()
- process any signals and continue running
the current thread
sig_handler(sig)
(kernel lock must = 1)
sig_handler_switch(sig)
process any other signals
sig_handler_switch(sig)
VTALRM: sigvtalrm();
sigvtalrm()
if (sig_count) set sig_count = 0, unblock all signals.
context_switch()
context_switch_done()
pthread_sched_resume()
loop until kernel is not locked
if there are any signals to process,
keep the kernel locked
call the signal handler
if pthread_run has a signal to process
keep the kernel locked
call pthread_sig_process
pthread_resched_resume(state)
changes state of pthread_run to state
sig_handler(SIGVTALRM)
pthread_sched_resume()
signal.c
context_switch
if state is PS_RUNNING put current thread on prio_queue
save state info
poll all fds
A: get next thread off of prio_queue
restore that thread's state
return
if no threads on prio_queue, see if there are
any threads in the linked list
process any interrupts for the thread
if no signals to process, wait
then process the signal and go back to A:
exit if there are no threads alive
context_switch_done()
clear SIGVTALRM
set_thread_timer()
set_thread_timer()
setitimer(pthread_run)
based on scheduling policy
fd_lock
lock fd's mutex with timedwait
return error if it times out
if the fd has an owner and it is not pthread_run
then stop scheduling
put pthread_run on the fd's r or w queue
SET_PF_WAIT_EVENT on pthread_run
unlock mutex
if timeout is 0
then set pthread_run->data
resched_resume(PS_FDLR_WAIT)
lock mutex
else
stuff
CLEAR_PF_DONE_EVENT on pthread_run
else increase lockcount or set owner to pthread_run
unlock fd's mutex
fd_unlock
lock fd's mutex
if pthread_run owns fd for reading/writing and is not waiting
with another lock
then get the next thread from the read queue,
stop scheduling
set the owner for the lock to the thread
if SET_PF_DONE_EVENT on thread is successful
schedule the thread
else resume the current thread
else decrease lockcount
unlock fd's mutex
QUESTIONS
---------
i think a thread can sometimes be on two queues, like the sleep
queue and another one.
CANCELLATION
------------
MySQL has cancellation implemented from the cancellation code contributed
by Larry Streepy.
A thread calls a function which is a cancellation point. It sets a flag
that says it is at a cancellation point. The scheduler sees this flag
and looks to see if the thread has been cancelled. If so, it will cancel
the thread. Otherwise, the thread will be put on a wait queue and block.
If the thread gets cancelled while it is blocked, when the thread wakes
up the scheduler will see that it is at a cancellation point and that
the thread has been cancelled, so the cancel is processed. This results
in the thread getting cancelled sooner - it happens when the thread
wakes up instead of waiting until the next cancellation point. It also
will get cancelled if the thread wakes up spuriously.
functions in MySQL:
pthread_testcancel()
return if disabled
return if already trying to cancel
if cancelled, SET_PF_RUNNING_TO_CANCEL
pthread_exit()
called by pthread_cancel, pthread_setcancelstate, pthread_setcanceltype
at cancellation points:
SET_PF_AT_CANCEL_POINT
pthread_resched_resume(xxx_WAIT)
CLEAR_PF_AT_CANCEL_POINT
pthread_resched_resume()
if not trying to cancel
and thread is cancellable
(cancel is enabled, thread has been cancelled,
cancel type is asynchronous or it's at a cancellation point).
and thread is not dead or unalloced
SET_PF_RUNNING_TO_CANCEL
pthread_sched_resume()
if not trying to cancel
and thread is cancellable
pthread_cancel_internal()
pthread_cancel_internal()
pthread_cancel_internal()
lock kernel
if thread state is not RUNNING
remove thread from the queue it's on
set state to RUNNING
SET_PF_RUNNING_TO_CANCEL
fd_unlock_for_cancel()
pthread_sched_resume() - lock the kernel and process signals
pthread_exit()
NOTES:
------
The following functions are missing their cancellation point:
fcntl(), open(), close(), fsync(), msync(), sleep(), and tcdrain()
THREAD SCHEDULING:
------------------
There are individual lists of threads for each priority. Threads in a
higher priority list are scheduled before threads in a lower priority
list. Within each list, where all threads have the same priority, the
threads are scheduled according to the policy.
SCHED_FIFO is a first-in first-out order. When a thread is preempted, it
is put back at the first of the list and continues running unless there
is a thread with a higher priority waiting. When a thread is blocked, it
is put at the end of the list. When a thread's scheduling policy or
priority is changed, it is put at the end of the new priority's list.
SCHED_RR is like SCHED_FIFO, except that each thread has a timer set so
that it won't get rescheduled indefinitely if there are other threads at
the same priority waiting. When a thread has been rescheduled and run
for PTHREAD_ITIMER_INTERVAL, the timer will expire and the thread will
go to the end of the list.
SCHED_IO is something in between the other two. It will set the timer
like RR only when there are no signals pending. Otherwise, it does not
set the timer.
The spec says that there must be a scheduling type called SCHED_OTHER.
We added this so that is the same as SCHED_IO.
|