12 - Thread Synchronization

Notes:

Atomic Operations

An atomic operation either "runs to completion" or "not at all".

It is indivisible: in the sense that during the operation done by $A$
- B cannot "observe" or "act" on the state under modification by A.
Fundamental building block-without atomic operations we have no way for threads to work together and make sense of the resulting computation

Each instruction in the x86 instruction set is atomic. An instruction fully finishes before the current process/thread can be preempted/interrupted.

Notes:

An operation that either runs to completion OR not at all.
Its intermediate state i snot visible to anybody else outside
Isolation: exectuing a thread completely isolated from other threads

Synchronization Variable - Lock to Provide Mutual Exclusion

First step towards making shared data thread-safe
The idea is to make a sequence of instructions "atomic" prevent two threads from executing the block concurrently
- The idea is to make multiple instructions atomic to stop context switching from being useful

General idea:

Notes:

We are going to place a lock around *p= p* = 1 for example in 11 - Concurrency and Threads#Race Condition Demonstration
This guarantees that only one of them acquires the lock, which will prevent everybody else from being in the critical section at the same time
When the thread that has the lock unlocks the lock, then one other thread can enter that critical section.
This is the developers responsibility: Identifying shared state and critical sections
In this case, if x had a certain value (x = 0) and then t1 and t2 both wanting to increment it, no matter which ones runs first, the critical sections will be serialized.

What is a mutex?

A mutex is a lock that we acquire before using a shared resource and releasing it after use.
When the lock is "set", no other thread can access the locked region of code [advisory].
It is used to protect data or other resources from concurrent access.
A mutex lock (pthreads) can only be released by the thread that locked it.

Notes:

Mutex = mutual exclusion
Purpose: While one entity has the mutex, another entity cannot get that mutex (cannot acquire the lock at the same time)
This is basically you making sure that the critical section is serialize, this is called Advisory.
- Your OS or compiler has no idea what your app semantics are
- Only you understand what is consistent for your application

What is a critical section?

A critical section is a segment of a program that multiple threads should not access simultaneously.
The segment needs to be executed exclusively & atomically, such as accessing a resource (file, input or output port, global data, etc.)
It contains shared variables or resources that need to be synchronized to maintain consistency of data variables.
In concurrent programming, if one thread tries to change the value of shared data at the same time as another thread tries to read the value (i.e., data race across threads), the result is unpredictable.

Notes:

The segment of our program that we need to protect
Bad things happen when we do not protect it (i.e. overwrites)
You want to protect code A using a lock and code B using the same lock so that these sections are serialized (happen in order)

Mutex thread synchronization

Thread $_{0}$	Thread $_{1}$
mutex lock	mutex lock
critical section	critical section
mutex unlock	mutex unlock

Notes:

Thread 0 executes completely isolated from thread 1
In general if the only thing you are doing is read(), then you can get away without locking.
But if you are writing, this is where critical sections happen
- It is the entire read-write section that needs to be protected
- Critical section should include the read that derived the thing you will write to

Mutex thread synchronization

...

Mutex in C++ for Thread Safety

...

...

Notes:

If they don't use the same mutex, then you are not serializing anything
This is just a big heavy duty block that basically has one thread finishing, then the second thread executing or the other way around.

Now when we do:

./a.out 1000000
data = 2000000

You will be able to see a deterministic output

...

Mutex in C++ -- Finer Locking

...

Notes:

We can only protect the read-modify-write section
It produces the same correct result
So yes, we can put locks inside loops to do fine locking

When we do:

./a.out 1000000
data = 2000000

Once again we are in happy territory

Timeline of Coarse/Fine Grained Locking

Notes:

Coarse grained: A thread that acquire a lock right at the beginning of the function, and then releases it at the end
Fine grained: you have much more locks and unlocks, since these happen between small sections of code that are critical

Producer-Consumer Synchronization

...

Notes:

Mutual exclusion is not enough!
It is good but for shared variables we may need other types of primitives to be able to do more general things
Here is an example if a Queue
- You have multiple consumers and multiple producers
- You can already see that there are shared variables between consumer and producer
  - producer needs to touch head/tail and size to add an element
  - consumer as it pulls an item from the list needs to change the size and look at head/tail.
  - What if both of them read tail at the same time and both try actions at the same time?
  - The order in which things happens, you can actually lose elements
This sounds like a problem for mutex
- Only one guy can be in the mutation at any given time

Producer-Consumer: problems

There’s a race condition in accessing the list if there are multiple producers
- Push-back at a queue end may not be thread-safe.
- It may require incrementing size, which may not be thread-safe.
If there are multiple consumers, more than one thread may pass the test even if there is only one item in the queue
- Two threads may call list.pop().

Notes:

If the list size is 0, they do a busy wait
When both think the queue is now empty, they can both try to read/write to the list, which is where bad things happen
You might say: I do not need to protect anything because these are thread safe. But actually the problem is not that this is thread safe or not, it is that thread safety has gone out of the window much earlier.
Both think that there are elements in the queue that they can pull off. The fact that you have multiple threads that both determine that there is at least one element in the queue and add an element in the queue is a violation. Now they basically both invoke list.pop, and depending on how well it is written, that can be our problem.

Solution:
You might attempt to solve multiple consumer’s race condition by trying to do
the following:

This leads to a deadlock when list is empty, because “consumer” is blocked
with mutex held.

Consumer holds the lock while the queue is empty,
Producer must wait for lock to add data to the queue.

It would be nice if the lock could somehow be transferred to another thread while one consumer thread is “stuck”.

Notes:

We are guaranteeing that only one thing sis at that critical section at any given time
Mutexes do not give us dependence order, the order in which to execute things
For this we need a new primitive
Example:
- Lets say that a consumer tried to pull an item from the que and found that the size of the queue was empty, now they will just wait for something to appear but since they are holding the mutex, that won't never happen! the producer is not able to run!
  - You are deadlock!
Mutexes are not enforcing an ordering
The only way I can actually test whether I can consume or not is by acquire the lock, but now when I determine that I can actually not do my thing, I want to be able to relinquish this lock and try again later.
One thing that the consumer can do is to throw an error here, and have the higher level program call consumer again.
- But this i still a busy wait, we are just trying again repeatedly!
The only way a consumer can progress is if a producer runs, but since you are holding the lock, nothing else can run, so you are essentially deadlock
- If you have written your mutexes correctly, if you return an exception you will be able to unlock the consumer
- But this is still a busy/pooling situation
Takeaway: Mutexes give you mutual exclusion but they do not directly provide serialization and ordering!

What if...

we determine that we cannot proceed (even though it is holding the lock)
It would be nice is that at this point it could release the lock and go off to sleep
So instead of a busy wait, you atomically release the mutex and go to sleep
The producer can actually define a signal that can tell the consumer to wake up! when waking up the consumer acquires the mutex
It still have mutual exclusion, and you can proceed!
It turns out that if the consumer wakes up and turns out there are 0 elements in the list, once again, it releases the mutex and goes to sleep

What we need is "condition variables"

A condition variable (CV) is

a queue of waiting threads, with
a lock for access to CS (multiple CVs per lock is ok)

wait(cond_t *cv, mutex_t *lock)

assumes lock is held when wait() is called
puts caller to sleep & releases the lock atomically
when awoken, reacquires lock before returning

signal(cond_t *cv)

wake a single waiting thread (if >= 1 thread is waiting)
if there is no waiting thread, just return w/o doing anything

Notes:

Think of a condition variable as of being a queue were threads can wait, and that it is associated with a lock
When you call wait(), you will release the lock and park yourself in the queue corresponding to this condition variable
The OS will pick one random process from the queue later, the first thing the chosen thread will do is to acquire the mutex.
The consumer calls wait on this condition vraiable when there is no items on the lists
- This says: here is this mutex, here is this queue -> release the mutex and put me on the queue
Now what the producer does is that it signals on the condition variable
- This says:
  - if tehre is someone waiting on the queue -> please wake them (one) up
  - if there is none waiting on the queue -> that signal is lost
What about having multiple waits?
- You get blocked on the first wait() you call, if it is the case that you wake up from that wait you can continue calling other waits one at a time.
- You can't call wait() twice in an object!
  - The first wait() you call you are already blocked!
  - There are no nested waits
This functions as some sort of agreement between the waiter and the signaler
This mechanism is so general that you can basically encode any condition that you want basically
Remember:
- When you call wait() you better hold this mutex
- Whenever somebody signals this condition variable, you either pick one or eveyone waiting on this queue, but the first thing they will do is to acquire this mutex for this critical section

Example:

void Consumer () {
    mutex.lock();
    while (list.size() == 0) {
    mutex.unlock();
    sleep until list = empty;
    mutex.lock();
    }
    int data = list.front();
    list.pop();
    mutex.unlock();
}

void Producer () {
    int data = 0;
    while (true) {
        data = produce_data();
        mutex.lock();
        list.push_back(data);
        mutex.unlock();
        nofify_consumer();
    }
}

Notes:

Once the lock is acquired, it tests again
Between releasing the lock and sleep there is a lot of space
Note if the signal is lost and the consumer goes to sleep, it may potentially go to sleep forever
It is dangerous because when you release the lock, you need to make this things atomic
- This is so that if there is a signal, then that signal is not lost

"Broadcasting" on a condition variable

image-31.png457x190

Two burgers already available in the warmer.
5 consumers show up and place their orders.
2 consumers get their burgers—the remaining 3 start playing.
When each burger is ready:
- The producer notifies all waiting consumers.
- All consumers queue up to the counter—but the first one gets the burger.
- The remaining consumers return to their tables and continue playing.

Notes:

Waking everybody = broadcasting on the condition variable
Since you have two burgers in the burner, two consumers are automatically satisfied, but the remaining 3 are parked on the condition variable queue
Every time that the cook produces 1 or more burgers, he sends a broadcast signal to the condition variable queue
- They all now compete to get the mutex, only one of them succeeds, and that client gets the burger and the rest will go back to sleep: they will acquire the mutex, they will find that the condition is false (there are no burgers) and they will release and go to sleep.
The advantage of the broadcaster is that if you happen to produce 2 burgers you just need to wakeup everybody at the same time and you will know that 2 of them will get served automatically
Be sure that your Exam will have condition variables. This is a fundamental thing and are very useful!

A detour through the C++ Thread API

#include <iostream>
#include <string>
#include <thread>
void hello_worldstring s {
    cout << s << std::endl;
}
class HelloWorld {
public:
    void hello_worldstring s {
        cout << s << " from " <<
        "HelloWorld.hello_world." << endl;
    }
    void operator string s) const { (2
        cout << s << "from " <<
        " HelloWorld.operator()." << endl;
    }
};

int main() {
    std::string s = "Hello World";
    thread t1(hello_world, s); (1)
    HelloWorld hw;
    thread t2(hw, s);
    thread t3hello_world, &hw, s;
    std::thread t4([s] {
        cout << s << "from a lambda." << endl;
    });
    t1.join();
    t2.join();
    t3.join();
    t4.join();
}

Notes:

When your thread object is created, you are already runnable!
Having a method in a class that implements the operator method
- Then another way to create a thread is by giving an instance of that class and what is going to happen is that the function that is going to be invoked is the operator?
- The way you provide the self is to provide the pointer to the argument
- For us the most important use is for lambdas (anonymous function)
- This anonymous function takes no arguments and captures the variable s from its scope
  - C++ did a slightly different design to lambdas
  - You will copy by value
    - Captures s within the scope of you lambda

Producer-Consumer: Correctly

Step 1: Declare a condition variable and a mutex

Step 2: The producer produces data in a way such that the consumer can notice e.g., pushing into the vector that consumer checks

Step 3: Calls notify_one/all() on the condition to wake up the consumers so that they can check again

Notes:

The highlighted in consumer allows you to say: now I have a scope block and if I ever exit I will release the lock? And when somebody wakes me please run this lambda, if this lambda is true, only then return me from the wait, if this lambda is not true, then our condition is not satisfied so release the lock again and go back to sleep
This is sort of encapsulating that whole while loop!
What the consumer is doing is that before you can touch the shared data structure, it needs to acquire a lock that it shares with the producer, so that they can exchange data.
C++ has exceptions, so it is dangerous to do things like m.lock(), you might actually end up five levels above while you still hold the mutex
The suggested usage is to create a scope lock.
- You lock the mutex inside a type class called unique_lock.
- No matter how you exit from this function you will release the lock

Produce-Consumer

Step 3: The consumer(s) do the following:

Calls wait() on the condition
Wait() needs a “wrapped” lock (as unique_lock) and a predicate function
- The “wrapper” also locks the lock
  - Consume data
  - Unlock the lock

An example producer-consumer scenario

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-35.png459x211

Producer-consumer with bounded size?

00 - TAMU Brain/6th Semester (Spring 26)/CSCE-313/Lecture/Visual Aids/image-36.png458

Notes:

You make another condition variable that signals when...
You have application layer conditions:
- Two condition variables.
  - can_write
  - can_read
- You program your application so that it notifies these condition variables correctly
  - First see wether the producer can proceed (if size of the queue is less than the produced size)
  - Make sure that there is at least one item in the queue
The fact that you can encode any condition in a condition variable, makes them very general

Producer-Consumer in C

#define QUEUE_SZ 10
int queue[QUEUE_SZ];
int queue_hd = 0, queue_tl = 0, queue_nitems = 0;

pthread_mutex_t m;
pthread_cond_t can_write, can_read;

void*producer(void *pv) {
	static int counter = 1;
	
	while (1) {
		pthread_mutex_lock(&m);
		while(queue_nitems == QUEUE_SZ) {
			...
		}
		...
	}
}

Cond Var I

int done = 0;

void *child (void *arg) {
	printf ("child\n");
	done = 1;
	return NULL;
}

int main (int argc, char *argv[]) {
	pthread_t p;
	printf ("parent: begin\n");
	pthread_create (&p, 0, child, 0);
	while (done == 0);
	printf ("parent: end\n");
	return 0;
}

Notes:

pthread_t gives you the thread id;
0 basically says give me a standard thread
&p is a pointer to a function
It returns anything (a generic pointer), you as an application know what pointer type you are passing and what pointer type you are getting back. You know what you are going to be passed.
- You will have to cast it to that appropriate type before we can use it
You cannot provide several arguments to child, to do that you will need to create a struct and pass a pointer to that struct.

Cond Var II

void *child (void *arg) {
	printf ("child\n");
	done = 1;
	pthread_cond_signal(&cv);
	return NULL;
}

int main (int argc, char *argv[]) {
	pthread_t p;
	printf ("parent: begin\n");
	pthread_create(&p, 0, child, 0);
	while(done == 0) {
		pthread_cond_wait(&cv, &m);
	}
	printf("parent: end\n");
	return 0;
}

Notes:

The parent creates a child, and blocks itself on a condition variable queue
Problem: The child sets done to 1, signals the condition variable, but that signal is lost because the parent is not blocked yet, now the parent blocks itself in the condition variable queue and it is blocked forever

Cond Var III

void *child (void *arg) {
	printf ("child\n");
	pthread_mutex_lock(&m);
	pthread_cond_signal(&c);
	pthread_mutex_unlock(&m);
	return NULL;
}

int main (int argc, char *argv[]) {
	pthread_t p;
	printf ("parent: begin\n");
	pthread_create (&p, 0, child, 0);
	pthread_mutex_lock(&m);
	pthread_cond_wait(&c, &m);
	pthread_mutex_unlock(&m);
	printf ("parent: end\n");
	return 0;
}

Notes:

Now we acquire mutexes before going to sleep!
Unfortunately either this won't work (because any can execute first)
You have removed state, which was used to determine whether the parent should sleep or not
Now you are completely dependent of...
What if we put the lock before doing pth_create?
The point: it is very tricky to reason about concurrency

Cond Var IV (works)

void *child (void *arg) {
    printf ("child\n");
    othread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
    return NULL;
}
int main (int argc, char *argv[]) {
    pthread_t p;
    printf ("parent : begin\n");
    pthread_create (&p, 0, child, 0);
    othread_mutex_lock(&m);
    while (done == 0)
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
    printf ("parent: end\n");
    return 0;
}

Notes:

Now it is testing for a state!
Now guarantees that once it blocks in the condition variable, the child has not got executed

What if the two operations are not atomic?

release(mutex);
add yourself to the Q;

Lost update problem

...