11 - Concurrency and Threads
Class: CSCE-313
Notes:
Outline:
- What is concurrency?
- Macro illusion that multiple things are happening at the same time
- What is a thread?
- A useful abstraction that lets a program be written for a concurrent world
- Thread context switch
- Thread API
What is concurrency?
...
Notes:
- Here we have two processes
- Concurrency is the macro view: P1 and P2 appear to execute at the same time
- On a multi-core processor you get parallelism (real simultaneity); on a single core the OS interleaves the processes, so it merely looks like everything runs concurrently
What is a thread?
...
Notes:
- Threads give a programmatic framework for a program to exploit hardware parallelism, or the fact that the OS is doing context switching on its behalf
- You can hide the latency of execution with useful work
- Without this abstraction, your latency could increase even though you are not actually doing more work
Threads as a programmatic framework
...
- In a single threaded world, Mario is blocked when reaching the river
- While he is blocked, he can't do any useful work, he has to wait for Toad to get him on the boat and bring him to the other side of the river.
- In a multi-threaded world, where you can express that multiple things happen at the same time, Mario has a virtual Mario: when one Mario is blocked, the other is activated and can proceed to the castle to get the magic boat
- The same happens when coming back
- This Mario is essentially hiding its latency by doing useful work!
- This is how you organize your computation around threads.
Multi-threaded process
...
Notes:
- Note your user space thread is invisible to the kernel.
- They share everything else and are, in some sense, logically contained in the process P.
- If the kernel knows a process has multiple threads, it can schedule another thread from that same process when one blocks.
- The amount of work needed may be the same; it is latency that really matters here.
Process vs. Thread
Notes:
- Each thread in a process has its own stack
- Each thread invokes functions on its own data (data dependent)
- Each thread also keeps its own CPU context (registers)
- The signal mask (sigmask) is per-thread as well
- But threads share everything else (heap, data, text, etc.)
- Because the process is the container, all threads share the same signal disposition (the same signal handlers and so on)
Motivation for Threads
Basically two reasons:
...
Notes:
- A programmer can use threads to make better use of the machine
- Utilization may not be a problem, but by exploiting parallelism you reduce a job's latency: you do more things at the same time, so you finish quicker
- If you have 10 threads that each block for 100 ms, the blocking is overlapped, so some thread can be doing useful work at all times.
- If you compare processes to threads:
- Before threads came around, people used processes to create concurrency, which is very expensive (creating a process is costly)
- You must copy the entire page table and so on.
- A running process fills the cache with its data; every process context switch may effectively flush the cache back to memory, adding latency
- Communication between processes is not terribly convenient
Process Context Switch
Notes:
- Each process is an independent container of a virtual address space
- Creating new processes and context switching is expensive.
Thread Context Switch
Notes:
- Threads share the same address space, you do not have to switch the address space on context switching
- Everything that one thread touches and caches is okay for the other thread
- When you wait for I/O, you had better context switch; otherwise you would be wasting time you could spend working on something else
Thread-based applications
...
Notes:
- One manager and workers receiving work
- Pipeline:
- Each thread does some part of the pipeline
- Somebody fixing engine, somebody doing something else, etc.
Threaded web server
...
Notes:
- All the manager does is frame the request and assign a worker to work on it
Thread Execution
...
Notes:
- When you create a thread, you basically give it a function to execute
- Compare that to a regular function call (a synchronous call):
- The caller waits for the function to finish and return a result
- With the threading API, you invoke a function, a new thread is created to run it, and the call returns immediately.
Problem with shared addresses-I
...
Notes:
- The use of threads has been hotly debated for decades in the computer science community
- One single-threaded asynchronous loop vs. multi-threading
- If you have multiple threads, they all share the same address space, so they can all be mutating (reading, writing) the same addresses
- The order in which they execute will determine the final computation
- Thread A and thread B may be interleaved in absolutely arbitrary ways.
- You must assume that any interleaving is possible, only then can you be sure that your program will work under any condition
- Particular case:
- A runs, sets x to 1, and reads y
- B runs, sets y to 12, and reads x
- Depending on where the context switch lands, A may read y before or after B writes it, and likewise for B's read of x
- As a programmer you are really jinxed: the result is nondeterministic
The race condition problem
...
Notes:
- If you have multiple threads touching the same variable, and at least one of them is writing to this variable, then you have a disaster.
- You can do the equivalent of wait() for processes with join() or similar, but at that point you are negating the benefits of multi-threading: you are blocking yourself until the other thread finishes.
Race condition in Assembly
...
Notes:
- When you look at the actual execution in assembly, you could context switch anywhere
- Each one of these statements is compiled to multiple machine instructions, so you have even less control over what is happening.
- Note that when a signal arrives, any single instruction has either not started or has completed; that is the definition of atomicity at the instruction level. There is a unit of consistency.
- Any larger notion of consistency has to come from you, the programmer; you implement it with techniques like locking.
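For example, a statement like `count = count + 1` compiles to roughly three machine instructions (x86-style pseudo-assembly; the exact instructions vary by compiler and architecture), and a context switch can land between any two of them:

```
mov  eax, [count]   ; load count from memory into a register
add  eax, 1         ; increment the register
mov  [count], eax   ; store the register back to memory
```

Another thread can read or write `count` while this sequence is in flight, which is exactly how increments get lost.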
Race condition may happen because of instruction reordering
...
Notes:
- When you compile the first program, data is assigned the result of something that takes a long time to finish, and then done is set
- What the compiler does is set done to true before it assigns data
- The programmer's assumption is violated; you cannot guarantee anything.
- But this is a legal compiler optimization: if you do not allow reordering, you cannot optimize much.
Out-of-Order Execution causing Race Condition
- The problem arises when there is an inter-thread dependency
- Because of thread 2's wait, the 2 lines in Thread1() are no longer independent i.e., ordering is important
- However, the compiler has no way to tell that.
Notes:
- We have a way to serialize execution
- First one red block runs, then the other
- Either this one comes before that one, or vice versa
- But it is never the case that both are inside the critical section at the same time
- Changes made inside the critical section cannot be observed by the other thread until the section is exited
- When one thread exits the critical section, its changes become visible to the other
- This is the bedrock of concurrency control, and the responsibility lies with the programmer: it is the programmer's task to protect pieces of code that may interfere with each other.
- Our strategy for introducing that protection is to use locks or mutexes
pthread (POSIX thread) creation
With pthreads, when a program runs, it also starts out as a single process with a single thread of control.
The pthread_create() function starts a new thread in the calling process.
#include <pthread.h>
int pthread_create(pthread_t *tid, const pthread_attr_t *attr,
void *(*func) (void *), void *arg);
- tid: uniquely identifies a thread within a process; returned by the function
- attr: sets attributes such as priority and initial stack size; can be NULL for defaults
- func: the function the new thread starts executing; accepts a (void *) argument and returns (void *)
- arg: the argument passed to func
pthread_create returns 0 if successful, a positive error code if not.
C++ Thread API
#include <iostream>
#include <thread>
#include <unistd.h>
using namespace std;
void foo() {
sleep(3);
cout << "foo done" << endl;
}
void bar(int x) {
sleep(1);
cout << "bar done" << endl;
}
int main() {
thread foothrd(foo); // calls foo() in a thread
thread barthrd(bar, 77); // calls bar(x) in a thread
cout << "main, foo and bar would now execute concurrently..." << endl;
foothrd.join(); // pauses until foo finishes
barthrd.join(); // pauses until bar finishes
cout << "foo and bar completed." << endl;
return 0;
}
- When the thread returns from bar, the thread finishes and waits to be joined
- The threads compete for the CPU
- One happens to run first
Output:
main, foo and bar would now execute concurrently...
bar done
foo done
foo and bar completed.
- None is guaranteed to run first
Race Condition Demonstration
- Start two (or more) threads that increment a shared variable.
- Use a pointer to the data so that the threads share it.
- Use a large number of iterations to ensure adequate overlap.
- Notice that the output is different every time.
#include <iostream>
#include <thread>
#include <unistd.h>
#include <cstdlib> // for atoi()
using namespace std;
void func(int *p, int x) {
// increment *p x times
for (int i = 0; i < x; i++)
*p = *p + 1;
}
int main(int ac, char **av) {
int data = 0;
int times = atoi(av[1]);
// start 2 threads to increment
thread t1(func, &data, times);
thread t2(func, &data, times);
t1.join();
t2.join();
cout << "data=" << data << endl;
return 0;
}
- We want to see two threads
- One incrementing something 100 times, and the other also incrementing it 100 times: will it end up incremented 200 times?
root@ubuntu-csce313:~\# g++ race-condition.cc
root@ubuntu-csce313:~\# ./a.out 100000
data=200000
root@ubuntu-csce313:~\# ./a.out 1000000
data=2000000
root@ubuntu-csce313:~\# ./a.out 10000000
data=10938339
root@ubuntu-csce313:~\# ./a.out 10000000
data=14035068
root@ubuntu-csce313:~\# ./a.out 10000000
data=17764839
root@ubuntu-csce313:~\# ./a.out 10000000
data=16256394
root@ubuntu-csce313:~\# ./a.out 10000000
data=14049964
- The reason is that writes by one thread are overwriting the writes of the other
- Imagine two threads executing: both read the same variable, both individually increment it, and both then write it back
- Both are reading, computing, and writing
- We want one of them to happen after the other
- By failing to serialize, as in this example, writes from one thread get overwritten by the other
- What does join() do? It blocks until the thread finishes.
- If we didn't have the joins, main would return and call exit() on the process while the threads are still running.