11 - Concurrency and Threads
Class: CSCE-313
Notes:
Outline:
- What is concurrency?
- Macro illusion that multiple things are happening at the same time
- What is a thread?
- A useful abstraction that lets a program be written for a concurrent world
- Thread context switch
- Thread API
What is concurrency?
...
Notes:
- Here we have two processes
- Concurrency is the macro view: P1 and P2 appear to execute at the same time
- On a multi-core processor you get parallelism (real simultaneity); on a single core the OS interleaves the processes, so it merely looks like everything runs concurrently
What is a thread?
...
Notes:
- Threads give a programmatic framework for a program to exploit hardware parallelism, or the fact that the OS is doing context switching on its behalf
- You can hide the latency of execution with useful work
- Without this abstraction, your latency could increase even though you are not actually doing more work
Threads as a programmatic framework
...
- In a single threaded world, Mario is blocked when reaching the river
- While he is blocked, he can't do any useful work, he has to wait for Toad to get him on the boat and bring him to the other side of the river.
- In a multi-threaded world, where you can express that multiple things happen at the same time, Mario has a virtual Mario: when one Mario is blocked, the other is activated and can proceed to the castle to get the magic boat
- The same happens when coming back
- This Mario is essentially hiding its latency by doing useful work!
- This is how you organize your computation around threads.
Multi-threaded process
...
Notes:
- Note your user space thread is invisible to the kernel.
- They share everything else and are, in some sense, logically contained in the process P.
- If the kernel knows a process has multiple threads, it can schedule another thread from that same process when one blocks.
- The amount of work needed may be the same; it is latency that really matters here.
Process vs. Thread
Notes:
- Each thread in a process has its own stack
- Each thread invokes functions on its own data (data dependent)
- Each thread also keeps its own CPU context (registers)
- The signal mask (sigmask) is per-thread as well
- But threads share everything else (heap, data, text, etc.)
- Because the process is the container, all threads share the same signal disposition (the same signal handlers and so on)
Motivation for Threads
Basically two reasons:
...
Notes:
- A programmer can use threads to make better use of the machine
- Utilization may not be a problem, but by exploiting parallelism you reduce a job's latency: you do more things at the same time, so you finish quicker
- If you have 10 threads that each block for 100 ms, the blocking is overlapped, so some thread can be doing useful work at all times.
- If you compare processes to threads:
- Before threads came around, people used processes to create concurrency, which is very expensive (creating a process is costly)
- You must copy the entire page table and so on.
- A running process fills the cache with its data; every process context switch may effectively flush the cache back to memory, adding latency
- Communication between processes is not terribly convenient
Process Context Switch
Notes:
- Each process is an independent container of a virtual address space
- Creating new processes and context switching is expensive.
Thread Context Switch
Notes:
- Threads share the same address space, you do not have to switch the address space on context switching
- Everything that one thread touches and caches is okay for the other thread
- When you wait for I/O, you had better context switch; otherwise you would be wasting time you could spend working on something else
Thread-based applications
...
Notes:
- One manager and workers receiving work
- Pipeline:
- Each thread does some part of the pipeline
- Somebody fixing engine, somebody doing something else, etc.
Threaded web server
...
Notes:
- All the manager does is frame the request and assign a worker to work on it
Thread Execution
...
Notes:
- When you create a thread, you basically give it a function to execute
- Compare that to a regular function call (a synchronous call):
- The caller waits for the function to finish and return a result
- With the threading API, you invoke a function, a new thread is created to run it, and the call returns immediately.
Problem with shared addresses-I
...
Notes:
- The use of threads has been hotly debated for decades in the computer science community
- One single-threaded asynchronous loop vs. multi-threading
- If you have multiple threads, they all share the same address space, so they can all be mutating (reading, writing) the same addresses
- The order in which they execute will determine the final computation
- Thread A and thread B may be interleaved in absolutely arbitrary ways.
- You must assume that any interleaving is possible, only then can you be sure that your program will work under any condition
- Particular case:
- A runs, sets x to 1, and reads y
- B runs, sets y to 12, and reads x
- Depending on where the context switch lands, A may read y before or after B writes it, and likewise for B's read of x
- As a programmer you are really jinxed: the result is nondeterministic
The race condition problem
...
Notes:
- If you have multiple threads touching the same variable, and at least one of them is writing to this variable, then you have a disaster.
- You can do the equivalent of wait() for processes with join() or similar, but at that point you are negating the benefits of multi-threading: you are blocking yourself until the other thread finishes.
Race condition in Assembly
...
Notes:
- When you look at the actual execution in assembly, you could context switch anywhere
- Each one of these statements is compiled to multiple machine instructions, so you have even less control over what is happening.
- Note that when a signal arrives, any single instruction has either not started or has completed; that is the definition of atomicity at the instruction level. There is a unit of consistency.
- Any larger notion of consistency has to come from you, the programmer; you implement it with techniques like locking.
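For example, a statement like `count = count + 1` compiles to roughly three machine instructions (x86-style pseudo-assembly; the exact instructions vary by compiler and architecture), and a context switch can land between any two of them:

```
mov  eax, [count]   ; load count from memory into a register
add  eax, 1         ; increment the register
mov  [count], eax   ; store the register back to memory
```

Another thread can read or write `count` while this sequence is in flight, which is exactly how increments get lost.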
Race condition may happen because of instruction reordering
...
Notes:
- When you compile the first program, data is assigned the result of something that takes a long time to finish, and then done is set
- What the compiler does is set done to true before it assigns data
- The programmer's assumption is violated; you cannot guarantee anything.
- But this is a legal compiler optimization: if you do not allow reordering, you cannot optimize much.
Out-of-Order Execution causing Race Condition
- The problem arises when there is an inter-thread dependency
- Because of thread 2's wait, the 2 lines in Thread1() are no longer independent i.e., ordering is important
- However, the compiler has no way to tell that.
Notes:
- We have a way to serialize execution
- First one red block runs, then the other
- Either this one comes before that one, or vice versa
- But it is never the case that both are inside the critical section at the same time
- Changes made inside the critical section cannot be observed by the other thread until the section is exited
- When one thread exits the critical section, its changes become visible to the other
- This is the bedrock of concurrency control, and the responsibility lies with the programmer: it is the programmer's task to protect pieces of code that may interfere with each other.
- Our strategy for introducing that protection is to use locks or mutexes
pthread (POSIX thread) creation
With pthreads, when a program runs, it also starts out as a single process with a single thread of control.
The pthread_create() function starts a new thread in the calling process.
#include <pthread.h>
int pthread_create(pthread_t *tid, const pthread_attr_t *attr,
void *(*func) (void *), void *arg);
- tid: uniquely identifies a thread within a process; returned by the function
- attr: sets attributes such as priority and initial stack size; can be NULL for defaults
- func: the function the new thread starts executing; accepts a (void *) argument and returns (void *)
- arg: the argument passed to func
pthread_create returns 0 if successful, a positive error code if not.
C++ Thread API
#include <iostream>
#include <thread>
#include <unistd.h>
using namespace std;
void foo() {
sleep(3);
cout << "foo done" << endl;
}
void bar(int x) {
sleep(1);
cout << "bar done" << endl;
}
int main() {
thread foothrd(foo); // calls foo() in a thread
thread barthrd(bar, 77); // calls bar(x) in a thread
cout << "main, foo and bar would now execute concurrently..." << endl;
foothrd.join(); // pauses until foo finishes
barthrd.join(); // pauses until bar finishes
cout << "foo and bar completed." << endl;
return 0;
}
- When the thread returns from bar, the thread finishes and waits to be joined
- The threads compete for the CPU
- One happens to run first
Output:
main, foo and bar would now execute concurrently...
bar done
foo done
foo and bar completed.
- None is guaranteed to run first
Race Condition Demonstration
- Start two (or more) threads that increment a shared variable.
- Use a pointer to the data so that the threads share it.
- Use a large number of iterations to ensure adequate overlap.
- Notice that the output is different every time.
#include <iostream>
#include <thread>
#include <unistd.h>
#include <cstdlib> // for atoi()
using namespace std;
void func(int *p, int x) {
// increment *p x times
for (int i = 0; i < x; i++)
*p = *p + 1;
}
int main(int ac, char **av) {
int data = 0;
int times = atoi(av[1]);
// start 2 threads to increment
thread t1(func, &data, times);
thread t2(func, &data, times);
t1.join();
t2.join();
cout << "data=" << data << endl;
return 0;
}
- We want to see two threads
- One incrementing something 100 times, and the other also incrementing it 100 times: will it end up incremented 200 times?
root@ubuntu-csce313:~\# g++ race-condition.cc
root@ubuntu-csce313:~\# ./a.out 100000
data=200000
root@ubuntu-csce313:~\# ./a.out 1000000
data=2000000
root@ubuntu-csce313:~\# ./a.out 10000000
data=10938339
root@ubuntu-csce313:~\# ./a.out 10000000
data=14035068
root@ubuntu-csce313:~\# ./a.out 10000000
data=17764839
root@ubuntu-csce313:~\# ./a.out 10000000
data=16256394
root@ubuntu-csce313:~\# ./a.out 10000000
data=14049964
- The reason is that writes by one thread are overwriting the writes of the other
- Imagine two threads executing: both read the same variable, both individually increment it, and both then write it back
- Both are reading, computing, and writing
- We want one of them to happen after the other
- By failing to serialize, as in this example, writes from one thread get overwritten by the other
- What does join() do? It blocks until the thread finishes.
- If we didn't have the joins, main would return and call exit() on the process while the threads are still running.