| title | author | date |
|---|---|---|
| theory questions | fredrik robertsen | 2025-10-20 |
1. why no border exchange in pthreads?
threads operate on shared memory, as opposed to the isolated address spaces of MPI processes and the like. there is thus no need to communicate border values: a thread can simply read its neighbours' rows directly. what we do need is careful barrier placement, so that threads access the memory at the correct times, i.e. only read data once it has been computed and is ready for further processing.
at the same time we want to avoid serializing the program too much, so excessive barriering is bad for performance.
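a minimal sketch of the pattern, using a generic 5-point stencil as a stand-in for the actual computation (grid size, iteration count, and thread count are made-up values; compile with `cc -pthread`): each thread writes its own block of rows, and two barriers per iteration keep the write phase, the grid swap, and the next read phase in the right order.

```c
#include <pthread.h>
#include <stdio.h>

#define N        64
#define ITERS    100
#define NTHREADS 4

static double a[N][N], b[N][N];
static double (*curr)[N] = a, (*next)[N] = b;
static pthread_barrier_t barrier;

// each thread owns a contiguous block of rows; the border rows of the
// neighbouring blocks are read directly from shared memory
static void *worker(void *arg) {
    long id = (long)arg;
    int lo = 1 + id * (N - 2) / NTHREADS;
    int hi = 1 + (id + 1) * (N - 2) / NTHREADS;
    for (int it = 0; it < ITERS; it++) {
        for (int i = lo; i < hi; i++)
            for (int j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (curr[i-1][j] + curr[i+1][j] +
                                     curr[i][j-1] + curr[i][j+1]);
        pthread_barrier_wait(&barrier);   // all writes to `next` are done
        if (id == 0) { double (*t)[N] = curr; curr = next; next = t; }
        pthread_barrier_wait(&barrier);   // all threads see the swapped grids
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    a[N / 2][N / 2] = 1.0;  // dummy initial condition
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("centre value: %f\n", curr[N / 2][N / 2]);
    return 0;
}
```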
2. OpenMP vs MPI
they may sound similar, but they are fundamentally different: openmp uses threads sharing one address space, while mpi uses separate processes, as mentioned in 1. they both have their strengths and weaknesses. openmp is good for abstract, higher-level parallelism, essentially acting as a "go faster" button for computationally intensive code while abstracting away the ceremony of pthreads; mpi computes in parallel with processes that rely on explicit message passing between them.
if you wish to compute at large scale, for example on a cluster with many nodes, mpi is the more logical (and often the only) choice, since threads cannot span machine boundaries; on a single shared-memory computer, threading is a good way to gain some speed-up with far less ceremony. just be careful about your data locality in either case!
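for contrast with question 1, here is roughly what an explicit border exchange looks like when you *do* need it: each mpi rank owns a private block of rows plus ghost rows, and must swap boundary rows with its neighbours every iteration. this is a hedged sketch, not the assignment's code, and it assumes the grid divides evenly across ranks.

```c
#include <mpi.h>
#include <stdio.h>

#define N    64   // global grid is N x N, split by rows across ranks
#define ROWS 16   // rows owned per rank (assumes N / size == ROWS)

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // local block plus one ghost row above and one below
    double local[ROWS + 2][N];
    for (int i = 0; i < ROWS + 2; i++)
        for (int j = 0; j < N; j++)
            local[i][j] = rank;  // dummy data

    // the explicit exchange that pthreads never needs:
    if (rank > 0)  // up: send my first real row, receive my upper ghost row
        MPI_Sendrecv(local[1], N, MPI_DOUBLE, rank - 1, 0,
                     local[0], N, MPI_DOUBLE, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < size - 1)  // down: send my last real row, receive my lower ghost row
        MPI_Sendrecv(local[ROWS],     N, MPI_DOUBLE, rank + 1, 0,
                     local[ROWS + 1], N, MPI_DOUBLE, rank + 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d upper ghost row value: %f\n", rank, local[0][0]);
    MPI_Finalize();
    return 0;
}
```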
3. pthreads vs OMP barrier vs OMP workshare
the provided openmp barrier implementation is almost identical to my pthreads solution: it manually tracks thread ids, divides the workload based on how many threads are spawned, and uses meticulous barriers for correctness in a near-identical way. the difference is how the work is divided. instead of a modulo/fixed-step split, my pthreads version gives each thread a contiguous block of rows, which has better cache locality: each thread's data sits close together in memory, rather than the thread jumping a whole row's worth of elements between the rows it owns. in short, i win because of row-major cache locality.
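to make the locality argument concrete, here is a toy program (names and sizes made up) that prints which rows of a row-major grid each thread owns under the two schemes:

```c
#include <stdio.h>

#define N        8
#define NTHREADS 4

int main(void) {
    for (int id = 0; id < NTHREADS; id++) {
        // contiguous block split (my pthreads version): thread `id`
        // streams through adjacent rows, so its data is close in memory
        int lo = id * N / NTHREADS, hi = (id + 1) * N / NTHREADS;
        printf("thread %d, block split:   rows %d..%d\n", id, lo, hi - 1);

        // modulo/strided split (the provided version): thread `id` jumps
        // N * NTHREADS elements between consecutive rows it owns
        printf("thread %d, strided split: rows", id);
        for (int row = id; row < N; row += NTHREADS)
            printf(" %d", row);
        printf("\n");
    }
    return 0;
}
```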
my openmp workshare implementation parallelizes the for loop using the magical `#pragma omp parallel for`, known as the "free speed-up" button. it splits the loop iterations between the threads, by default handing each thread a contiguous chunk of rows. this is a higher-level implementation of the same scheme as the other two, but without any of the boilerplate. sweet!
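a sketch of what that looks like, again with a generic 5-point stencil standing in for the real kernel (compile with `cc -fopenmp`). note that the implicit barrier at the end of the construct replaces all the manual barriers of the other two versions:

```c
#include <stdio.h>

#define N     64
#define ITERS 100

static double a[N][N], b[N][N];

int main(void) {
    double (*curr)[N] = a, (*next)[N] = b;
    a[N / 2][N / 2] = 1.0;  // dummy initial condition
    for (int it = 0; it < ITERS; it++) {
        // the "free speed-up" button: the runtime splits the row loop
        // across the thread team, no manual ids or barriers needed
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (curr[i-1][j] + curr[i+1][j] +
                                     curr[i][j-1] + curr[i][j+1]);
        double (*t)[N] = curr; curr = next; next = t;  // serial pointer swap
    }
    printf("centre value: %f\n", curr[N / 2][N / 2]);
    return 0;
}
```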
4. parallelizing recursion problems with OpenMP
recursion can be seen as a tree structure, where nested function calls create nodes of stack contexts that remember the state of their parent calls. if you handle the race conditions properly using locks or atomic semantics, you can in principle spawn a thread for each recursive call, so that subthreads create more subthreads. much like infinite recursion and OS fork bombs, this must be handled carefully, or the thread count explodes.
openmp can streamline this for us using task-oriented semantics. you create a task, which is queued and eventually executed by a thread from the thread pool; the pool typically contains as many threads as you have cores available, though this is implementation-specific and can be user-overridden. the relevant directives look like this:
// illustrative directives; `var` and `x` are placeholders
#pragma omp task depend(in: var) firstprivate(x)  // queue work, with optional dependency/data clauses
#pragma omp taskwait   // wait for this task's direct children
#pragma omp taskgroup  // or: a construct that waits on a whole subtree of tasks
by using a thread pool, we avoid the dangers of spawning too many threads, and with them the fork-bomb scenario.
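putting the pieces together, here is a minimal, self-contained sketch of tasked recursion: the classic fibonacci example, which is not the assignment's problem but shows the task/taskwait pattern, plus a serial cutoff (an assumption, tune to taste) to keep task-creation overhead in check. compile with `cc -fopenmp`.

```c
#include <stdio.h>

// naive fibonacci, parallelized with openmp tasks: each recursive call
// becomes a task handed to the thread pool instead of a raw thread
static long fib(int n) {
    if (n < 2) return n;
    if (n < 20) return fib(n - 1) + fib(n - 2);  // serial cutoff (assumed value)
    long x, y;
    #pragma omp task shared(x)   // child task computes the left subtree
    x = fib(n - 1);
    #pragma omp task shared(y)   // child task computes the right subtree
    y = fib(n - 2);
    #pragma omp taskwait         // wait for both children before combining
    return x + y;
}

int main(void) {
    long result;
    #pragma omp parallel
    #pragma omp single           // one thread seeds the recursion; the pool runs the tasks
    result = fib(30);
    printf("fib(30) = %ld\n", result);
    return 0;
}
```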