diff --git a/exercise5/report/Makefile b/exercise5/report/Makefile
new file mode 100644
index 0000000..485e6da
--- /dev/null
+++ b/exercise5/report/Makefile
@@ -0,0 +1,5 @@
+all: report.pdf show
+report.pdf: report.md
+	pandoc $< -o $@ --pdf-engine=typst
+show: report.pdf
+	zathura report.pdf
diff --git a/exercise5/report/report.md b/exercise5/report/report.md
new file mode 100644
index 0000000..a6fbd0f
--- /dev/null
+++ b/exercise5/report/report.md
@@ -0,0 +1,70 @@
+---
+title: theory questions
+author: fredrik robertsen
+date: 2025-10-20
+---
+
+## 1. why no border exchange in pthreads?
+
+threads operate on shared memory, as opposed to the isolated address spaces of
+MPI processes and the like. there is thus no need to communicate border values,
+since a thread can simply read its neighbours' rows directly. what we do need is
+carefully placed barriers, so that threads access the shared memory at the
+correct times, i.e. only once the data has been computed and is ready to be read
+for further processing.
+
+at the same time, we want to avoid serializing the program too much, so
+excessive barriering is bad for performance.
+
+## 2. OpenMP vs MPI
+
+they may sound similar, but they are fundamentally different: openmp uses
+threads, while mpi uses processes, as i mentioned in question 1. both have their
+strengths and weaknesses. openmp is good for higher-level, shared-memory
+parallelism, essentially acting as a "go faster" button for computationally
+intensive code while abstracting away the ceremony of pthreads; mpi uses
+processes to compute in parallel, relying on message passing between them.
+
+if you wish to compute at a large scale, for example on a cluster with many
+nodes, mpi is the more logical (and usually more performant) choice, since
+threads cannot share memory across machines. for a single computer, threading is
+a good way to gain some speed-up. just be careful about your data locality in
+either case!
+
+## 3. pthreads vs OMP barrier vs OMP workshare
+
+the provided openmp barrier implementation is almost identical to my pthreads
+solution. it manually tracks thread ids and divides the workload at runtime
+based on how many threads you spawn, using careful barrier placement for
+correctness in a near-identical way to my pthreads code. the difference is in
+the work division: instead of a modulo/fixed-stride split, i give each pthread a
+contiguous block of rows, so the data a thread touches lies close together in
+memory rather than being strided one row apart per thread. in short, i win on
+row-major cache locality.
+
+my openmp workshare implementation parallelizes the loop using the magical
+`#pragma omp parallel for`, known as the "free speed-up" button. it splits the
+iterations between the threads, typically handing each thread a contiguous chunk
+of rows. this is a higher-level implementation doing much the same as the other
+two, but without any of the boilerplate. sweet!
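+
+as a rough sketch of what i mean (not the actual exercise code: the array sizes,
+names and the stencil here are made up for illustration), the workshare version
+boils down to something like this:
+
+```c
+// hypothetical row-wise stencil update, just to show how the loop is split
+#include <stdio.h>
+#include <omp.h>
+
+#define N 1024
+
+static double in[N][N], out[N][N];
+
+int main(void)
+{
+    // the (typically static) default schedule hands each thread a contiguous
+    // chunk of rows, which matches the row-major layout of the arrays
+    #pragma omp parallel for
+    for (int i = 1; i < N - 1; i++)
+        for (int j = 1; j < N - 1; j++)
+            out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j]
+                              + in[i][j - 1] + in[i][j + 1]);
+
+    printf("max threads: %d\n", omp_get_max_threads());
+    return 0;
+}
+```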
+
+## 4. parallelizing recursion problems with OpenMP
+
+recursion can be seen as a tree structure, where nested function calls create
+nodes of stack contexts that remember the state of their parent nodes. if the
+race conditions are handled properly, using locks or atomic operations, you
+could spawn a thread for each recursive call, creating a situation where
+subthreads create more threads. much like infinite recursion and OS fork-bombs,
+this must be handled carefully, or the number of threads explodes.
+
+openmp can streamline this for us with task-oriented semantics. you create a
+task, which is queued and later executed by a thread from the runtime's thread
+pool; the pool typically contains as many threads as you have cores available,
+though this is implementation-specific and can be overridden by the user. the
+relevant directives are:
+
+```c
+// task-related directives (each `omp task` applies to the statement that follows)
+#pragma omp task                   // run the next statement/block as a task
+#pragma omp task depend(in: x)     // or: order tasks via data dependencies
+#pragma omp task firstprivate(x)   // or: give the task its own copy of x
+#pragma omp taskwait               // wait for the child tasks created so far
+#pragma omp taskgroup              // wait for all descendant tasks of a block
+```
+
+by reusing a fixed pool of threads instead of spawning a new thread per
+recursive call, we avoid the dangers of a fork-bomb. a sketch of the pattern is
+shown below.
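+
+as a minimal, hypothetical sketch of that pattern (fibonacci standing in for the
+actual recursive problem, with an arbitrarily chosen serial cutoff):
+
+```c
+// task-based recursion: each recursive call becomes a task run by the pool
+#include <stdio.h>
+
+static long fib(int n)
+{
+    if (n < 2)
+        return n;
+
+    // below the cutoff, recurse serially: tiny tasks cost more in overhead
+    // than they gain in parallelism
+    if (n < 20)
+        return fib(n - 1) + fib(n - 2);
+
+    long a, b;
+
+    // the child calls become tasks picked up by the existing thread pool,
+    // so no new OS threads are created per recursive call
+    #pragma omp task shared(a)
+    a = fib(n - 1);
+    #pragma omp task shared(b)
+    b = fib(n - 2);
+
+    // wait for both child tasks before combining their results
+    #pragma omp taskwait
+    return a + b;
+}
+
+int main(void)
+{
+    long result;
+
+    #pragma omp parallel
+    {
+        // one thread builds the task tree; the whole pool executes the tasks
+        #pragma omp single
+        result = fib(35);
+    }
+
+    printf("fib(35) = %ld\n", result);
+    return 0;
+}
+```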