ex5: finish report

2025-10-20 21:14:36 +02:00
parent e89f42a08b
commit dd3db2bbe0
2 changed files with 75 additions and 0 deletions

Makefile

@@ -0,0 +1,5 @@
all: report.pdf show
report.pdf: report.md
	pandoc $< -o $@ --pdf-engine=typst
show: report.pdf
	zathura report.pdf

report.md

@@ -0,0 +1,70 @@
---
title: theory questions
author: fredrik robertsen
date: 2025-10-20
---
## 1. why no border exchange in pthreads?
threads operate on shared memory, as opposed to the isolated address spaces of
MPI processes and the like. there is thus no need to communicate border values:
a thread can simply read its neighbours' data directly. what we do need is
carefully placed barriers, so that threads access memory at the correct times,
i.e. only read border values once they have actually been computed and are
ready for further processing. at the same time we want to avoid serializing the
program too much, so excessive barriering is bad for performance.
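a minimal sketch of what this looks like in practice, assuming a jacobi-style
stencil purely for illustration (the sizes and names are made up; compile with
something like `cc -O2 -pthread`):

```c
#include <pthread.h>
#include <string.h>

#define N        1024
#define NTHREADS 4
#define ITERS    100

static double grid[N][N], next[N][N];
static pthread_barrier_t iter_barrier;

static void *worker(void *arg) {
    long id   = (long)arg;
    long rows = N / NTHREADS;
    long lo   = id == 0 ? 1 : id * rows;              /* skip global border */
    long hi   = id == NTHREADS - 1 ? N - 1 : (id + 1) * rows;

    for (int it = 0; it < ITERS; it++) {
        /* update own block of rows; border rows owned by neighbouring
           threads are read straight out of shared memory, no messages */
        for (long i = lo; i < hi; i++)
            for (long j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j]
                                   + grid[i][j-1] + grid[i][j+1]);
        /* nobody may copy back until everyone has finished writing next */
        pthread_barrier_wait(&iter_barrier);
        memcpy(&grid[lo][0], &next[lo][0], (hi - lo) * N * sizeof(double));
        /* nobody may start the next iteration until all copies are done */
        pthread_barrier_wait(&iter_barrier);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&iter_barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&iter_barrier);
    return 0;
}
```

the first barrier guarantees every `next` value is written before anyone copies
it back; the second guarantees the copy is complete before the next iteration
reads `grid`.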
## 2. OpenMP vs MPI
they may sound similar, but they are fundamentally different: openmp uses
threads, while mpi uses processes, as i mentioned in 1. they both have their
strengths and weaknesses. openmp is good for higher-level, shared-memory
parallelism, essentially acting as a "go faster" button for computationally
intensive loops while abstracting away the ceremony of pthreads. mpi runs
separate processes that compute in parallel and communicate explicitly via
message passing. if you want to scale beyond a single machine, for example onto
a cluster, mpi is the natural choice, since threads cannot span nodes, whereas
on a single shared-memory computer, threading is an easy way to gain some
speed-up. just be careful about your data locality in either case!
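to make the contrast concrete, a purely illustrative sketch of summing an array
both ways (the function names are made up, the mpi variant assumes the rank
count evenly divides the array, and it builds with something like
`mpicc -fopenmp`):

```c
#include <stdio.h>
#include <mpi.h>

#define N 1000000

/* openmp: threads share the array; one pragma splits the loop */
double omp_sum(const double *a) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* mpi: each process owns only a chunk; partial results travel as messages */
double mpi_sum(const double *local, int local_n) {
    double s = 0.0, total = 0.0;
    for (int i = 0; i < local_n; i++)
        s += local[i];
    MPI_Allreduce(&s, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;
}

int main(int argc, char **argv) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int chunk = N / size;                /* assumes size divides N evenly */
    double total = mpi_sum(a + rank * chunk, chunk);
    if (rank == 0)
        printf("omp: %.0f  mpi: %.0f\n", omp_sum(a), total);
    MPI_Finalize();
    return 0;
}
```

note how the openmp version never mentions which thread does what, while the
mpi version has to reason explicitly about rank, ownership, and communication.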
## 3. pthreads vs OMP barrier vs OMP workshare
the provided openmp barrier implementation is almost identical to my pthreads
solution. it manually tracks thread ids, divides the workload based on how many
threads are spawned, and uses carefully placed barriers for correctness, in a
near-identical way to my pthreads code. the difference is the division of work:
instead of a modulo/fixed-stride split, my pthreads version gives each thread a
contiguous block of rows, so the data a thread touches sits close together in
memory instead of being strided a whole row apart. in short, i win because of
row-major cache locality.
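roughly the pattern, as a sketch only (`work_on_row` is a made-up stand-in for
the actual update kernel):

```c
#include <omp.h>

void step(double *data, int nrows, int ncols,
          void (*work_on_row)(double *row, int ncols)) {
    #pragma omp parallel
    {
        int id  = omp_get_thread_num();
        int nth = omp_get_num_threads();
        /* contiguous block of rows per thread: good row-major locality */
        int rows = nrows / nth;
        int lo   = id * rows;
        int hi   = (id == nth - 1) ? nrows : lo + rows;
        for (int i = lo; i < hi; i++)
            work_on_row(&data[i * ncols], ncols);
        /* explicit barrier, the moral equivalent of pthread_barrier_wait */
        #pragma omp barrier
    }
}
```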
my openmp workshare implementation parallelizes the for loop using the magical
`#pragma omp parallel for`, known as the "free speed-up" button. it splits the
loop iterations between the threads, handing each thread a chunk of contiguous
rows. this is a higher-level implementation of the same idea as the other two,
but without any of the boilerplate. sweet!
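with workshare the same step collapses to (same made-up `work_on_row`):

```c
void step(double *data, int nrows, int ncols,
          void (*work_on_row)(double *row, int ncols)) {
    /* the default static schedule hands each thread a contiguous chunk of
       rows, and the implicit barrier at the end replaces the manual one */
    #pragma omp parallel for
    for (int i = 0; i < nrows; i++)
        work_on_row(&data[i * ncols], ncols);
}
```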
## 4. parallelizing recursion problems with OpenMP
recursion can be seen as a tree structure, where nested function calls create
nodes of stack contexts, each remembering the state of its parent. if the race
conditions are handled properly, using locks or atomic operations, you can
spawn a thread for each recursive call, a situation where subthreads create
more subthreads. much like infinite recursion and OS fork-bombs, this must be
handled carefully, or the number of threads explodes exponentially.
openmp can streamline this for us with its task semantics. you create a task,
which is queued and later executed by a thread from the thread pool; the pool
typically contains as many threads as there are cores available, though this is
implementation-specific and user-configurable. the relevant directives are
```c
// the directives spelled out concretely
#pragma omp task firstprivate(x)   // spawn a task, copying x into it
#pragma omp task depend(in: var)   // order tasks via data dependences on var
#pragma omp taskwait               // wait for this task's direct children
#pragma omp taskgroup              // structured block: waits for whole subtree
```
by running the tasks on a fixed-size thread pool, we avoid the danger of
spawning too many threads, i.e. the fork-bomb scenario above.
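a minimal sketch of the pattern on the classic fibonacci example (the serial
cutoff of 20 is an arbitrary tuning choice, not something openmp prescribes):

```c
#include <stdio.h>

long fib(int n) {
    if (n < 2) return n;
    if (n < 20) return fib(n - 1) + fib(n - 2);  /* cutoff: stay serial for
                                                    small subtrees so we do
                                                    not drown in tiny tasks */
    long a, b;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait    /* wait for both child tasks before summing */
    return a + b;
}

int main(void) {
    long result;
    #pragma omp parallel    /* create the thread pool */
    #pragma omp single      /* but only one thread seeds the task tree */
    result = fib(35);
    printf("%ld\n", result);
    return 0;
}
```

each recursive call becomes a task in the pool's queue rather than a fresh
thread, which is exactly what keeps the thread count bounded.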