ex5: finish report

2025-10-20 21:14:36 +02:00
parent e89f42a08b
commit dd3db2bbe0
2 changed files with 75 additions and 0 deletions

Makefile

@@ -0,0 +1,5 @@
all: report.pdf show
report.pdf: report.md
	pandoc $< -o $@ --pdf-engine=typst
show: report.pdf
	zathura report.pdf

report.md

@@ -0,0 +1,70 @@
---
title: theory questions
author: fredrik robertsen
date: 2025-10-20
---
## 1. why no border exchange in pthreads?
threads operate on shared memory, as opposed to the isolated address spaces of
MPI processes and the like. there is thus no need to communicate border values:
a thread can simply read its neighbours' data directly. what we do need is
carefully placed barriers, so that threads access memory at the correct times,
i.e. only read border values once they have actually been computed and are
ready for further processing. at the same time we want to avoid serializing the
program too much, so excessive barriering is bad for performance.
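a minimal sketch of what this looks like in practice, assuming a jacobi-style
stencil purely for illustration (the sizes and names are made up; compile with
something like `cc -O2 -pthread`):

```c
#include <pthread.h>
#include <string.h>

#define N        1024
#define NTHREADS 4
#define ITERS    100

static double grid[N][N], next[N][N];
static pthread_barrier_t iter_barrier;

static void *worker(void *arg) {
    long id   = (long)arg;
    long rows = N / NTHREADS;
    long lo   = id == 0 ? 1 : id * rows;              /* skip global border */
    long hi   = id == NTHREADS - 1 ? N - 1 : (id + 1) * rows;

    for (int it = 0; it < ITERS; it++) {
        /* update own block of rows; border rows owned by neighbouring
           threads are read straight out of shared memory, no messages */
        for (long i = lo; i < hi; i++)
            for (long j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j]
                                   + grid[i][j-1] + grid[i][j+1]);
        /* nobody may copy back until everyone has finished writing next */
        pthread_barrier_wait(&iter_barrier);
        memcpy(&grid[lo][0], &next[lo][0], (hi - lo) * N * sizeof(double));
        /* nobody may start the next iteration until all copies are done */
        pthread_barrier_wait(&iter_barrier);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&iter_barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&iter_barrier);
    return 0;
}
```

the first barrier guarantees every `next` value is written before anyone copies
it back; the second guarantees the copy is complete before the next iteration
reads `grid`.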
## 2. OpenMP vs MPI
they may sound similar, but they are fundamentally different: openmp uses
threads, while mpi uses processes, as i mentioned in 1. they both have their
strengths and weaknesses. openmp is good for higher-level, shared-memory
parallelism, essentially acting as a "go faster" button for computationally
intensive loops while abstracting away the ceremony of pthreads. mpi runs
separate processes that compute in parallel and communicate explicitly via
message passing. if you want to scale beyond a single machine, for example onto
a cluster, mpi is the natural choice, since threads cannot span nodes, whereas
on a single shared-memory computer, threading is an easy way to gain some
speed-up. just be careful about your data locality in either case!
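to make the contrast concrete, a purely illustrative sketch of summing an array
both ways (the function names are made up, the mpi variant assumes the rank
count evenly divides the array, and it builds with something like
`mpicc -fopenmp`):

```c
#include <stdio.h>
#include <mpi.h>

#define N 1000000

/* openmp: threads share the array; one pragma splits the loop */
double omp_sum(const double *a) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* mpi: each process owns only a chunk; partial results travel as messages */
double mpi_sum(const double *local, int local_n) {
    double s = 0.0, total = 0.0;
    for (int i = 0; i < local_n; i++)
        s += local[i];
    MPI_Allreduce(&s, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return total;
}

int main(int argc, char **argv) {
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0;

    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int chunk = N / size;                /* assumes size divides N evenly */
    double total = mpi_sum(a + rank * chunk, chunk);
    if (rank == 0)
        printf("omp: %.0f  mpi: %.0f\n", omp_sum(a), total);
    MPI_Finalize();
    return 0;
}
```

note how the openmp version never mentions which thread does what, while the
mpi version has to reason explicitly about rank, ownership, and communication.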
## 3. pthreads vs OMP barrier vs OMP workshare
the provided openmp barrier implementation is almost identical to my pthreads
solution. it manually tracks thread ids, divides the workload based on how many
threads are spawned, and uses carefully placed barriers for correctness, in a
near-identical way to my pthreads code. the difference is the division of work:
instead of a modulo/fixed-stride split, my pthreads version gives each thread a
contiguous block of rows, so the data a thread touches sits close together in
memory instead of being strided a whole row apart. in short, i win because of
row-major cache locality.
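roughly the pattern, as a sketch only (`work_on_row` is a made-up stand-in for
the actual update kernel):

```c
#include <omp.h>

void step(double *data, int nrows, int ncols,
          void (*work_on_row)(double *row, int ncols)) {
    #pragma omp parallel
    {
        int id  = omp_get_thread_num();
        int nth = omp_get_num_threads();
        /* contiguous block of rows per thread: good row-major locality */
        int rows = nrows / nth;
        int lo   = id * rows;
        int hi   = (id == nth - 1) ? nrows : lo + rows;
        for (int i = lo; i < hi; i++)
            work_on_row(&data[i * ncols], ncols);
        /* explicit barrier, the moral equivalent of pthread_barrier_wait */
        #pragma omp barrier
    }
}
```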
my openmp workshare implementation parallelizes the for loop using the magical
`#pragma omp parallel for`, known as the "free speed-up" button. it splits the
loop iterations between the threads, handing each thread a chunk of contiguous
rows. this is a higher-level implementation of the same idea as the other two,
but without any of the boilerplate. sweet!
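with workshare the same step collapses to (same made-up `work_on_row`):

```c
void step(double *data, int nrows, int ncols,
          void (*work_on_row)(double *row, int ncols)) {
    /* the default static schedule hands each thread a contiguous chunk of
       rows, and the implicit barrier at the end replaces the manual one */
    #pragma omp parallel for
    for (int i = 0; i < nrows; i++)
        work_on_row(&data[i * ncols], ncols);
}
```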
## 4. parallelizing recursion problems with OpenMP
recursion can be seen as a tree structure, where nested function calls create
nodes of stack contexts, each remembering the state of its parent. if the race
conditions are handled properly, using locks or atomic operations, you can
spawn a thread for each recursive call, a situation where subthreads create
more subthreads. much like infinite recursion and OS fork-bombs, this must be
handled carefully, or the number of threads explodes exponentially.
openmp can streamline this for us with its task semantics. you create a task,
which is queued and later executed by a thread from the thread pool; the pool
typically contains as many threads as there are cores available, though this is
implementation-specific and user-configurable. the relevant directives are
```c
// the directives spelled out concretely
#pragma omp task firstprivate(x)   // spawn a task, copying x into it
#pragma omp task depend(in: var)   // order tasks via data dependences on var
#pragma omp taskwait               // wait for this task's direct children
#pragma omp taskgroup              // structured block: waits for whole subtree
```
by running the tasks on a fixed-size thread pool, we avoid the danger of
spawning too many threads, i.e. the fork-bomb scenario above.
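a minimal sketch of the pattern on the classic fibonacci example (the serial
cutoff of 20 is an arbitrary tuning choice, not something openmp prescribes):

```c
#include <stdio.h>

long fib(int n) {
    if (n < 2) return n;
    if (n < 20) return fib(n - 1) + fib(n - 2);  /* cutoff: stay serial for
                                                    small subtrees so we do
                                                    not drown in tiny tasks */
    long a, b;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    #pragma omp taskwait    /* wait for both child tasks before summing */
    return a + b;
}

int main(void) {
    long result;
    #pragma omp parallel    /* create the thread pool */
    #pragma omp single      /* but only one thread seeds the task tree */
    result = fib(35);
    printf("%ld\n", result);
    return 0;
}
```

each recursive call becomes a task in the pool's queue rather than a fresh
thread, which is exactly what keeps the thread count bounded.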