TDT4125/assignment-3/task3.typ
2026-03-26 18:15:39 +01:00


#import "@preview/catppuccin:1.1.0": catppuccin, flavors
#show: catppuccin.with(flavors.mocha)
#set text(size: 16pt)
== task 3
the following article explains an algorithm that solves our problem:
#link("https://en.wikipedia.org/wiki/Reservoir_sampling#Simple:_Algorithm_R")
the main idea is to keep a reservoir of $k$ items. the first $k$ items fill the reservoir directly; for every later item $i$, we decide randomly whether it enters the reservoir (replacing the item already in one of the slots) or is discarded.
the decision is made by drawing a uniformly random integer $j$ between 1 and $i$ (inclusive): if $j <= k$, item $i$ replaces whatever sits in slot $j$ of the reservoir; if $j > k$, the item is discarded.
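
the procedure described above can be sketched in python as follows. this is a minimal illustration, not a reference implementation: the function name `reservoir_sample` and the optional `rng` parameter are assumptions made for this example.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return up to k items sampled uniformly from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # the first k items fill the reservoir directly
            reservoir.append(item)
        else:
            # draw a slot number in 1..i (both ends inclusive)
            j = rng.randint(1, i)
            if j <= k:
                # the slot exists: overwrite it with the new item
                reservoir[j - 1] = item
            # otherwise the item is discarded
    return reservoir
```

for example, `reservoir_sample(range(1000), 5)` returns five distinct numbers below 1000, and passing `random.Random(0)` as `rng` makes the run reproducible. if the stream holds fewer than $k$ items, the sketch simply returns all of them.
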
this ensures that each of the $n$ items ends up in the final reservoir with the same probability $k/n$.
proof: item $i$ (for $i > k$) enters the reservoir with probability $k/i$, and the next item $i + 1$ enters with probability $k/(i + 1)$. processing item $i + 1$ also affects the earlier items: it overwrites any particular slot with probability $k/(i + 1) times 1/k = 1/(i + 1)$, so item $i$ is still in the reservoir afterwards with probability $k/i times (1 - 1/(i + 1)) = k/(i + 1)$. hence after every step all items seen so far are equally likely to be in the reservoir, and by induction each item has probability $k/n$ once all $n$ items have been processed, so the sampling is uniform. (the inclusions of different items are not independent, since only $k$ items fit in the reservoir.)
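
the induction can be checked with exact arithmetic. the sketch below multiplies the probability of entering the reservoir by the survival probability $1 - 1/j$ of each later step $j$; the helper name `inclusion_probability` is made up for this illustration.

```python
from fractions import Fraction

def inclusion_probability(i, k, n):
    """Exact probability that item i (1-based) is in the reservoir after n items."""
    # items 1..k enter with certainty, later items with probability k/i
    p = Fraction(1) if i <= k else Fraction(k, i)
    # every later step j overwrites a given slot with probability 1/j
    for j in range(max(i, k) + 1, n + 1):
        p *= 1 - Fraction(1, j)
    return p

# with k = 3 and n = 10, every item comes out at exactly 3/10
probabilities = [inclusion_probability(i, 3, 10) for i in range(1, 11)]
```

the product telescopes, which is why every entry of `probabilities` equals `Fraction(3, 10)` regardless of the position $i$.
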
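as the wikipedia article notes, the algorithm is slower than necessary, since it draws a random number for every item of the input, including the ones that end up discarded. it is still perfectly adequate as a simple solution to our problem.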