show-toc: true,
)

Source code available (code & report) at: \
https://git.pvv.ntnu.no/frero-uni/IT3708

= Introduction

The problem in question is how to solve the binary knapsack problem quickly using approximation. Recall that this problem and its derivations, such as feature selection, are NP-hard, meaning no polynomial-time algorithm for solving them exactly is known (source). We can use a genetic algorithm to approximate a solution, tweaking hyperparameters to influence the results (source). This report details the process of solving the knapsack problem and, in particular, feature selection.

= Background & Setup

The binary knapsack problem asks for the combination of items, each with a given weight and value, that maximizes the total value without exceeding the knapsack's weight capacity. If we remove the capacity limit, we have a feature selection problem. This is relevant to regression and machine learning, where input points are fed in along with observed target values in an attempt to find a mathematical model that fits them (source).

Genetic algorithms approximate solutions to problems like these, which seek global optima in a search space (source). We can encode the items or features as bits in a bitstring, representing an individual (chromosome) of the population. Starting with some number of individuals, we repeat these steps until satisfied:
1. select parents from the population;
2. perform crossover with pairs of parents; then
3. mutate the offspring.

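The loop above can be sketched in a few lines. The following is an illustrative Python version (the actual implementation is in Odin), using random parent selection, single-point crossover and bit-flip mutation:

```python
import random

def evolve(fitness, n_bits, pop_size=100, p_cross=0.7, p_mut=0.01, generations=100):
    # Each individual is a bitstring over the items/features.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # 1. select parents (random selection, the simplest operator)
        parents = [random.choice(pop) for _ in range(pop_size)]
        # 2. single-point crossover on pairs of parents
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:
                cut = random.randrange(1, n_bits)
                offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            else:
                offspring += [a[:], b[:]]
        # 3. bit-flip mutation on each offspring bit
        for child in offspring:
            for i in range(n_bits):
                if random.random() < p_mut:
                    child[i] ^= 1
        pop = offspring  # full generational replacement
    return max(pop, key=fitness)
```

Swapping the selection, crossover and mutation steps for other operators gives the modular structure described below.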

I implemented a modular genetic algorithm in which the different operators and hyperparameters, i.e. how parents and survivors are chosen and how crossovers and mutations are performed, can easily be swapped out. This allows for easy testing and tuning. Written in over a thousand lines of low-level #text(blue)[#link("https://odin-lang.org/")[Odin]] code, it is fairly performant, especially thanks to memoizing the calculated fitness values. I use #text(blue)[#link("https://uiua.org/")[Uiua]] for plotting, because it was easy.

The code repository has detailed instructions on how to use the code to reproduce the following results.

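Since fitness evaluation dominates the runtime, the memoization is the main performance lever. A minimal sketch of the idea in Python (the actual code is Odin; `evaluate` is a stand-in for the expensive RMSE computation):

```python
def make_memoized_fitness(evaluate):
    # Cache fitness per unique bitstring so repeated individuals
    # (common once the population converges) are evaluated only once.
    cache = {}
    def fitness(bits):
        key = tuple(bits)  # lists aren't hashable; tuples are
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]
    return fitness
```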
= Results & Reflection

These are the findings.

== Running the algorithm

After successfully stitching together an implementation of the genetic algorithm, this is the initial output, using random parent selection, 70% single-point crossover, 1% bit-flip mutations and full generational replacement:

```
Baseline RMSE: 0.1952
Gen 0: Best=0.1885 Mean=0.1957 Worst=0.2062 Entropy=49.5620
...
Gen 99: Best=0.1920 Mean=0.1974 Worst=0.2088 Entropy=40.3545
```

Note this particular configuration ended up worse than it started! The main catch here is that selecting parents at random is not conducive to approaching any optima @pavlic2026ga_hyperparameters. It is, however, a very simple operator to implement.

== Best and Worst RMSE

On to the meat of this.

From our first run above, we can see that the baseline RMSE, obtained by calculating the fitness with all alleles activated, i.e. by selecting all features, is around 0.195. We wish to minimize, and thus seek lower scores than this. Running on the same seed, meaning the same baseline, using tournament selection with $k = 10$ participants, population size $mu = 1000$, crossover rate $P_C = 0.7$ and mutation rate $P_M = 0.01$, we obtain the following:

#image("tournament_data.png")

Switching to roulette selection yields near-identical results. No matter which of these hyperparameters I tune, the result remains near 0.1811, as shown in the graph.

== Crowding

Lackluster results bring us to better methods. We have seen that the tournament and roulette parent selection methods exert strong enough selection pressure to force the population to converge within the first thirty or so generations. We can use crowding, which attempts to maintain diversity through niching. Here, this is done with deterministic and probabilistic crowding, both of which are explicit approaches to maintaining diversity.

Sadly, I am unable to find any configuration that yields much better than 0.1811, even with these crowding methods. I might have implemented them incorrectly, or the seed I am using (42) may simply be stuck at a certain optimum. Perhaps resource sharing could help spread out the population more.

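As an illustrative sketch (Python rather than the report's Odin), deterministic crowding pairs each offspring with the more similar parent by Hamming distance and keeps the fitter of each pair, so competition happens within niches:

```python
def hamming(a, b):
    # Number of differing bits between two bitstrings.
    return sum(x != y for x, y in zip(a, b))

def deterministic_crowding(p1, p2, c1, c2, fitness):
    # Pair each child with the closer parent, then let the fitter
    # of each pair survive (maximization assumed).
    if hamming(p1, c1) + hamming(p2, c2) <= hamming(p1, c2) + hamming(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    return [max(parent, child, key=fitness) for parent, child in pairs]
```

Probabilistic crowding differs only in the last step: instead of the fitter individual winning deterministically, each pair holds a lottery weighted by fitness.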
We can at least compare the entropies of a simple generational GA and a crowding-based one. The following is the entropy under generational replacement, where every parent is discarded in favor of the offspring:
#image("tournament_entropy.png")

Next is the probabilistic crowding entropy:
#image("probabilistic_entropy.png")

Given more time, I could have plotted these in the same figure to more easily assess their differences. We can at least tell that probabilistic crowding maintains entropy for a little longer, despite the high selection pressure from tournament selection.

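The report does not spell out its entropy measure; one common definition for bitstring populations, assumed here purely for illustration, sums the Shannon entropy of each bit position:

```python
from math import log2

def population_entropy(pop):
    # Sum of per-locus Shannon entropies over the bitstring population.
    # A fully converged population scores 0; maximal diversity scores n_bits.
    n = len(pop)
    total = 0.0
    for locus in zip(*pop):
        p = sum(locus) / n  # fraction of 1-bits at this position
        if 0 < p < 1:
            total += -(p * log2(p) + (1 - p) * log2(1 - p))
    return total
```

Under this definition, the values in the output above (around 40-50 for a large bitstring) are consistent with a population that is diverse at most loci.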
== Elitism

The graphs from running with elitism versus crowding are quite similar to the previous ones. However, I did stumble upon the following entropy graph while running with probabilistic crowding, 10 elites and roulette parent selection, with the mutation rate cranked up to $P_M = 0.02$ and the crossover rate to $P_C = 0.8$:

#image("lucky_entropy.png")
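Elitism itself is a small mechanism. A sketch of the usual formulation (illustrative Python, maximization assumed): the fittest parents are copied over the weakest offspring, so the best solutions can never be lost to crossover or mutation:

```python
def apply_elitism(parents, offspring, fitness, n_elites=10):
    # Keep the n_elites fittest parents and fill the rest of the
    # next generation with the fittest offspring.
    elites = sorted(parents, key=fitness, reverse=True)[:n_elites]
    rest = sorted(offspring, key=fitness, reverse=True)[:len(offspring) - n_elites]
    return elites + rest
```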
== Comparing with the Knapsack Problem

For the binary knapsack problem, we are maximizing the total value rather than minimizing an error.

Using deterministic crowding, the best fitness gets very close to the optimum (295246):

#image("gen_plot.png")

Then there's probabilistic crowding:

#image("prob_plot.png")

There is a bug in my fitness penalty calculation, causing vastly negative fitness values.

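The report does not show the penalty function, but a common penalized knapsack fitness looks like the sketch below (illustrative Python; the `penalty` scale is an assumption, not the report's value). Getting this scaling wrong is a typical source of vastly negative fitness values like those described:

```python
def knapsack_fitness(bits, values, weights, capacity, penalty=10.0):
    # Total value of the selected items, minus a penalty proportional
    # to how far the selection overshoots the weight capacity.
    value = sum(v for b, v in zip(bits, values) if b)
    weight = sum(w for b, w in zip(bits, weights) if b)
    overshoot = max(0, weight - capacity)
    return value - penalty * overshoot
```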
= Conclusion

It is not easy tweaking these hyperparameters. You have to balance the forces of selection pressure and diversity. This vaguely reminds me of Heisenberg's uncertainty principle, where a particle becomes "fuzzy" if you know how it moves, but "fixed" if you don't. I'm no physicist, though.

= Further work

It is likely that some of these implementations are flawed, in correctness, performance or both; those issues could be ironed out. As mentioned, drawing plots from different configurations in the same figure would make comparison easier, and the tweaking of parameters and the plotting/logging of data could be scripted. Lastly, the code could be modularized even further and the algorithm made more generic, allowing for even greater flexibility.