#import "@preview/simple-ntnu-report:0.1.2": *
#show: ntnu-report.with(
  length: "short",
  title: "Assignment 1 -- Report",
  subtitle: "IT3708 -- Bio-Inspired Artificial Intelligence",
  authors: ((name: "Fredrik Robertsen"),),
  // front-image: image("ntnu.png", width: 6cm),
  date: datetime(year: 2026, month: 2, day: 9),
  language: "english",
  bibfile: bibliography("ref.bib"),
  bibstyle: "institute-of-electrical-and-electronics-engineers",
  column-number: 2,
  number-headings: false,
  show-toc: true,
)
Source code available at: \
https://git.pvv.ntnu.no/frero-uni/IT3708
= Introduction

The binary knapsack problem and related problems such as feature selection are NP-hard, so no polynomial-time algorithm for solving them exactly is known. Genetic algorithms can approximate good solutions, with solution quality depending heavily on the choice of hyperparameters. This report details how I applied a genetic algorithm to the knapsack problem and to feature selection.
= Background & Setup

The binary knapsack problem asks for the combination of items that maximizes total value without exceeding a weight capacity. Dropping the capacity constraint and scoring item subsets by model error instead of value turns the problem into feature selection, which is relevant to regression and machine learning.

Genetic algorithms approximate solutions by searching a large space for global optima. Items or features are encoded as bits in a bitstring, and each bitstring represents an individual (chromosome). Starting from an initial population, we repeat the following until a stopping criterion is met:

1. select parents from the population;
2. perform crossover on pairs of parents;
3. mutate the offspring.
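As a sketch, the loop above can be written as follows. This is an illustrative Python version, not the report's Odin implementation; the `fitness` function and all parameter defaults are assumptions for demonstration.

```python
import random

def evolve(fitness, n_bits, pop_size=100, generations=100,
           p_crossover=0.7, p_mutation=0.01, rng=random.Random(42)):
    """Generational GA: select parents, cross over, mutate, replace."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # 1. select parents (uniform random here, as in the first experiment)
        parents = [rng.choice(pop) for _ in range(pop_size)]
        # 2. single-point crossover on consecutive pairs of parents
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)
                offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            else:
                offspring += [a[:], b[:]]
        # 3. bit-flip mutation on every offspring bit
        for child in offspring:
            for i in range(n_bits):
                if rng.random() < p_mutation:
                    child[i] ^= 1
        pop = offspring  # full generational replacement
    return max(pop, key=fitness)
```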
I implemented a modular genetic algorithm in over a thousand lines of #text(blue)[#link("https://odin-lang.org/")[Odin]] code with memoized fitness values for performance. #text(blue)[#link("https://uiua.org/")[Uiua]] handles plotting.
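Memoization pays off because identical genotypes recur across generations. A minimal Python sketch of the idea (the Odin implementation presumably differs in detail):

```python
def memoized(fitness):
    """Cache fitness per bitstring so repeated genotypes are scored once."""
    cache = {}
    def wrapped(bits):
        key = tuple(bits)  # lists are unhashable; tuples make cheap keys
        if key not in cache:
            cache[key] = fitness(bits)
        return cache[key]
    return wrapped
```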
= Results & Reflection

== Running the algorithm

Initial output with random parent selection, 70% single-point crossover, 1% bit-flip mutation, and full generational replacement:
```
Baseline RMSE: 0.1952
Gen 0: Best=0.1885 Mean=0.1957 Worst=0.2062 Entropy=49.5620
...
Gen 99: Best=0.1920 Mean=0.1974 Worst=0.2088 Entropy=40.3545
```
Results worsened over the run because random parent selection applies no selection pressure toward optima @pavlic2026ga_hyperparameters.
== Best and Worst RMSE

Baseline RMSE (all features selected) is ~0.195. Using tournament selection with $k = 10$, $mu = 1000$, $P_C = 0.7$ and $P_M = 0.01$:
#image("tournament_data.png")
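Tournament selection is simple to sketch. The Python below is illustrative rather than the report's Odin code, and it assumes lower RMSE is better, so the tournament winner is the contestant with minimal RMSE:

```python
import random

def tournament_select(pop, rmse, k=10, rng=random.Random(42)):
    """Draw k contestants uniformly (with replacement); lowest RMSE wins."""
    contestants = [rng.choice(pop) for _ in range(k)]
    return min(contestants, key=rmse)
```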
Roulette selection yields identical results. All hyperparameter configurations I've tested converge near 0.1811.
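Roulette (fitness-proportionate) selection can be sketched the same way. Since RMSE is minimized, the wheel weights must invert the score; the inversion below (reciprocal with a small epsilon) is one common choice, assumed for illustration:

```python
import random

def roulette_select(pop, rmse, rng=random.Random(42)):
    """Fitness-proportionate selection; lower RMSE gets a larger wheel slice."""
    weights = [1.0 / (1e-9 + rmse(ind)) for ind in pop]
    return rng.choices(pop, weights=weights, k=1)[0]
```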
== Crowding

Crowding maintains diversity through niching. Despite testing both deterministic and probabilistic crowding, no configuration improved beyond 0.1811. Runs with the fixed seed (42) may be stuck in a local optimum.
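In deterministic crowding, each offspring competes only against its most similar parent, which preserves niches. A sketch of the replacement rule (Python for illustration; Hamming distance as the similarity measure and lower-RMSE-wins are assumptions consistent with this problem):

```python
def hamming(a, b):
    """Number of differing bits between two bitstrings."""
    return sum(x != y for x, y in zip(a, b))

def deterministic_crowding(p1, p2, c1, c2, rmse):
    """Pair each child with its closest parent; the better RMSE survives."""
    # Choose the child-parent pairing with the smaller total distance
    if hamming(p1, c1) + hamming(p2, c2) <= hamming(p1, c2) + hamming(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    return [c if rmse(c) <= rmse(p) else p for p, c in pairs]
```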
Generational replacement entropy:

#image("tournament_entropy.png")
Probabilistic crowding entropy:

#image("probabilistic_entropy.png")
Probabilistic crowding maintains entropy longer despite high selection pressure.
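The entropy values reported in the logs can be computed in several ways; one common definition (an assumption here, since the report does not spell out its formula) sums the per-locus Shannon entropy of the population's bits:

```python
from math import log2

def population_entropy(pop):
    """Sum of per-bit Shannon entropies; falls as the population converges."""
    n = len(pop)
    total = 0.0
    for locus in zip(*pop):  # iterate over bit positions
        p = sum(locus) / n   # fraction of ones at this locus
        if 0 < p < 1:
            total += -p * log2(p) - (1 - p) * log2(1 - p)
    return total
```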
== Elitism

With probabilistic crowding, 10 elites, roulette selection, $P_M = 0.02$ and $P_C = 0.8$:
#image("lucky_entropy.png")
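Elitism copies the best individuals unchanged into the next generation so selection and mutation cannot destroy them. A sketch (Python for illustration, lower RMSE is better; displacing the worst offspring to keep the population size fixed is an assumed policy):

```python
def apply_elitism(old_pop, new_pop, rmse, n_elites=10):
    """Carry the n best old individuals over, displacing the worst new ones."""
    elites = sorted(old_pop, key=rmse)[:n_elites]
    rest = sorted(new_pop, key=rmse)[: len(new_pop) - n_elites]
    return elites + rest
```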
== Comparing with the Knapsack Problem

Deterministic crowding approaches the optimum (295246):

#image("gen_plot.png")
Probabilistic crowding has a fitness-penalty bug that produces negative fitness values:

#image("prob_plot.png")
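The negative values suggest the overweight penalty is subtracted without a floor. One common way to write a penalized knapsack fitness that cannot go negative is sketched below; this is a hypothetical fix in Python, not the report's Odin code, and `penalty` is an assumed parameter:

```python
def knapsack_fitness(bits, values, weights, capacity, penalty=10.0):
    """Total value of selected items, with overweight penalized but clamped at zero."""
    value = sum(v for b, v in zip(bits, values) if b)
    weight = sum(w for b, w in zip(bits, weights) if b)
    excess = max(0, weight - capacity)
    return max(0.0, value - penalty * excess)
```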
= Conclusion

Balancing selection pressure and diversity requires careful hyperparameter tuning.
= Further work

Potential improvements include fixing the remaining implementation flaws, comparing multiple configurations in a single plot, automating hyperparameter tuning, and further modularizing the code.