Amazing Algorithms

For Solving Problems in Software

Barry S. Stahl

Principal Engineer - AZNerds.net

@bsstahl@cognitiveinheritance.com

https://CognitiveInheritance.com

Transparent Half Width Image 800x800.png

Favorite Physicists & Mathematicians

Favorite Physicists

Harold "Hal" Stahl
Carl Sagan
Richard Feynman
Marie Curie
Nikola Tesla
Albert Einstein
Neil Degrasse Tyson
Niels Bohr
Galileo Galilei
Michael Faraday

Other notables: Stephen Hawking, Edwin Hubble, Leonard Susskind, Christiaan Huygens

Favorite Mathematicians

Ada Lovelace
Alan Turing
Johannes Kepler
Rene Descartes
Isaac Newton
Emmy Noether
George Boole
Blaise Pascal
Johann Gauss
Grace Hopper

Other notables: Daphne Koller, Grady Booch, Leonardo Fibonacci, Evelyn Berezin, Benoit Mandelbrot

Fediverse Supporter

See my presentation from CodeMash 2025: How the Fediverse Could Save Democracy & Why It Probably Won't
Hear my thoughts on why a thriving Fediverse is so critical on David Giard's Technology and Friends show.
Learn more about the Fediverse at fediverse.party.
Get your own Mastodon account at Join Mastodon.
- Don't stress on what server to use, just pick one and go. You can always move later.
Follow me on Mastodon at @bsstahl@CognitiveInheritance.com.

Some OSS Projects I Run

Liquid Victor : Media tracking and aggregation [used to assemble this presentation]
Prehensile Pony-Tail : A static site generator built in c#
TestHelperExtensions : A set of extension methods helpful when building unit tests
Conference Scheduler : A conference schedule optimizer
IntentBot : A microservices framework for creating conversational bots on top of Bot Framework
LiquidNun : Library of abstractions and implementations for loosely-coupled applications
Toastmasters Agenda : A c# library and website for generating agenda's for Toastmasters meetings
ProtoBuf Data Mapper : A c# library for mapping and transforming ProtoBuf messages

http://GiveCamp.org

Achievement Unlocked

Agenda

08:00 - 08:15 : Intro and Modeling
08:15 - 08:45 : Dynamic Programming
08:45 - 09:15 : Genetic Algorithms
09:15 - 09:45 : K-Means Clustering
09:45 - 10:00 : Break
10:00 - 10:15 : Clustering Wrap-up
10:15 - 10:45 : Best-Path Algorithms
10:45 - 11:15 : Cost-Minimization Algorithms
11:15 - 11:45 : Training Neural Networks
11:45 - 12:00 : Q&A and Wrap-up
Bonus (if time allows) : LP & MIP

Simulates A Rational Actor

Attempts to make the best possible decision based on the model and the data

Model - Our description of the problem
Data - Our understanding of the state of the domain

Models are the Blueprint

The success of a problem-solving algorithm is often determined before the algorithm even runs -- by how the problem is modeled

Modeling

Modeling defines the problem space
- What’s included & what’s ignored
- Poor models lead to irrelevant solutions

Modeling shapes the solution strategy
- Logical models » OOP or Rules Engines
- Graph models » traversal algorithms
- Constraint models » LP or MIP
- Probabilistic models » Neural/Bayesian Networks

Modeling can determine tractability
- Some formulations make problems NP-hard
- Others reveal polynomial-time solutions
- Examples
  - Conference Scheduling with LP vs MIP
  - Traveling Salesman with OOP

Workshop Goals

Describe these algorithms
- Show models that work well with each type of problem
- Describe sample implementations for your use
- Provide intuition as to why they work as they do

Knapsack Problems

A family of combinatorial optimization problems
Goal
- Select a subset of items from a larger set
- Without exceeding resource constraints
- Such that total value is maximized

Memoization

Store the results of function calls for reuse when the same inputs occur again

Goal: Compute the nth Fibonacci number
- F(n) = F(n-1) + F(n-2)
Naïve Recursion
- Recomputes each subproblem multiple times
- O(2ⁿ) Problem - Exponential time
With Memoization
- Store results of F(k)
- Reuse stored values
- O(n) Problem - Linear Time

Canonical Knapsack Problem

Memoization

Dynamic Programming

A mathematical optimization technique

Best for solving problems that can be described recursively
Breaks the problem into smaller sub-problems
Solves each small problem only once
Guarantees an optimal solution

Optimization Problems

Find the best values for a set of decision variables that
- Satisfies all constraints
- Maximizes the features we want
- Minimizes the features we don’t want

Simple Example - Chutes and Ladders

Chutes and Ladders - Greedy Algorithm

Start at the 1st space on the board
While we haven't reached the last space
- Add the space to the current path
- If the current space has a ladder
  - Add the end of the ladder to the path
  - Make that space the current space
- Move to the next space on the board
Return the completed path through the board

Dynamic Programming - Step 1

DetermineDistance(s,d)

Call DetermineDistance(space[0], 0)
If shorter than previously found
- Set DistanceFromStart property
- If this space starts a chute or ladder
  - DetermineDistance(endOfNavigation, distanceFromStart+1)
- If not at the end of board
  - DetermineDistance(nextSpace, distanceFromStart+1)
Return all spaces with their DistanceFromStart property populated

Dynamic Programming - Step 2

Start at the last space on the board
While we are still on the board
- Add the current space to the path
- Set the current space to a space that is a neighbor of the current space and whose DistanceFromStart is space.DistanceFromStart-1
Reverse the direction of the path

Memoization

Shortest Path

Summary - Dynamic Programming

Populate the cache
- Simple calculations
- Build from the ground-up
Use the cached data to determine the answer(s)
- Work backwards through the cache
Guaranteed Optimality
- A fully populated cache means all options are explored
- Works in any number of dimensions
Works with any graph (nodes & edges)
- Works best when
  - Problems can be described recursively
  - 1 or more axis are limited in scope

Use Case: Game Strategy

Find the best strategy in a multi-player board game

Variant of Chutes & Ladders
- Player decides whether or not to take a chute or ladder
Extremely large Solution Space
- Impossible to simply try all options

Other Model Types

Shortest Path Algorithm
- Guarantees shortest path but not necessarily best strategy

Mixed-Integer Programming Model
- Requires a mathematical fitness function (can't use a simulation)

Neural Network
- Requires a very large training data corpus

Genetic Algorithms

Find Solutions by Simulating Darwinian Evolution

Each candidate solution is defined by its properties (chromosomes)
A fitness function is used to determine which solutions "survive"
- A simulation may be used as the fitness function
Surviving solutions may mutate and evolve other solutions

Optimality is Never Guaranteed

Optimal solution may never be found
- i.e. If a optimality requires a large combination of elements
Optimum may be found but not recognized
- If conditions make that solution non-optimal at that moment
  - i.e. If a key feature is not used during the simulation
- Minimize the liklihood by increasing simulation size

Determine Our Game Strategy

We need a strategy that will tell us where we

should move to next, given:
- Our location on the board
- The result of our spin
- The location of our opponents (opt)
- Our player number (opt)
Example: If we are on square 45 and spin a:
- 1 - 46 (no choice)
- 2 - 47 (no choice)
- 3 - 48 or 26
- 4 - 49 or 27
- 5 - 50, 11 or 28
- 6 - 51, 12, 29 or 84

Step 1 - Define DNA

A chromosome represents all of the possible decisions that can be made
Each gene represents a decision
- Ideally a gene fully describes a single game state
- If no decision is possible for a state, there is no gene for that state
An allele is an option for that decision
- If there are 3 options for a given state there will be 3 possible alleles for that gene

DNA of Chutes & Ladders

Step 2 - Setup Parameters

Simulations per Generation
- Controls how often the solutions evolve
- More simulations make it more likely the best solution will survive

Generation Count
- Controls simulation length
- Optionally look for convergence
  - When does improvement stop?
  - When do significant changes stop during crossover?

Error (misspelling) rate
- Controls how frequently genes change allele's during the evolutionary process
- Higher rates cause greater variations from generation to generation

Selection strategy
- Defines which solutions survive and mutate
  - Survive intact - Remain unchanged to next generation
  - Survive mutated - Modified versions survive to next generations
  - Survive recombined - Crossover between two solutions survives

Genetic Algorithm

Step 3 - Run the Simulations

Options for Experimentation

Implement in your preferred language
- C# implementation is on GitHub
Augment to include what place the player is in
- i.e. Might make different decisions if started 1^st vs 6^th
Augment to include opponent locations
- i.e. Might make different decisions if opponents are close to the end
Define the DNA of another game
- Recommendation - stick with simpler games to start
- Avoid games like chess with massive complexity

Problem 2 - Semantic Distance

Embeddings

A point in multi-dimensional space
Mathematical representation of a word or phrase
Encode both semantic and contextual information

Model: text-embedding-ada-002
Vectors normalized to unit length
Use 1536 dimensions

3-D Space Projected into 2-D

Cosine Similarity & Distance

Relate vectors based on the angle between them

Cosine Similarity ranges from -1 to 1, where:
- +1 indicates that the vectors represent similar semantics & context
- 0 indicates that the vectors are orthogonal (no similarity)
- -1 indicates that the vectors have opposing semantics & context
Cosine Distance is defined as 1 - cosine similarity where:
- 0 = Synonymous
- 1 = Orthogonal
- 2 = Antonymous

Note: For normalized vectors, cosine similarity is the same as the dot-product

Cosine Distance

Embedding Distance

Feature	Example
Synonym	"Happy" is closer to "Joyful" than to "Sad"
Language	"The Queen" is very close to "La Reina"
Idiom	"He kicked the bucket" is closer to "He died" than to "He kicked the ball"
Sarcasm	"Well, look who's on time" is closer to "Actually Late" than "Actually Early"
Homonym	"Bark" (dog sound) is closer to "Howl" than to "Bark" (tree layer)
Collocation	"Fast food" is closer to "Junk food" than to "Fast car"
Proverb	"The early bird catches the worm" is closer to "Success comes to those who prepare well and put in effort" than to "A bird in the hand is worth two in the bush"
Metaphor	"Time is money" is closer to "Don't waste your time" than to "Time flies"
Simile	"He is as brave as a lion" is closer to "He is very courageous" than to "He is a lion"

Modeling Exercise

Design a system to find words/phrases that are far apart (semantically different) in the embedding space?

Assumptions
- No db of embeddings available to search
- Limited understanding of the features of the embedding space
  - i.e. changing language changes nearly every dimension
What type of models might work?
- Can we do it logically?
- Can we brute-force it?
- Are there hybrid approaches?
What are the features of each model?
How do we execute the model(s)?

Possible Genes

Categorical
- Part of Speech (noun, verb, etc)
- Register (formal, slang, technical, etc)
- Morphology (root, compound, phrase)
- Animacy (inanimate, animate, agentic, collaborative)
- Scientific Discipline (biology, physical, life/health, etc)
Continuous
- Polarity (negative -> positive)
- Idiomacity (literal -> idiomatic)
- Concreteness (abstract -> concrete)
Other Options
- Language

Mapping Strategies

We need to find what should change in order to identify a word distant from the current one

Part of Speech: Noun -> Adverb
Discipline: Software Engineering -> Philosophy
Polarity: Neutral -> Highly Positive

Linguistic Agent - System Prompt

You are a simulation of a great linguist. You classify words/phrases within the following categories:

Part of Speech (noun, verb, etc)
Register (formal, slang, technical, etc)
Scientific Discipline (biology, physical, life/health, etc)
Morphology (root, compound, phrase)
Animacy (inanimate, animate, agentic, collaborative)
Polarity (highly negative, negative, neutral, positive, highly positive)
Idiomacity (highly literal, literal, idiomatic, highly idiomatic)
Concreteness (highly abstract, abstract, concrete, highly concrete)

Your job is to identify words/phrases that meet the requested characteristics. You respond only with the requested word or phrase in lower case.

Linguistic Agent - User Prompt

Give me a word or phrase with the following characteristics:

Primary part of speech is verb
Register is technical
Morphology is phrase
Animacy is collaborative
Scientific Discipline is biology
Polarity is highly negative
Idiomacity idiomatic
Concreteness highly concrete

and is not in the following list: "prime genius", "engineer a cooperative symbiosis".

Be sure to only respond with the selected word or phrase, no ceremony.

Commonalities

We discussed 2 problems:

Game strategy - what's the best move given game state?
Semantic Distance - what word should I test next?

What do they have in common? What makes them good candidates for genetic algorithms?

Summary

Genetic Algorithms "Learn" by simulating Darwinian Evolution
The "DNA" represents the decisions that can be made
- One chromosome per condition (game state)
- One gene per option (choices from a state)
Several parameters control how solutions evolve
- Simulations per Generation - how often the solutions change
  - Fewer simulations make it more likely a less-fit solution will survive over a more-fit solution
- Error (misspelling) rate - how often each chromosome changes between generations
  - Higher rates cause greater variations from generation to generation
- Selection strategy - Which solutions survive and mutate
  - Survive intact - Remain unchanged to next generation
  - Survive mutated - Modified versions survive to next generations
  - Survive recombined - Crossover between two solutions survives

Bio-Inspired Algorithms

Best Path Algorithms
- Ant Colony Optimization
- Bee Colony Optimization
Cost Reduction Algorithms
- Firefly Optimization
- Amoeba Optimization
AI Models
- Intro to Optimizing AI Models
- Training an AI Model Using Amoebas

Best Path Algorithms

Ant Colony Optimization

Pheromones

Greater Usage => More Pheromones
Shorter Path Length => More Pheromones Remain
More Pheromones => Higher Probability of Usage

Ant Colony Optimization

Neighborhood Search
- Start with random paths
- Explore neighboring paths based on probability
- More pheromone = Higher probability of use
- Can get stuck in a local minima
Tweaks
- Number of ants
- Pheromones to dispense
- Pheromone dissipation rate
Features
- Less sensitive to scale
- Optimality not guaranteed

Bee Colony Optimization

Types of Bees

Active Workers: Forage on their known path and on a neighboring path
Scouts: Forage on a random path
Inactive Workers: Wait for information from other bees

Bee Colony Optimization

Multiple Search types
- Active Workers perform neighborhood search
- Scouts perform random search
- Best known path propogates
Tweaks
- Number of each type of bee
- Probability of persuasion
- Visit limit
Features
- Less Sensitive to Scale
- Optimality not guaranteed

Reducing Costs

Firefly Optimization

initialize n fireflies to random positions
loop maxEpochs times
  for each firefly i
    for each firefly j
      if intensity(i) < intensity(j)
        compute attractiveness
        move firefly(i) toward firefly(j)
        update firefly(i) intensity
      end for
    end for
  sort fireflies
end loop
return best position found

Firefly Optimization

Neighborhood Search
- Search the area toward best known solution
- Closer and Brighter: More Attractive
  - Similar to gravity
Tweaks
- Number of fireflies
- Range of possible values
- How fast fireflies move
Features
- Works better for linear problems
- Optimality not guaranteed
- Can suffer from sparseness problems

Amoeba Optimization

initialize the amoeba with n (size) locations
loop maxEpochs times
    calculate new possible solutions
        contracted - midway between centroid and worst
        reflected - contracted point reflected across centroid
        expanded - beyond reflected point by a constant factor
    if any solution is better than the current
        replace worst value with best value from new solution
    else
        shrink (multiple contract) all lesser nodes toward the best
    increment epoch count
end loop
return best position found

Amoeba Optimization

Neighborhood Search
- Start with random locations
- Explore neighbors based on amoeba movement
- Surround the solution, then contract to it
Tweaks
- Size of the amoeba
  - > # of search dimensions
- Number of executions
  - Handle local minima
Features
- Optimality no guaranteed
- Can suffer from sparseness problems

What Can We Do With This Stuff?

Gradient Descent

Optimization algorithm used to minimize the cost function of a model

Shifts the parameters in the opposite direction of the cost gradient
- The 1st derivative of the cost function
- Represents the slope of that function
Hyperparameters include:
- Number of iterations
- Learning rate
- Convergence criteria
Stochastic Gradient Descent
- Updates parameters using a subset of data each cycle

AI Models - Gradient Descent - 400x800.png

Deep Neural Networks

Linear Model

Component	Details
Input Variable	X
Output Variable	Y
Weight Parameter	M
Bias Parameter	B
Linear Equation	Y=mX+b

AI Models - Single Input Linear - 1280x720.png

ML's Linear Linchpin

Y = mX + B

Every neuron gets its value from a linear transformation
Multiple inputs result in a sum of the linear transformations
- The sum of a linear transformation is linear
Only linearly-separable functions can be modeled without a non-linear activation

AI Models - Annotated Single Input Linear - 640x360.png

Model Parameters

Parameters: Internal values learned during training
- Define the relationship between input and output
- Adjusted to minimize prediction error
In Linear Regression: Y = mX + b
- m: Slope (Weight) - The influence of X on Y
- b: Intercept (Bias) - Shifts the output vertically
In more complex models:
- Counts include weights/biases across layers
- Can capture non-linear relationships
- Parameter counts are a proxy for complexity
- GPT-4 uses roughly 1.8 trillion parameters

Error Function

X-Axis: The value of the weight (m)
Y-Axis: The value of the bias (b)
Z-Axis: The size of the error

The weight (m) often has a greater effect on the error than the bias (b)

Linear Regression - Error Function Plot - Smaller.png

Training the Model

Objective: Minimize the error
- Error: The difference between the predicted value and the actual value
- MSE: Mean Squared Error - Avg of the squared differences between predicted and actual
Means: Gradient Descent Optimization
- Nearly any Optimization algorithm can be used
- Gradient Descent is the most common/suited

Linear Regression Demos

Linear Regression - C# Code
AND Gate Model - Excel
Maintainability Model - Excel

Mathematics

...we've invented a fantastic array of tricks and gimmicks for putting together the numbers, without actually doing it. We don't actually [apply Y = Mx+B for every neuron] We do it by the tricks of mathematics, and that's all. So, we're not going to worry about that. You don't have to know about [Linear Algebra]. All you have to know is what it is, tricky ways of doing something which would be laborious otherwise.

With apologies to Professor Feynman, who was talking about the tricks of Calculus as applied to Physics, not the tricks of Linear Algebra as applied to Machine Learning.

Railroad Times Model

Linear Regression Model

Predict the unknown values in a linear equation

Given X (time), predict Y (location)
- Y = mX + b
Find the best values for m and b
- Minimize total error

Voter Model

ML Demo

Using Amoeba Optimzation to Train an ML Model

More Bio-Inspired

Roach Infestation Optimization
Particle Swarm Optimization
Multi-Swarm (Birds) Optimization
Bacterial Foraging Optimization

Solving Tractable Problems

Understand the Problem Deeply
- Clarify inputs, outputs, constraints, and goals
- Is optimality required? Can constraints be relaxed?
- Identify if it can be broken into reusable parts?
Classify the Problem Type
- Search? Optimization? Graph traversal? Dynamic programming candidate?
- Are brute-force or exponential-time solutions feasible?
- What tools do we have that can help? Can we buy vs build?
Implement and Test
- Try a naive solution 1st
- Use test cases to validate correctness and performance
- Optimize as needed / Start over if necessary

Resources - Page 1

Code Samples
My Articles/Presentations

Amazing Algorithms - Short Workshop - QR.png

Resources - Page 2

Academic Resources
- Optimization - Coursera Course
- Clemson University
Other Articles

Solving Your Problems

Would you like to try to model a problem from one of your domains?

Bonus Content

Linear Programming (LP) & Mixed-Integer Programming (MIP)

Use Case: Production Targets

Products
- Small Vase
  - 1 oz. clay, 1 oz. glaze
  - Sells for $3.00 each
- Large Vase
  - 4 oz. clay, 2 oz. glaze
  - Sells for $9.00 each

Inventory
- Clay - 24 oz.
- Glaze - 16 oz.

Goal
- Maximize Revenue

Solution Space

Constraint Equations

Clay Constraint
- X + 4Y <= 24
  - X <= 24 and Y <= 6
Glaze Constraint
- X + 2Y <= 16
  - X <=16 and Y <= 8

Feasible Region

Solution Space Graph - With Feasible Solutions and Constraint Lines.png

Linear Programming - Polytope

Solution Space Graph - With Polytope.png

Linear and Mixed-Integer Programming

Define Constraints that the Solution Must Satisfy
- Use these constraints to limit the search space
  - More constraints = smaller search space
    - Solutions that exist are found faster
    - May not find a feasible solution

Add an Objective Function to Improve Solutions
- All constraints must be satisfied first
- Objectives are a goal relative to an equation
  - i.e. Maximize revenue where r = 3X + 9Y

Search/Optimization Models

Amazing Algorithms

For Solving Problems in Software

Barry S. Stahl

Principal Engineer - AZNerds.net

@bsstahl@cognitiveinheritance.com

https://CognitiveInheritance.com

Favorite Physicists & Mathematicians

Favorite Physicists

Favorite Mathematicians

Fediverse Supporter

Some OSS Projects I Run

http://GiveCamp.org

Achievement Unlocked

Agenda

Simulates A Rational Actor

Models are the Blueprint

Modeling

Workshop Goals

Knapsack Problems

Memoization

Canonical Knapsack Problem

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Memoization

Dynamic Programming

A mathematical optimization technique

Optimization Problems

Simple Example - Chutes and Ladders

Chutes and Ladders - Greedy Algorithm

Dynamic Programming - Step 1

DetermineDistance(s,d)

Dynamic Programming - Step 2

Memoization

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Shortest Path

Summary - Dynamic Programming

Use Case: Game Strategy

Find the best strategy in a multi-player board game​

Other Model Types

Genetic Algorithms

Find Solutions by Simulating Darwinian Evolution​

Optimality is Never Guaranteed

Determine Our Game Strategy

Step 1 - Define DNA

DNA of Chutes & Ladders

Step 2 - Setup Parameters

Genetic Algorithm

Step 3 - Run the Simulations

Options for Experimentation

Problem 2 - Semantic Distance

Find the best strategy in a multi-player board game

Find Solutions by Simulating Darwinian Evolution

Predict the unknown values in a linear equation