Solutions that Evolve

Building Self-Improving Systems

with Genetic Algorithms

Barry S. Stahl

Principal Engineer - AZNerds.net

@bsstahl@cognitiveinheritance.com

https://CognitiveInheritance.com

Transparent Half Width Image 800x800.png

Favorite Physicists & Mathematicians

Favorite Physicists

Harold "Hal" Stahl
Carl Sagan
Richard Feynman
Marie Curie
Nikola Tesla
Albert Einstein
Neil Degrasse Tyson
Niels Bohr
Galileo Galilei
Michael Faraday

Other notables: Stephen Hawking, Edwin Hubble, Leonard Susskind, Christiaan Huygens

Favorite Mathematicians

Ada Lovelace
Alan Turing
Johannes Kepler
Rene Descartes
Isaac Newton
Emmy Noether
George Boole
Blaise Pascal
Johann Gauss
Grace Hopper

Other notables: Daphne Koller, Grady Booch, Leonardo Fibonacci, Evelyn Berezin, Benoit Mandelbrot

Fediverse Supporter

See my presentation from CodeMash 2025: How the Fediverse Could Save Democracy & Why It Probably Won't
Hear my thoughts on why a thriving Fediverse is so critical on David Giard's Technology and Friends show.
Learn more about the Fediverse at fediverse.party.
Get your own Mastodon account at Join Mastodon.
- Don't stress on what server to use, just pick one and go. You can always move later.
Follow me on Mastodon at @bsstahl@CognitiveInheritance.com.

Some OSS Projects I Run

Liquid Victor : Media tracking and aggregation [used to assemble this presentation]
Prehensile Pony-Tail : A static site generator built in c#
TestHelperExtensions : A set of extension methods helpful when building unit tests
Conference Scheduler : A conference schedule optimizer
IntentBot : A microservices framework for creating conversational bots on top of Bot Framework
LiquidNun : Library of abstractions and implementations for loosely-coupled applications
Toastmasters Agenda : A c# library and website for generating agenda's for Toastmasters meetings
ProtoBuf Data Mapper : A c# library for mapping and transforming ProtoBuf messages

http://GiveCamp.org

Achievement Unlocked

Use Case: Game Strategy

Find the best strategy in a multi-player board game

Variant of Chutes & Ladders
- Player decides whether or not to take a chute or ladder
Extremely large Solution Space
- Impossible to simply try all options

Other Model Types

Shortest Path Algorithm
- Guarantees shortest path but not necessarily best strategy

Mixed-Integer Programming Model
- Requires a mathematical fitness function (can't use a simulation)

Neural Network
- Requires a very large training data corpus

Genetic Algorithms

Find Solutions by Simulating Darwinian Evolution

Each candidate solution is defined by its properties (chromosomes)
A fitness function is used to determine which solutions "survive"
- A simulation may be used as the fitness function
Surviving solutions may mutate and evolve other solutions

Optimality is Never Guaranteed

Optimal solution may never be found
- i.e. If a optimality requires a large combination of elements
Optimum may be found but not recognized
- If conditions make that solution non-optimal at that moment
  - i.e. If a key feature is not used during the simulation
- Minimize the liklihood by increasing simulation size

Determine Our Game Strategy

We need a strategy that will tell us where we

should move to next, given:
- Our location on the board
- The result of our spin
- The location of our opponents (opt)
- Our player number (opt)
Example: If we are on square 45 and spin a:
- 1 - 46 (no choice)
- 2 - 47 (no choice)
- 3 - 48 or 26
- 4 - 49 or 27
- 5 - 50, 11 or 28
- 6 - 51, 12, 29 or 84

Step 1 - Define DNA

A chromosome represents all of the possible decisions that can be made
Each gene represents a decision
- Ideally a gene fully describes a single game state
- If no decision is possible for a state, there is no gene for that state
An allele is an option for that decision
- If there are 3 options for a given state there will be 3 possible alleles for that gene

DNA of Chutes & Ladders

Step 2 - Setup Parameters

Simulations per Generation
- Controls how often the solutions evolve
- More simulations make it more likely the best solution will survive

Generation Count
- Controls simulation length
- Optionally look for convergence
  - When does improvement stop?
  - When do significant changes stop during crossover?

Error (misspelling) rate
- Controls how frequently genes change allele's during the evolutionary process
- Higher rates cause greater variations from generation to generation

Selection strategy
- Defines which solutions survive and mutate
  - Survive intact - Remain unchanged to next generation
  - Survive mutated - Modified versions survive to next generations
  - Survive recombined - Crossover between two solutions survives

Genetic Algorithm

Step 3 - Run the Simulations

Options for Experimentation

Implement in your preferred language
- C# implementation is on GitHub
Augment to include what place the player is in
- i.e. Might make different decisions if started 1^st vs 6^th
Augment to include opponent locations
- i.e. Might make different decisions if opponents are close to the end
Define the DNA of another game
- Recommendation - stick with simpler games to start
- Avoid games like chess with massive complexity

Problem 2 - Semantic Distance

Embeddings

A point in multi-dimensional space
Mathematical representation of a word or phrase
Encode both semantic and contextual information

Model: text-embedding-ada-002
Vectors normalized to unit length
Use 1536 dimensions

3-D Space Projected into 2-D

Cosine Similarity & Distance

Relate vectors based on the angle between them

Cosine Similarity ranges from -1 to 1, where:
- +1 indicates that the vectors represent similar semantics & context
- 0 indicates that the vectors are orthogonal (no similarity)
- -1 indicates that the vectors have opposing semantics & context
Cosine Distance is defined as 1 - cosine similarity where:
- 0 = Synonymous
- 1 = Orthogonal
- 2 = Antonymous

Note: For normalized vectors, cosine similarity is the same as the dot-product

Cosine Distance

Embedding Distance

Feature	Example
Synonym	"Happy" is closer to "Joyful" than to "Sad"
Language	"The Queen" is very close to "La Reina"
Idiom	"He kicked the bucket" is closer to "He died" than to "He kicked the ball"
Sarcasm	"Well, look who's on time" is closer to "Actually Late" than "Actually Early"
Homonym	"Bark" (dog sound) is closer to "Howl" than to "Bark" (tree layer)
Collocation	"Fast food" is closer to "Junk food" than to "Fast car"
Proverb	"The early bird catches the worm" is closer to "Success comes to those who prepare well and put in effort" than to "A bird in the hand is worth two in the bush"
Metaphor	"Time is money" is closer to "Don't waste your time" than to "Time flies"
Simile	"He is as brave as a lion" is closer to "He is very courageous" than to "He is a lion"

Modeling Exercise

Design a system to find words/phrases that are far apart (semantically different) in the embedding space?

Assumptions
- No db of embeddings available to search
- Limited understanding of the features of the embedding space
  - i.e. changing language changes nearly every dimension
What type of models might work?
- Can we do it logically?
- Can we brute-force it?
- Are there hybrid approaches?
What are the features of each model?
How do we execute the model(s)?

Possible Genes

Categorical
- Part of Speech (noun, verb, etc)
- Register (formal, slang, technical, etc)
- Morphology (root, compound, phrase)
- Animacy (inanimate, animate, agentic, collaborative)
- Scientific Discipline (biology, physical, life/health, etc)
Continuous
- Polarity (negative -> positive)
- Idiomaticity (literal -> idiomatic)
- Concreteness (abstract -> concrete)
Other Options
- Language

Mapping Strategies

We need to find what should change in order to identify a word distant from the current one

Part of Speech: Noun -> Adverb
Discipline: Software Engineering -> Philosophy
Polarity: Neutral -> Highly Positive

Linguistic Agent - System Prompt

You are a simulation of a great linguist. You classify words/phrases within the following categories:

Part of Speech (noun, verb, etc)
Register (formal, slang, technical, etc)
Scientific Discipline (biology, physical, life/health, etc)
Morphology (root, compound, phrase)
Animacy (inanimate, animate, agentic, collaborative)
Polarity (highly negative, negative, neutral, positive, highly positive)
Idiomaticity (highly literal, literal, idiomatic, highly idiomatic)
Concreteness (highly abstract, abstract, concrete, highly concrete)

Your job is to identify words/phrases that meet the requested characteristics. You respond only with the requested word or phrase in lower case.

Linguistic Agent - User Prompt

Give me a word or phrase with the following characteristics:

Primary part of speech is verb
Register is technical
Morphology is phrase
Animacy is collaborative
Scientific Discipline is biology
Polarity is highly negative
Idiomaticity idiomatic
Concreteness highly concrete

and is not in the following list: "prime genius", "engineer a cooperative symbiosis".

Be sure to only respond with the selected word or phrase, no ceremony.

What Else Can We Do with Our Linguistic Genome?

Once we can encode the traits of a word/phrase/idea, what other fitness functions could we apply?

The DNA defines the traits of the specific organism (candidate)
The Fitness Function defines the direction of evolution

Linguistic Genome Experiments

Phonetics - Alliteration, Rhyme, or Assonance
Rhythm - Punchy words or flowing patterns
Imagery - Reward evocations of vivid emotional imagery
Narrative - How “heroic”, “mysterious”, or “pastoral”
Temporal - Prefer archaic, modern, or futuristic language
Novelty - Reward surprise or pattern-breaking
Constraint - Avoid forbidden categories or styles
Information-Density - Pack more meaning per syllable

Genetic Systems

Commonalities

We discussed 2 problems:

Game strategy - what's the best move given game state?
Semantic Distance - what word should I test next?

What do they have in common? What makes them good candidates for genetic algorithms?

Summary

Genetic Algorithms "Learn" by simulating Darwinian Evolution
The "DNA" represents the decisions that can be made
- One chromosome per condition (game state)
- One gene per option (choices from a state)
Several parameters control how solutions evolve
- Simulations per Generation - how often the solutions change
  - Fewer simulations make it more likely a less-fit solution will survive over a more-fit solution
- Error (misspelling) rate - how often each chromosome changes between generations
  - Higher rates cause greater variations from generation to generation
- Selection strategy - Which solutions survive and mutate
  - Survive intact - Remain unchanged to next generation
  - Survive mutated - Modified versions survive to next generations
  - Survive recombined - Crossover between two solutions survives

Resources

Appendix

Simulates A Rational Actor

Attempts to make the best possible decision based on the model and the data

Model - Our description of the problem
Data - Our understanding of the state of the domain

Models are the Blueprint

The success of a problem-solving algorithm is often determined before the algorithm even runs -- by how the problem is modeled

Modeling

Modeling defines the problem space
- What’s included & what’s ignored
- Poor models lead to irrelevant solutions

Modeling shapes the solution strategy
- Logical models » OOP or Rules Engines
- Graph models » traversal algorithms
- Constraint models » LP or MIP
- Probabilistic models » Neural/Bayesian Networks

Modeling can determine tractability
- Some formulations make problems NP-hard
- Others reveal polynomial-time solutions
- Examples
  - Conference Scheduling with LP vs MIP
  - Traveling Salesman with OOP

Solutions that Evolve

Building Self-Improving Systems

with Genetic Algorithms

Barry S. Stahl

Principal Engineer - AZNerds.net

@bsstahl@cognitiveinheritance.com

https://CognitiveInheritance.com

Favorite Physicists & Mathematicians

Favorite Physicists

Favorite Mathematicians

Fediverse Supporter

Some OSS Projects I Run

http://GiveCamp.org

Achievement Unlocked

Use Case: Game Strategy

Find the best strategy in a multi-player board game​

Other Model Types

Genetic Algorithms

Find Solutions by Simulating Darwinian Evolution​

Optimality is Never Guaranteed

Determine Our Game Strategy

Step 1 - Define DNA

DNA of Chutes & Ladders

Step 2 - Setup Parameters

Genetic Algorithm

Step 3 - Run the Simulations

Options for Experimentation

Problem 2 - Semantic Distance

Embeddings

3-D Space Projected into 2-D

Cosine Similarity & Distance

Cosine Distance

Cosine Distance

Embedding Distance

Modeling Exercise

Design a system to find words/phrases that are far apart (semantically different) in the embedding space?

Possible Genes

Mapping Strategies

Linguistic Agent - System Prompt

Linguistic Agent - User Prompt

What Else Can We Do with Our Linguistic Genome?

Linguistic Genome Experiments

Genetic Systems

Commonalities

Summary

Resources

Appendix

Simulates A Rational Actor

Models are the Blueprint

Modeling

Find the best strategy in a multi-player board game

Find Solutions by Simulating Darwinian Evolution