Crafting AI: A Developer's Guide to Machine Learning
Barry S. Stahl
Solution Architect & Developer
@bsstahl@cognitiveinheritance.com
https://CognitiveInheritance.com

Favorite Physicists
Other notables: Stephen Hawking, Edwin Hubble, Leonard Susskind, Christiaan Huygens

Favorite Mathematicians
Other notables: Daphne Koller, Grady Booch, Leonardo Fibonacci, Evelyn Berezin, Benoit Mandelbrot

In a linear model (y = mx + b), the weight (m) often has a greater effect on the error than the bias (b)
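
Why: for a squared error, the gradient with respect to m carries an extra factor of the input x, while the gradient with respect to b does not. The C# sketch below illustrates this with made-up values (they are not from the slides); whenever |x| > 1, the error is more sensitive to m than to b.

```csharp
// A minimal sketch (values chosen here for illustration, not from the slides).
// For one sample, squared error E = (y - (m*x + b))^2 has these partial derivatives:
//   dE/dm = -2 * x * (y - prediction)   <- scaled by the input x
//   dE/db = -2 * (y - prediction)       <- not scaled by x
double x = 3.0, y = 10.0;   // one training point
double m = 1.5, b = 0.5;    // current line parameters

double prediction = m * x + b;
double residual = y - prediction;

double gradM = -2.0 * x * residual;
double gradB = -2.0 * residual;

Console.WriteLine($"dE/dm = {gradM}, dE/db = {gradB}");
// With |x| > 1, |dE/dm| > |dE/db|, so a change in m moves the error more than b does
```
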
To find the derivative of a composite function h(x) = f(g(x)), take the derivative of the outer function f with respect to the inner function g, and multiply it by the derivative of the inner function g with respect to x.

If h(x) = f(g(x)), then:
dh/dx = df/dg * dg/dx

This describes how changes in x affect the output h by accounting for how x influences g and how g influences f.
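
To make the rule concrete, the C# sketch below differentiates a composite function and confirms the result numerically. The specific functions (g(x) = 3x + 1 and f(u) = u²) are chosen here for illustration and are not from the slides.

```csharp
// A minimal sketch: g(x) = 3x + 1 (inner), f(u) = u^2 (outer), so h(x) = (3x + 1)^2
double x = 2.0;

double g = 3.0 * x + 1.0;   // inner function g(x)
double h = g * g;           // outer function f(g(x))

double dfdg = 2.0 * g;      // derivative of the outer function with respect to g
double dgdx = 3.0;          // derivative of the inner function with respect to x
double dhdx = dfdg * dgdx;  // chain rule: dh/dx = df/dg * dg/dx

// A finite-difference check should agree with the analytic result (42 at x = 2)
double eps = 1e-6;
double gPlus = 3.0 * (x + eps) + 1.0;
double numeric = (gPlus * gPlus - h) / eps;
Console.WriteLine($"analytic = {dhdx}, numeric ≈ {numeric}");
```
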
Enables calculation of gradients for each layer by propagating errors backward through the network
Essential for training deep networks, as it helps adjust weights and biases to minimize prediction error
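
As a rough illustration of those two points, the sketch below runs one backward pass through a tiny one-input, one-hidden-node, one-output sigmoid network. The structure, starting values, and learning rate are assumptions made for the example, not the presentation's network code.

```csharp
// A minimal sketch of a single backward pass through a 1-1-1 sigmoid network
double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

double x = 0.5, target = 1.0;   // one training sample
double w1 = 0.4, b1 = 0.1;      // input-to-hidden weight and bias
double w2 = 0.7, b2 = -0.2;     // hidden-to-output weight and bias
double learningRate = 0.1;      // step size chosen arbitrarily for illustration

// Forward pass
double hidden = Sigmoid(w1 * x + b1);
double output = Sigmoid(w2 * hidden + b2);

// Backward pass: apply the chain rule layer by layer, starting at the output
double deltaOutput = (output - target) * output * (1.0 - output); // error signal at the output
double gradW2 = deltaOutput * hidden;                             // dE/dw2
double gradB2 = deltaOutput;                                      // dE/db2

double deltaHidden = deltaOutput * w2 * hidden * (1.0 - hidden);  // propagate the error back
double gradW1 = deltaHidden * x;                                  // dE/dw1
double gradB1 = deltaHidden;                                      // dE/db1

// Adjust weights and biases against the gradient to reduce the prediction error
w2 -= learningRate * gradW2;  b2 -= learningRate * gradB2;
w1 -= learningRate * gradW1;  b1 -= learningRate * gradB1;
```
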
Overfitting: when a neural network learns the training data too well, capturing noise rather than the underlying pattern
Careful weight initialization helps prevent gradients from becoming too small or too large, aiding convergence

// Xavier/Glorot initialization for better gradient flow
int inputWeightCount = inputCount * hiddenLayerNodes;        // weights from the inputs into the hidden layer
int totalWeightCount = inputWeightCount + hiddenLayerNodes;  // plus one additional weight per hidden node

// Scale the random range by the number of incoming weights
var weightScale = Math.Sqrt(2.0 / inputWeightCount);

// Draw each starting weight uniformly from [-weightScale, weightScale]
startingWeights = new double[totalWeightCount];
for (int i = 0; i < startingWeights.Length; i++)
    startingWeights[i] = _random.GetRandomDouble(-weightScale, weightScale);