Crafting AI: A Developer's Guide to Machine Learning
Barry S. Stahl
Principal Engineer, AZNerds.net
@bsstahl@cognitiveinheritance.com
https://CognitiveInheritance.com
Favorite Physicists
Other notables: Stephen Hawking, Edwin Hubble, Leonard Susskind, Christiaan Huygens
Favorite Mathematicians
Other notables: Daphne Koller, Grady Booch, Leonardo Fibonacci, Evelyn Berezin, Benoit Mandelbrot
...we've invented a fantastic array of tricks and gimmicks for putting together the numbers, without actually doing it. We don't actually [apply y = mx + b for every neuron]. We do it by the tricks of mathematics, and that's all. So, we're not going to worry about that. You don't have to know about [Linear Algebra]. All you have to know is what it is: tricky ways of doing something which would be laborious otherwise.
With apologies to Professor Feynman, who was talking about the tricks of Calculus as applied to Physics, not the tricks of Linear Algebra as applied to Machine Learning.
The weight (m) often has a greater effect on the error than the bias (b)
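The reason is visible in the gradients themselves: for squared error on y = mx + b, the weight gradient is scaled by the input x while the bias gradient is not. A minimal sketch (the sample values and variable names here are illustrative, not from the talk):

```csharp
using System;

// Squared-error gradients for a single linear unit: prediction = m*x + b
//   dE/dm = 2 * (prediction - y) * x   <- scaled by the input
//   dE/db = 2 * (prediction - y)       <- not scaled
// So whenever |x| > 1, the weight gradient dominates the bias gradient.
class WeightVsBiasGradient
{
    static void Main()
    {
        double m = 0.5, b = 0.1;   // current parameters
        double x = 3.0, y = 2.0;   // one training sample

        double prediction = m * x + b;
        double error = prediction - y;

        double gradM = 2.0 * error * x;  // weight gradient (x times larger)
        double gradB = 2.0 * error;      // bias gradient

        Console.WriteLine($"dE/dm = {gradM:F4}, dE/db = {gradB:F4}");
    }
}
```

With x = 3, the weight gradient is three times the bias gradient for the same error, which is why gradient descent typically moves m faster than b.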
To find the derivative of a composite function h(x) = f(g(x)), you take the derivative of the outer function f with respect to the inner function g, and multiply it by the derivative of the inner function g with respect to x.
If h(x) = f(g(x)), then:
dh/dx = df/dg * dg/dx
This captures how changes in x affect the output h by accounting for how x influences g and how g influences f.
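The chain rule is easy to verify numerically. A small sketch using f(u) = sin(u) and g(x) = x² (my own example functions, chosen only because both derivatives are simple):

```csharp
using System;

// Chain rule for h(x) = f(g(x)) with f(u) = sin(u) and g(x) = x^2:
//   dh/dx = df/dg * dg/dx = cos(x^2) * 2x
class ChainRuleCheck
{
    static void Main()
    {
        double x = 1.3;

        // Analytic derivative via the chain rule
        double analytic = Math.Cos(x * x) * 2.0 * x;

        // Central-difference approximation for comparison
        double h = 1e-6;
        double numeric = (Math.Sin((x + h) * (x + h))
                        - Math.Sin((x - h) * (x - h))) / (2.0 * h);

        // The two values agree to several decimal places
        Console.WriteLine($"chain rule: {analytic:F6}, numeric: {numeric:F6}");
    }
}
```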
Enables calculation of gradients for each layer by propagating errors backward through the network
Essential for training deep networks, as it helps adjust weights and biases to minimize prediction error
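Backpropagation is just the chain rule applied layer by layer. A sketch of one backward pass through a single sigmoid neuron (parameter values, learning rate, and names are illustrative assumptions, not the talk's demo network):

```csharp
using System;

// One forward/backward pass through a single sigmoid neuron:
//   forward:  z = w*x + b;  a = sigmoid(z);  E = (a - y)^2
//   backward: dE/dw = dE/da * da/dz * dz/dw   (chain rule)
class SingleNeuronBackprop
{
    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    static void Main()
    {
        double w = 0.8, b = -0.2;   // parameters to train
        double x = 1.5, y = 1.0;    // one training sample
        double learningRate = 0.1;

        // Forward pass
        double z = w * x + b;
        double a = Sigmoid(z);

        // Backward pass: multiply the local derivatives together
        double dE_da = 2.0 * (a - y);       // error gradient
        double da_dz = a * (1.0 - a);       // sigmoid derivative
        double dE_dw = dE_da * da_dz * x;   // chain rule down to the weight
        double dE_db = dE_da * da_dz;       // chain rule down to the bias

        // Gradient-descent step to reduce the prediction error
        w -= learningRate * dE_dw;
        b -= learningRate * dE_db;

        Console.WriteLine($"updated w = {w:F4}, b = {b:F4}");
    }
}
```

In a deep network the same dE_da term is itself computed by the chain rule from the layer above, which is what "propagating errors backward" means.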
When a neural network learns the training data too well, capturing noise rather than the underlying pattern
Helps prevent gradients from becoming too small or large, aiding convergence
// Scaled random initialization (Xavier/Glorot-style) for better gradient flow:
// keeping initial weights small relative to the layer size helps prevent
// early activations and gradients from becoming too small or too large
int inputWeightCount = inputCount * hiddenLayerNodes;        // input -> hidden weights
int totalWeightCount = inputWeightCount + hiddenLayerNodes;  // plus hidden -> output weights
var weightScale = Math.Sqrt(2.0 / inputWeightCount);         // scale shrinks as the layer grows

startingWeights = new double[totalWeightCount];
for (int i = 0; i < startingWeights.Length; i++)
    startingWeights[i] = _random.GetRandomDouble(-weightScale, weightScale);