{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Previous Class Definitions\n", "The previously defined Layer_Dense, Activation_ReLU, Activation_Softmax, Loss, and Loss_CategoricalCrossEntropy classes." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# imports\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import nnfs\n", "from nnfs.datasets import spiral_data, vertical_data\n", "nnfs.init()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class Layer_Dense:\n", " def __init__(self, n_inputs, n_neurons):\n", " # Initialize the weights and biases\n", " self.weights = 0.01 * np.random.randn(n_inputs, n_neurons) # Normal distribution of weights\n", " self.biases = np.zeros((1, n_neurons))\n", "\n", " def forward(self, inputs):\n", " # Calculate the output values from inputs, weights, and biases\n", " self.output = np.dot(inputs, self.weights) + self.biases # Weights are already transposed\n", "\n", "class Activation_ReLU:\n", " def forward(self, inputs):\n", " self.output = np.maximum(0, inputs)\n", " \n", "class Activation_Softmax:\n", " def forward(self, inputs):\n", " # Get the unnormalized probabilities\n", " # Subtract max from the row to prevent larger numbers\n", " exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))\n", "\n", " # Normalize the probabilities with element wise division\n", " probabilities = exp_values / np.sum(exp_values, axis=1,keepdims=True)\n", " self.output = probabilities\n", "\n", "# Base class for Loss functions\n", "class Loss:\n", " '''Calculates the data and regularization losses given\n", " model output and ground truth values'''\n", " def calculate(self, output, y):\n", " sample_losses = self.forward(output, y)\n", " data_loss = np.average(sample_losses)\n", " return data_loss\n", "\n", "class Loss_CategoricalCrossEntropy(Loss):\n", " def forward(self, y_pred, y_true):\n", " '''y_pred is the neural network output\n", " y_true is the ideal output of the neural network'''\n", " samples = len(y_pred)\n", " # Bound the predicted values \n", " y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)\n", " \n", " if len(y_true.shape) == 1: # Categorically labeled\n", " correct_confidences = y_pred_clipped[range(samples), y_true]\n", " elif len(y_true.shape) == 2: # One hot encoded\n", " correct_confidences = np.sum(y_pred_clipped*y_true, axis=1)\n", "\n", " # Calculate the losses\n", " negative_log_likelihoods = -np.log(correct_confidences)\n", " return negative_log_likelihoods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Backpropagation of a Single Neuron\n", "Backpropagation helps us find the gradient of the neural network with respect to each of the parameters (weights and biases) of each neuron.\n", "\n", "Imagine a layer that has 3 inputs and 1 neuron. There are 3 inputs (x0, x1, x2), three weights (w0, w1, w2), 1 bias (b0), and 1 output (z). There is a ReLU activation layer after the neuron output going into a square loss function (loss = z^2).\n", "\n", "Loss = (ReLU(sum(mul(x0, w0), mul(x1, w1), mul(x2, w2(, b0)))))^2\n", "\n", "$\\frac{\\delta Loss()}{\\delta w0} = \\frac{\\delta Loss()}{\\delta ReLU()} * \\frac{\\delta ReLU()}{\\delta sum()} * \\frac{\\delta sum()}{\\delta mul(x0, w0)} * \\frac{\\delta mul(x0, w0)}{\\delta w0}$\n", "\n", "$\\frac{\\delta Loss()}{\\delta ReLU()} = 2 * ReLU(sum(...))$\n", "\n", "$\\frac{\\delta ReLU()}{\\delta sum()}$ = 0 if sum(...) is less than 0 and 1 if sum(...) 
is greater than 0\n", "\n", "$\\frac{\\delta sum()}{\\delta mul(x0, w0)} = 1$\n", "\n", "$\\frac{\\delta mul(x0, w0)}{\\delta w0} = x0$\n", "\n", "This is repeated for w0, w1, w2, b0.\n", "\n", "We then use numerical differentiation to approximate the gradient. Then, we update the parameters using small step sizes, such that $w0[i+1] = w0[i] - step*\\frac{\\delta Loss()}{\\delta w0}$\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 1, Loss: 36.0\n", "Iteration 2, Loss: 33.872397424621624\n", "Iteration 3, Loss: 31.87054345809546\n", "Iteration 4, Loss: 29.98699091998773\n", "Iteration 5, Loss: 28.214761511794592\n", "Iteration 6, Loss: 26.54726775906168\n", "Iteration 7, Loss: 24.978326552541866\n", "Iteration 8, Loss: 23.5021050739742\n", "Iteration 9, Loss: 22.11313179151597\n", "Iteration 10, Loss: 20.806246424284897\n", "Iteration 11, Loss: 19.576596334671486\n", "Iteration 12, Loss: 18.41961908608719\n", "Iteration 13, Loss: 17.33101994032309\n", "Iteration 14, Loss: 16.306757070164853\n", "Iteration 15, Loss: 15.343027506224132\n", "Iteration 16, Loss: 14.436253786815284\n", "Iteration 17, Loss: 13.583071280700132\n", "Iteration 18, Loss: 12.780312744165439\n", "Iteration 19, Loss: 12.024995767388878\n", "Iteration 20, Loss: 11.314319082257104\n", "Iteration 21, Loss: 10.64564263994962\n", "Iteration 22, Loss: 10.016485041642266\n", "Iteration 23, Loss: 9.424510031713222\n", "Iteration 24, Loss: 8.867521365009814\n", "Iteration 25, Loss: 8.34345204094211\n", "Iteration 26, Loss: 7.850353118483743\n", "Iteration 27, Loss: 7.386397874602818\n", "Iteration 28, Loss: 6.94986173712617\n", "Iteration 29, Loss: 6.539124434950737\n", "Iteration 30, Loss: 6.1526621719118015\n", "Iteration 31, Loss: 5.789039869058961\n", "Iteration 32, Loss: 5.446907999417336\n", "Iteration 33, Loss: 5.124995576577539\n", "Iteration 34, Loss: 4.822108497170647\n", "Iteration 35, Loss: 4.537121521071987\n", "Iteration 36, Loss: 4.268978030723312\n", "Iteration 37, Loss: 4.01668121563854\n", "Iteration 38, Loss: 3.7792956126389763\n", "Iteration 39, Loss: 3.5559389510643094\n", "Iteration 40, Loss: 3.345782865003274\n", "Iteration 41, Loss: 3.1480471758404285\n", "Iteration 42, Loss: 2.961997679823884\n", "Iteration 43, Loss: 2.78694359065541\n", "Iteration 44, Loss: 2.622235303237792\n", "Iteration 45, Loss: 2.467261121418954\n", "Iteration 46, Loss: 2.321446092335641\n", "Iteration 47, Loss: 2.184248486806066\n", "Iteration 48, Loss: 2.0551593804914616\n", "Iteration 49, Loss: 1.9336995852420789\n", "Iteration 50, Loss: 1.8194178573235094\n", "Iteration 51, Loss: 1.7118903069357754\n", "Iteration 52, Loss: 1.6107175940030252\n", "Iteration 53, Loss: 1.5155241897377694\n", "Iteration 54, Loss: 1.4259567411109748\n", "Iteration 55, Loss: 1.3416826255281136\n", "Iteration 56, Loss: 1.262389208248047\n", "Iteration 57, Loss: 1.1877819791340551\n", "Iteration 58, Loss: 1.1175840765571434\n", "Iteration 59, Loss: 1.0515348500680068\n", "Iteration 60, Loss: 0.9893891461492582\n", "Iteration 61, Loss: 0.930916260625565\n", "Iteration 62, Loss: 0.875899078709395\n", "Iteration 63, Loss: 0.8241334819517507\n", "Iteration 64, Loss: 0.7754271861095672\n", "Iteration 65, Loss: 0.7295994320679934\n", "Iteration 66, Loss: 0.6864801042040583\n", "Iteration 67, Loss: 0.6459091389617334\n", "Iteration 68, Loss: 0.6077358933180028\n", "Iteration 69, Loss: 0.5718187120029812\n", "Iteration 70, Loss: 0.5380242202642829\n", "Iteration 
71, Loss: 0.5062269967452033\n", "Iteration 72, Loss: 0.4763089781884024\n", "Iteration 73, Loss: 0.4481591180173807\n", "Iteration 74, Loss: 0.42167291418136477\n", "Iteration 75, Loss: 0.3967520449790852\n", "Iteration 76, Loss: 0.3733039992368791\n", "Iteration 77, Loss: 0.3512417316144445\n", "Iteration 78, Loss: 0.33048334753976116\n", "Iteration 79, Loss: 0.31095177724411444\n", "Iteration 80, Loss: 0.2925745286179104\n", "Iteration 81, Loss: 0.2752833763568879\n", "Iteration 82, Loss: 0.25901412505149535\n", "Iteration 83, Loss: 0.2437063914735247\n", "Iteration 84, Loss: 0.22930333977371198\n", "Iteration 85, Loss: 0.21575151284725816\n", "Iteration 86, Loss: 0.2030006012946216\n", "Iteration 87, Loss: 0.19100326852350488\n", "Iteration 88, Loss: 0.17971497196649536\n", "Iteration 89, Loss: 0.1690938194815031\n", "Iteration 90, Loss: 0.1591003719214838\n", "Iteration 91, Loss: 0.14969754273736763\n", "Iteration 92, Loss: 0.14085041966208015\n", "Iteration 93, Loss: 0.13252615564761738\n", "Iteration 94, Loss: 0.1246938532452423\n", "Iteration 95, Loss: 0.11732446503349986\n", "Iteration 96, Loss: 0.11039058885430607\n", "Iteration 97, Loss: 0.10386649785129919\n", "Iteration 98, Loss: 0.09772798570124883\n", "Iteration 99, Loss: 0.09195226348280558\n", "Iteration 100, Loss: 0.0865178816583512\n", "Iteration 101, Loss: 0.08140467291758889\n", "Iteration 102, Loss: 0.07659366262828358\n", "Iteration 103, Loss: 0.07206697005843195\n", "Iteration 104, Loss: 0.06780781192053903\n", "Iteration 105, Loss: 0.06380037696069592\n", "Iteration 106, Loss: 0.06002977345222309\n", "Iteration 107, Loss: 0.0564820075507719\n", "Iteration 108, Loss: 0.05314393144118542\n", "Iteration 109, Loss: 0.050003114234231524\n", "Iteration 110, Loss: 0.04704793686603195\n", "Iteration 111, Loss: 0.04426740148833972\n", "Iteration 112, Loss: 0.04165120020443161\n", "Iteration 113, Loss: 0.03918961375201954\n", "Iteration 114, Loss: 0.0368735034129829\n", "Iteration 115, Loss: 0.034694277992582755\n", "Iteration 116, Loss: 0.032643851730490094\n", "Iteration 117, Loss: 0.03071459534999028\n", "Iteration 118, Loss: 0.028899363239415818\n", "Iteration 119, Loss: 0.027191414181739672\n", "Iteration 120, Loss: 0.02558439994540113\n", "Iteration 121, Loss: 0.024072362337913877\n", "Iteration 122, Loss: 0.022649683089386127\n", "Iteration 123, Loss: 0.021311092099735786\n", "Iteration 124, Loss: 0.02005160424149179\n", "Iteration 125, Loss: 0.01886655505507656\n", "Iteration 126, Loss: 0.017751540667355833\n", "Iteration 127, Loss: 0.016702427744061103\n", "Iteration 128, Loss: 0.01571531497821091\n", "Iteration 129, Loss: 0.014786535770396103\n", "Iteration 130, Loss: 0.013912651762769943\n", "Iteration 131, Loss: 0.013090418519936803\n", "Iteration 132, Loss: 0.012316768931710837\n", "Iteration 133, Loss: 0.011588849600126475\n", "Iteration 134, Loss: 0.010903943586632107\n", "Iteration 135, Loss: 0.010259526183227799\n", "Iteration 136, Loss: 0.009653186757193668\n", "Iteration 137, Loss: 0.009082688171817357\n", "Iteration 138, Loss: 0.008545899068542421\n", "Iteration 139, Loss: 0.00804083320361364\n", "Iteration 140, Loss: 0.007565618804557518\n", "Iteration 141, Loss: 0.007118492429622391\n", "Iteration 142, Loss: 0.006697793120481266\n", "Iteration 143, Loss: 0.0063019473730584336\n", "Iteration 144, Loss: 0.005929501997799936\n", "Iteration 145, Loss: 0.005579070290327091\n", "Iteration 146, Loss: 0.005249347396309216\n", "Iteration 147, Loss: 0.004939114136252681\n", "Iteration 148, Loss: 
0.004647215154254898\n", "Iteration 149, Loss: 0.00437256400626425\n", "Iteration 150, Loss: 0.004114139259196158\n", "Iteration 151, Loss: 0.0038709956233987848\n", "Iteration 152, Loss: 0.0036422222163822442\n", "Iteration 153, Loss: 0.0034269635873455254\n", "Iteration 154, Loss: 0.0032244300300798123\n", "Iteration 155, Loss: 0.003033866206344064\n", "Iteration 156, Loss: 0.0028545694817259646\n", "Iteration 157, Loss: 0.0026858615040063873\n", "Iteration 158, Loss: 0.002527124440860861\n", "Iteration 159, Loss: 0.002377772426750458\n", "Iteration 160, Loss: 0.0022372501846465924\n", "Iteration 161, Loss: 0.002105026221950533\n", "Iteration 162, Loss: 0.0019806188966821317\n", "Iteration 163, Loss: 0.001863566163059441\n", "Iteration 164, Loss: 0.0017534302886055876\n", "Iteration 165, Loss: 0.0016498016244949178\n", "Iteration 166, Loss: 0.0015522968336895225\n", "Iteration 167, Loss: 0.0014605572212372654\n", "Iteration 168, Loss: 0.0013742383231737623\n", "Iteration 169, Loss: 0.0012930183418168389\n", "Iteration 170, Loss: 0.0012166008279945002\n", "Iteration 171, Loss: 0.0011447005613673634\n", "Iteration 172, Loss: 0.0010770513341135804\n", "Iteration 173, Loss: 0.001013397095948145\n", "Iteration 174, Loss: 0.0009535029620325111\n", "Iteration 175, Loss: 0.0008971534673183893\n", "Iteration 176, Loss: 0.0008441301639000644\n", "Iteration 177, Loss: 0.0007942435095401501\n", "Iteration 178, Loss: 0.0007473036766382048\n", "Iteration 179, Loss: 0.0007031374518087182\n", "Iteration 180, Loss: 0.0006615806720993984\n", "Iteration 181, Loss: 0.0006224808039162045\n", "Iteration 182, Loss: 0.0005856932236775429\n", "Iteration 183, Loss: 0.0005510780772974099\n", "Iteration 184, Loss: 0.0005185112321657664\n", "Iteration 185, Loss: 0.00048786689510026934\n", "Iteration 186, Loss: 0.00045903387854597503\n", "Iteration 187, Loss: 0.00043190420223823955\n", "Iteration 188, Loss: 0.000406378034681195\n", "Iteration 189, Loss: 0.00038236074013664776\n", "Iteration 190, Loss: 0.0003597649139507893\n", "Iteration 191, Loss: 0.0003385032407062897\n", "Iteration 192, Loss: 0.00031849748027454767\n", "Iteration 193, Loss: 0.00029967346881992795\n", "Iteration 194, Loss: 0.0002819629431575354\n", "Iteration 195, Loss: 0.0002652991815966534\n", "Iteration 196, Loss: 0.00024961903501571355\n", "Iteration 197, Loss: 0.00023486641976601822\n", "Iteration 198, Loss: 0.00022098629075865584\n", "Iteration 199, Loss: 0.00020792651372860275\n", "Iteration 200, Loss: 0.00019563773612380077\n", "Final weights: [-3.3990955 -0.20180899 0.80271349]\n", "Final bias: 0.6009044964517248\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Initial parameters\n", "weights = np.array([-3.0, -1.0, 2.0])\n", "bias = 1.0\n", "inputs = np.array([1.0, -2.0, 3.0])\n", "target_output = 0.0\n", "learning_rate = 0.001\n", "\n", "def relu(x):\n", " return np.maximum(0, x)\n", "\n", "def relu_derivative(x):\n", " return np.where(x > 0, 1.0, 0.0)\n", "\n", "for iteration in range(200):\n", " # Forward pass\n", " linear_output = np.dot(weights, inputs) + bias\n", " output = relu(linear_output)\n", " loss = (output - target_output) ** 2\n", "\n", " # Backward pass to calculate gradient\n", " dloss_doutput = 2 * (output - target_output)\n", " doutput_dlinear = relu_derivative(linear_output)\n", " dlinear_dweights = inputs\n", " dlinear_dbias = 1.0\n", "\n", " dloss_dlinear = dloss_doutput * doutput_dlinear\n", " dloss_dweights = dloss_dlinear * dlinear_dweights\n", " dloss_dbias = dloss_dlinear * dlinear_dbias\n", "\n", " # 
Update weights and bias\n", " weights -= learning_rate * dloss_dweights\n", " bias -= learning_rate * dloss_dbias\n", "\n", " # Print the loss for this iteration\n", " print(f\"Iteration {iteration + 1}, Loss: {loss}\")\n", "\n", "print(\"Final weights:\", weights)\n", "print(\"Final bias:\", bias)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Backpropagation of a Layer" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 0, Loss: 466.56000000000006\n", "Iteration 20, Loss: 5.32959636083938\n", "Iteration 40, Loss: 0.41191523404899866\n", "Iteration 60, Loss: 0.031836212079467595\n", "Iteration 80, Loss: 0.002460565465389601\n", "Iteration 100, Loss: 0.000190172825660145\n", "Iteration 120, Loss: 1.4698126966451542e-05\n", "Iteration 140, Loss: 1.1359926717815175e-06\n", "Iteration 160, Loss: 8.779889800154524e-08\n", "Iteration 180, Loss: 6.7858241357822796e-09\n", "Final weights:\n", " [[-0.00698895 -0.01397789 -0.02096684 -0.02795579]\n", " [ 0.25975286 0.11950572 -0.02074143 -0.16098857]\n", " [ 0.53548461 0.27096922 0.00645383 -0.25806156]]\n", "Final biases:\n", " [-0.00698895 -0.04024714 -0.06451539]\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Initial inputs\n", "inputs = np.array([1, 2, 3, 4])\n", "\n", "# Initial weights and biases\n", "weights = np.array([\n", " [0.1, 0.2, 0.3, 0.4],\n", " [0.5, 0.6, 0.7, 0.8],\n", " [0.9, 1.0, 1.1, 1.2]\n", "])\n", "\n", "biases = np.array([0.1, 0.2, 0.3])\n", "\n", "learning_rate = 0.001\n", "\n", "# Add the derivative function to the ReLU class\n", "class Activation_ReLU:\n", " def forward(self, inputs):\n", " return np.maximum(0, inputs)\n", " \n", " def derivative(self, inputs):\n", " return np.where(inputs > 0, 1, 0)\n", " \n", "relu = Activation_ReLU()\n", "\n", "num_iterations = 200\n", "\n", "# Training loop\n", "# A single layer of 3 neurons, each with 4 inputs\n", "# The neuron layer is then fed into a ReLU activation layer\n", "for iteration in range(num_iterations):\n", " # Forward pass\n", " neuron_outputs = np.dot(weights, inputs) + biases\n", " relu_outputs = relu.forward(neuron_outputs)\n", " \n", " # Calculate the squared loss assuming the desired output is a sum of 0. 
Trivial, but it serves as an example.\n", "    final_output = np.sum(relu_outputs)\n", "    loss = final_output**2\n", "\n", "    # Backward pass\n", "    dL_dfinal_output = 2 * final_output\n", "    dfinal_output_drelu_output = np.ones_like(relu_outputs)\n", "    drelu_output_dneuron_output = relu.derivative(neuron_outputs)\n", "\n", "    dL_dneuron_output = dL_dfinal_output * dfinal_output_drelu_output * drelu_output_dneuron_output\n", "\n", "    # Get the gradient of the Loss with respect to the weights and biases\n", "    # dL_dW = np.outer(dL_dneuron_output, inputs)\n", "    dL_dW = inputs.reshape(-1, 1) @ dL_dneuron_output.reshape(1, -1)\n", "    dL_db = dL_dneuron_output\n", "\n", "    # Update the weights and biases\n", "    # Remove the .T if using dL_dW = np.outer(dL_dneuron_output, inputs)\n", "    weights -= learning_rate * dL_dW.T\n", "    biases -= learning_rate * dL_db\n", "\n", "    # Print the loss every 20 iterations\n", "    if iteration % 20 == 0:\n", "        print(f\"Iteration {iteration}, Loss: {loss}\")\n", "\n", "# Final weights and biases\n", "print(\"Final weights:\\n\", weights)\n", "print(\"Final biases:\\n\", biases)\n" ] }
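, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Check with Numerical Differentiation\n", "The single-neuron section mentioned approximating the gradient with numerical differentiation. As a sanity check, the sketch below compares the chain-rule gradients used above against central-difference estimates for the layer example at its initial weights and biases. The names x, W_init, b_init, loss_fn, and eps are assumptions local to this check, not part of the training code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# A sketch of a gradient check for the layer example above, at its initial values\n", "# x, W_init, b_init, loss_fn, and eps are local names so the trained weights and\n", "# biases from the previous cell are left untouched\n", "x = np.array([1.0, 2.0, 3.0, 4.0])\n", "W_init = np.array([\n", "    [0.1, 0.2, 0.3, 0.4],\n", "    [0.5, 0.6, 0.7, 0.8],\n", "    [0.9, 1.0, 1.1, 1.2]\n", "])\n", "b_init = np.array([0.1, 0.2, 0.3])\n", "\n", "def loss_fn(W, b):\n", "    # Same forward pass as the training loop: dense layer -> ReLU -> squared sum\n", "    return np.sum(np.maximum(0, np.dot(W, x) + b)) ** 2\n", "\n", "# Analytic gradient via the chain rule (same expressions as the training loop)\n", "neuron_out = np.dot(W_init, x) + b_init\n", "relu_out = np.maximum(0, neuron_out)\n", "dL_dneuron = 2 * np.sum(relu_out) * np.where(neuron_out > 0, 1.0, 0.0)\n", "analytic_dW = np.outer(dL_dneuron, x)\n", "analytic_db = dL_dneuron\n", "\n", "# Numerical differentiation: central differences, one parameter at a time\n", "eps = 1e-6\n", "numeric_dW = np.zeros_like(W_init)\n", "for i in range(W_init.shape[0]):\n", "    for j in range(W_init.shape[1]):\n", "        W_plus, W_minus = W_init.copy(), W_init.copy()\n", "        W_plus[i, j] += eps\n", "        W_minus[i, j] -= eps\n", "        numeric_dW[i, j] = (loss_fn(W_plus, b_init) - loss_fn(W_minus, b_init)) / (2 * eps)\n", "\n", "numeric_db = np.zeros_like(b_init)\n", "for k in range(b_init.shape[0]):\n", "    b_plus, b_minus = b_init.copy(), b_init.copy()\n", "    b_plus[k] += eps\n", "    b_minus[k] -= eps\n", "    numeric_db[k] = (loss_fn(W_init, b_plus) - loss_fn(W_init, b_minus)) / (2 * eps)\n", "\n", "print(\"Max |analytic - numeric| for dL/dW:\", np.max(np.abs(analytic_dW - numeric_dW)))\n", "print(\"Max |analytic - numeric| for dL/db:\", np.max(np.abs(analytic_db - numeric_db)))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }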