#006 PyTorch Solving the famous XOR problem using Linear classifiers with PyTorch

It is often unclear what a single neuron is doing, whereas it is obvious what a single logical gate is doing. However, the inner workings of a neural network need not be so mysterious, and in some cases the neural network structure can be simple enough to grasp fully and design. On the contrary, the function drawn to the right of the ReLU function is linear. Applying multiple linear activation functions will still make the network linear.


Long Short-Term Memory (LSTM) is a special type of recurrent neural network that has memory cells that can store information over long periods of time. LSTMs are designed to solve problems where there are long-term dependencies in data. In fact, single-layer feedforward networks cannot solve problems that require non-linear decision boundaries like in the case of XOR problem. This is because they lack the ability to capture non-linear relationships between input variables. For the XOR problem, we can use a network with two input neurons, two hidden neurons, and one output neuron.

Going Beyond Large Language Models (LLMs)

However, it doesn’t ever touch 0 or 1, which is important to remember. The basic idea is to take the input, multiply it by the synaptic weight, and check if the output is correct. If it is not, adjust the weight, multiply it by the input again, check the output and repeat, until we have reached an ideal synaptic weight.

BroutonLab is recognized by Corporate Vision as the Best Data Research & Consulting Company in 2021

Now we could imagine coding up a logical network for this data by hand. We’d use our business rules to define who we think is and isn’t at risk from the flu. We might come up with something like the following, where a prediction of 1 indicates the patient is a risk and a prediction 0 means they are not at risk. And, in my case, in iteration number 107 the accuracy rate increases to 75%, 3 out of 4, and in iteration number 169 it produces almost 100% correct results and it keeps like that ‘till the end. As it starts with random weights the iterations in your computer would probably be slightly different but at the end, you’ll achieve the binary precision, which is 0 or 1. Here we define the loss type we’ll use, the weight optimizer for the neuron’s connections, and the metrics we need.

Based on this comparison, the weights for both the hidden layers and the output layers are changed using backpropagation. Backpropagation is done using the Gradient Descent algorithm. The XOR, or “exclusive or”, problem is a classic problem in ANN research. It is the problem of using a neural network to predict the outputs of XOR logic gates given two binary inputs.

  1. Learning by perceptron in a 2-D space is shown in image 2.
  2. The next post in this series will feature a implementation of the MLP architecture described here, including all of the components necessary to train the network to act as an XOR logic gate.
  3. In practice, we use very large data sets and then defining batch size becomes important to apply stochastic gradient descent[sgd].
  4. The table on the right below displays the output of the 4 inputs taken as the input.

Our algorithm —regardless of how it works — must correctly output the XOR value for each of the 4 points. We’ll be modelling this as a classification problem, so Class 1 would represent an XOR value of 1, while Class 0 would represent a value of 0. For the XOR problem, 100% of possible data examples are available to use in the training process.

The most important thing to remember from this example is the points didn’t move the same way (some of them did not move at all). That effect is what we call “non linear” and that’s very important to neural networks. Some paragraphs above I explained why applying linear functions several times would get us nowhere. Visually what’s happening is the matrix multiplications are moving everybody sorta the same way (you can find more about it here). “Activation Function” is a function that generates an output to the neuron, based on its inputs. Although there are several activation functions, I’ll focus on only one to explain what they do.

Orange data points have a value of -1 and blue points have a value of +1. I’ll refer to the coordinates along the x-axis as the variable x1, and the coordinates along the y-axis as variable x2. Lets say you have a data set which is categorical variables only. It could be medical records where each patient has a category of sex (male, female), age bracket (0–5, 5–10, 10–15…45–50…), white blood cell count (high, medium, low), etc. Your task might be to use this information to predict which patients are at risk of catching the flu this year, and therefore should receive a flu shot.

If you don’t remember them or just don’t know what’s that we’ll show you.We have two binary entries ( 0 or 1) and the output will be 1 only when just one of the entries is 1 and the other is 0. It means that from the four possible combinations only two will have 1 as output. Though this is a simple concept, a beginner will find it as an interesting xor neural network start of mathematical relation to the multilayer perceptron. It is overall very similar to the network we modeled above, but its learning capabilities will be slightly increased. As, out example for this post is a rather simple problem, we don’t have to do much changes in our original model except going for LeakyReLU instead of ReLU function.

These algorithms are part of a larger category of machine learning techniques. One way to solve the XOR problem is by using feedforward neural networks. Feedforward neural networks https://forexhero.info/ are a type of artificial neural network where the information flows in one direction, from input to output. For example, we can take the second number of the data set.

This problem may seem easy to solve manually, but it poses a challenge for traditional neural networks because they lack the ability to capture non-linear relationships between input variables. The next step would be to create a data set because we cannot just train our data on these four points. So, we will create a function create_dataset() that will accept x1, x2 and y as our input parameters. Repeat() to repeat every number in x1, x2, and y 50 times.

Note that due to peculiarities of the Tensorflow Playground you should only add the first three layers. The output layer where the scaling happens is obscured, but if you build up the first three layers and then train the network for very short time you should get results like below. In our X-OR problem, output is either 0 or 1 for each input sample. We will use binary cross entropy along with sigmoid activation function at output layer.[Ref image 6].

Now, this value is fed to a neuron which has a non-linear function(sigmoid in our case) for scaling the output to a desirable range. The scaled output of sigmoid is 0 if the output is less than 0.5 and 1 if the output is greater than 0.5. Our main aim is to find the value of weights or the weight vector which will enable the system to act as a particular gate. A clear non-linear decision boundary is created here with our generalized neural network, or MLP.

In my next post, I will show how you can write a simple python program that uses the Perceptron Algorithm to automatically update the weights of these Logic gates. The next post in this series will feature a implementation of the MLP architecture described here, including all of the components necessary to train the network to act as an XOR logic gate. Using this full network we can test input values to predict an output value. If you are familiar with logical gates you might notice that an inner part of this expression this is an XNOR gate. In order to do this I would like to link you to a particular data set on the Tensorflow Playground. An image of a generated set of data from this distribution is below.

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *