Deep neural networks: what they are and how they work

Deep neural networks are the main technological architecture used in Deep Learning models. These structures cannot be understood without first grasping the general idea of artificial neural networks, which is fundamental to Artificial Intelligence.

Neural networks are used for a thousand things: recognizing license plates, songs, faces, voices, or even the fruits in our kitchen. They are a particularly useful technology and, although they have only recently become practical, they are likely to shape much of the technology we use in the coming years.

Next, we are going to look in depth at the idea of artificial neural networks and deep neural networks, understanding how they work, how they are trained, and how the different neurons that make them up interact.

What are deep neural networks and what characterizes them?

Deep neural networks are one of the most important technological architectures used in Deep Learning. These artificial networks have grown at a dizzying pace in recent years because they are fundamental when it comes to recognizing all kinds of patterns. Artificial Intelligence as we know it today exists thanks to these networks which, in essence, are a replica of how our brains work, albeit in a technological and mathematical form.

Before going further into what deep neural networks are, we first need to understand how artificial neural networks work in general and what they are for. Neural networks are a branch of "Machine Learning" that has had a huge impact in recent years, helping programmers and computer scientists build things like chatbots that, when we talk to them, make us think we are talking to real human beings.

Artificial neural networks have also been used in self-driving cars, in mobile applications that recognize our face and transform it into whatever we want, and in many other functions. Their applicability is very broad: they serve as the basis of modern Artificial Intelligence and have countless beneficial uses in our day-to-day lives.

Artificial neural networks

Let's imagine that we are in our kitchen and we decide to look for an orange, a very simple task. We know how to identify an orange very easily, and we also know how to tell it apart from other fruits that we find in the kitchen, such as bananas, apples and pears. How? Because our brain has thoroughly assimilated the typical properties of an orange: its size, its shape, its color, what it smells like... These are all parameters that we use to find an orange.

It's a simple task for humans, but... Can a computer do it too? The answer is yes. In principle, it would be enough to define those same parameters and assign a value to a node or something that we could well call an “artificial neuron”. We would tell that neuron what oranges are like, indicating their size, weight, shape, color or any other parameter that we attribute to this fruit. Having this information, it is expected that the neuron will know how to identify an orange when presented with one.

If we have chosen the parameters well, it will be easy for the neuron to differentiate between oranges and things that are not oranges simply by taking those characteristics into account. When presented with an image of any fruit, that neuron will search for the characteristics associated with the orange and decide whether to include it in the category "orange" or in the category "other fruit". In statistical terms, this amounts to finding the region of a parameter graph that corresponds to what we are looking for: a region that encompasses all the pieces of fruit that share the size, shape, color, weight and aroma of oranges.

At first this all sounds very easy to code, and in fact it is. It works very well to differentiate an orange from a banana or an apple, since they have different colors and shapes. However, what if we presented it with a grapefruit? Or a very large tangerine? These are fruits that can easily be confused with an orange. Will the artificial neuron be able to differentiate by itself between oranges and grapefruits? The answer is no, and in fact it will probably treat them as the same thing.

The problem with using only one layer of artificial neurons, or, what amounts to the same thing, only simple neurons, is that they generate very imprecise decision boundaries when presented with something that shares many characteristics with what they should recognize but is not actually that thing. If we present something that looks like an orange, such as a grapefruit, the neuron will identify it as an orange even though it is not.

These decision boundaries, when represented in a graph, will always be linear. With a single artificial neuron, that is, a single node that has concrete built-in parameters but cannot learn beyond them, the decision boundaries obtained are very diffuse. Its main limitation is that it relies on two statistical methods, specifically multiclass regression and logistic regression, which means that when in doubt it includes things that are not what we expected it to identify.
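The limitation just described can be made concrete with a minimal sketch of a single neuron acting as a logistic-regression classifier. Everything specific here is an assumption made for illustration: the two features (roundness and how orange the color is, both on a 0 to 1 scale), the hand-picked weights and bias, and the fruit measurements. A real network would learn its weights rather than have them typed in.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def single_neuron(features, weights, bias):
    """Weighted sum of the features plus the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical hand-picked parameters (not learned, not from the article).
WEIGHTS = [4.0, 6.0]
BIAS = -7.0

# Hypothetical features: (roundness, "orangeness" of the color), both from 0 to 1.
fruits = {
    "orange": (0.95, 0.90),
    "banana": (0.10, 0.20),
    "grapefruit": (0.95, 0.80),
}

labels = {}
for name, features in fruits.items():
    score = single_neuron(features, WEIGHTS, BIAS)
    # The boundary score == 0.5 is a straight line in feature space.
    labels[name] = "orange" if score >= 0.5 else "not orange"
    print(f"{name}: score={score:.2f} -> {labels[name]}")
```

With these made-up numbers the banana falls clearly on the "not orange" side of the line, while the grapefruit lands on the same side as the orange, which is exactly the kind of confusion a single straight boundary produces.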

If we were to divide all fruits into "oranges" and "not oranges" using only one neuron, it is clear that bananas, pears, apples, watermelons and any other fruit that does not match oranges in size, color, shape, aroma and so on would go in the "not oranges" category. Grapefruits and tangerines, however, would be placed in the "oranges" category, so the neuron would do the job it was designed for poorly.

And where we talk about oranges and grapefruits we could just as well talk about dogs and wolves, hens and chickens, books and notebooks... All these situations are cases in which a simple series of "if..." conditions would not suffice to clearly distinguish one from the other. A more complex, non-linear system is needed, one that is more precise when it comes to differentiating between different elements. Something that takes into account that among the similarities there may be differences. This is where neural networks come in.

More layers, more similar to the human brain

Artificial neural networks, as their name suggests, are artificial computational models inspired by the neural networks of the human brain, networks that in fact mimic the functioning of this biological organ. Their main application is the recognition of patterns of all kinds: facial identification, voice recognition, fingerprints, handwriting, license plates… Pattern recognition is useful for almost everything.

Because there are several neurons, a variety of parameters is applied and a higher degree of precision is obtained. These neural networks are systems that allow us to separate items into categories when the difference can be subtle, separating them in a non-linear way, something that would be impossible to do otherwise.

With a single node, that is, a single neuron, the information is handled as a multiclass regression. When more neurons are added, each one has its own non-linear activation function which, translated into simpler language, gives the network more precise decision boundaries: represented graphically, they are curved, and they take more characteristics into account when differentiating between "oranges" and "not oranges", to continue with that example.
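A classic way to see why extra neurons with a non-linear activation matter is the XOR pattern ("one or the other, but not both"), which no single straight line can separate. The sketch below is an illustration under stated assumptions, not something taken from the article: the step activation, the weights and the biases are all hand-picked, hypothetical values chosen so that two hidden neurons and one output neuron reproduce XOR.

```python
def step(z):
    """Non-linear activation: fire (1) only if the input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum plus bias, followed by the step activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def two_layer_network(x1, x2):
    # Hidden layer: each neuron draws its own straight boundary.
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)   # fires if at least one input is 1
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)  # fires only if both inputs are 1
    # Output neuron combines them: "OR, but not AND".
    return neuron([h_or, h_and], [1.0, -2.0], -0.5)

results = {(a, b): two_layer_network(a, b) for a in (0, 1) for b in (0, 1)}
for inputs, output in results.items():
    print(inputs, "->", output)
```

Each hidden neuron on its own still draws a straight line; it is the combination of the two, passed through another non-linear step, that yields a boundary no single neuron could produce.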

The curvature of these decision boundaries depends directly on how many layers of neurons we add to our neural network. Those layers of neurons, which make the system more complex and more precise, are what make a neural network deep. In principle, the more layers we have, the more accurate the program will be and the more closely it will resemble the human brain.

In short, neural networks are nothing more than an intelligent system that allows more precise decisions to be made, in a way very similar to how we human beings do it. Human beings rely on experience, learning from our environment. For example, going back to the case of the orange and the grapefruit, if we have never seen a grapefruit, we will easily mistake it for an orange. Only once we have become familiar with it will we know how to identify it and differentiate it from oranges.

The first step is to give the neural network some parameters so that it knows what we want it to learn to identify. Then comes the learning or training phase, in which it becomes increasingly accurate and its margin of error progressively shrinks. This is when we would present our neural network with an orange and other fruits. In the training phase it is given cases that are oranges and cases that are not; we check whether it got its answer right and tell it the correct answer.

We will try to make these attempts numerous and as close as possible to reality. In this way we are helping the neural network to be ready when real cases arrive, so that it knows how to discriminate properly, in the same way that a human being would in real life. If the training has been adequate, with well-chosen recognition parameters and well-classified examples, the neural network will have a very high pattern recognition success rate.

What are they and how do they work exactly?

Now that we have seen the general idea of what neural networks are, we are going to look more closely at how these emulators of the neurons of the human brain work and where deep neural networks fit into the whole process.

Let's imagine that we have the following neural network: three layers of artificial neurons. Let's say that the first layer has 4 neurons or nodes, the second has 3, and the last one has only 2. This is a simple example of an artificial neural network, quite easy to understand.

The first layer is the one that receives the data, that is, the information, which may come in the form of sound, images, aromas, electrical impulses... This first layer is the input layer, and it is in charge of receiving all the data so that it can later send it to the following layers. During the training of our neural network, this is the layer we work with first, giving it data that we will use to see how well the network makes predictions or identifies the information it is given.

The second layer of our hypothetical model is the hidden layer, which sits right in the middle of the first and last layers, as if our neural network were a sandwich. In this example we only have one hidden layer, but there could be as many as we want. We could talk about 50, 100, 1000 or even 50,000 layers. In essence, these hidden layers are the part of the neural network that we would call the deep neural network. The greater the depth, the more complex the neural network.

Finally we have the third layer of our example which is the output layer. This layer, as its name indicates, is in charge of receiving information from the previous layers, making a decision and giving us an answer or result.

In the neural network, each artificial neuron is connected to all the neurons of the next layer. In our example, with three layers of 4, 3 and 2 neurons, the 4 neurons of the input layer are connected to the 3 of the hidden layer, and the 3 of the hidden layer to the 2 of the output layer, giving us a total of 18 connections.
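The count of 18 follows from multiplying the sizes of each pair of adjacent layers and adding the results (4·3 + 3·2). A short sketch makes the arithmetic explicit; the function name `count_connections` is a hypothetical helper introduced only for this illustration.

```python
def count_connections(layer_sizes):
    """Number of connections when every neuron links to all neurons of the next layer."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

total = count_connections([4, 3, 2])
print(total)  # 4*3 + 3*2 = 18
```

The same helper shows how quickly the number grows with depth: adding a second hidden layer of 3 neurons, `[4, 3, 3, 2]`, would give 4·3 + 3·3 + 3·2 = 27 connections.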

All these neurons are connected with those of the next layer, sending the information in the input->hidden->output direction. If there were more hidden layers, there would be a greater number of connections, with the information passing from hidden layer to hidden layer until it reaches the output layer. Once the output layer has received the information, it gives us a result based on that information and on its way of processing it.
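The input->hidden->output flow can be sketched for the 4-3-2 example network. This is a minimal illustration under assumptions that do not come from the article: the weights and biases are random, the activation is a sigmoid, and the four input values are made up. Its only purpose is to show information moving layer by layer.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_neurons, rng):
    """One weight per incoming connection and one bias per neuron, randomly initialised."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [rng.uniform(-1, 1) for _ in range(n_neurons)]
    return weights, biases

def forward(inputs, layers):
    """Pass the data through each layer in turn; each layer's output feeds the next."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * x for w, x in zip(neuron_weights, activations)) + b)
            for neuron_weights, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [4, 3, 2]       # input, hidden and output layers from the example
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = forward([0.5, 0.1, 0.9, 0.3], layers)  # hypothetical input values
print(output)
```

Because the weights are random, the two output values mean nothing yet; training is what turns them into useful answers.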

When we are training our algorithm, that is, our neural network, this process that we have just explained is going to be done many times. We are going to deliver some data to the network, we are going to see what the result gives us and we are going to analyze it and compare it with what we expected the result to give us. If there is a large difference between what is expected and what is obtained, it means that there is a high margin of error and that, therefore, it is necessary to make a few modifications.

How do artificial neurons work?

Now we are going to look at how each individual neuron works within a neural network. The neuron receives an input of information from the neurons of the previous layer. Let's say that this neuron receives three inputs, one from each of the three neurons of the previous layer. In turn, this neuron generates outputs; in this case, let's say that it is connected to only one neuron of the next layer.

Each connection that this neuron has with the three neurons of the previous layer brings an "x" value, which is the value that the previous neuron is sending us, and it also has a "w" value, which is the weight of this connection. The weight is a value that helps us give more importance to one connection over the others. In short, each connection with the previous neurons has an "x" and a "w" value, which are multiplied (x·w).

We also have a value called "bias", represented by "b", which is a number that encourages certain neurons to activate more easily than others. In addition, there is an activation function within the neuron, which is what makes its classification of different elements (e.g., oranges) non-linear. Each neuron has its own parameters to take into account, which makes the entire system, that is, the neural network, classify in a non-linear way.

How does the neuron know whether it has to activate or not? That is, how does it know whether it has to send information to the next layer? This decision is governed by the following equation:

$$\sum_{i} w_i \, x_i + b$$

This formula means that each weight "w" is multiplied by the corresponding value "x" that the neuron receives from the previous layer, and all these products are summed. The bias "b" is then added to that sum.

The result of this equation is sent to an activation function, which is simply a function that tells us that, if the result is greater than a certain number, the neuron will send a signal to the next layer and, if it is less, it will not. This is how an artificial neuron decides whether or not to send information to the neurons of the following layer, by means of an output that we will call "y", which in turn is the input "x" of the following neuron.
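The decision rule described above (multiply each input by its weight, add everything up, add the bias, then compare against a threshold) fits in a few lines. This is a minimal sketch: the input values, weights, biases and the threshold of 0 are hypothetical numbers chosen only to show the mechanics.

```python
def neuron_output(inputs, weights, bias, threshold=0.0):
    """Return y = 1 if the weighted sum plus bias exceeds the threshold, else y = 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > threshold else 0

x = [0.5, 1.0, 0.2]    # hypothetical values sent by three neurons of the previous layer
w = [0.4, -0.6, 0.9]   # hypothetical weights of the three connections

y_low_bias = neuron_output(x, w, bias=0.1)   # weighted sum + bias is negative
y_high_bias = neuron_output(x, w, bias=0.5)  # same inputs, larger bias
print(y_low_bias, y_high_bias)
```

With the same inputs and weights, raising the bias is enough to tip the neuron from silent to active, which is the sense in which the bias makes some neurons activate more easily than others.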

And how do you train an entire network?

The first step is to deliver data to the first layer, as mentioned previously. This layer sends information to the following layers, which are the hidden layers or the deep neural network. The neurons of these layers will activate or not depending on the information received. Finally, the output layer gives us a result, which we compare with the value we were expecting in order to see whether the neural network has correctly learned what to do.

If it did not learn well, we perform another iteration, that is, we present it with information again and see how the neural network behaves. Depending on the results obtained, the "b" values, that is, the bias of each neuron, and the "w" values, that is, the weight of each connection with each neuron, are adjusted to reduce the error. To find out how big that error is, we use another equation, which is the following:

$$\frac{1}{2n} \sum_{x} \left( y(x) - a \right)^2$$

This equation is the mean squared error. We take the sum of the squared differences between y(x), which is the value our network gave us in each iteration, and "a", which is the value we were expecting it to give us. Finally, we multiply this sum by 1/2n, where "n" is the number of iterations we have used to train our neural network.

For example, suppose we have the following values

The first column, "y(x)", represents what our network has given us in each of the four iterations in which we have tested it. The values obtained, as can be seen, do not match those of the second column, "a", which are the desired values for each of the tested iterations. The last column represents the error of each iteration.

Applying the aforementioned formula to these data, keeping in mind that in this case n = 4 (4 iterations), gives us a value of 3.87, which is the mean squared error that our neural network has at this moment. Knowing the error, what we have to do now is, as mentioned before, change the bias and the weights of each of the neurons and their connections so that the error is reduced.
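The error formula described above (the sum of the squared differences between what the network returned and what was expected, multiplied by 1/2n) can be sketched as follows. The four pairs of values are hypothetical and are not the ones from the article's table, so the result differs from the 3.87 quoted above.

```python
def halved_mean_squared_error(outputs, targets):
    """(1 / 2n) * sum of (y(x) - a)^2 over the n tested cases."""
    n = len(outputs)
    return sum((y - a) ** 2 for y, a in zip(outputs, targets)) / (2 * n)

outputs = [0.9, 0.2, 0.6, 0.4]  # hypothetical y(x): what the network returned
targets = [1.0, 0.0, 1.0, 0.0]  # hypothetical a: what we expected

error = halved_mean_squared_error(outputs, targets)
print(round(error, 5))
```

Squaring the differences keeps positive and negative misses from cancelling out and penalises large misses more heavily than small ones; the error is zero only when every output matches its expected value.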

At this point, engineers and computer scientists apply an algorithm called gradient descent, which gives them values to test when modifying the bias and weight of each artificial neuron so that an increasingly low error is obtained, approaching the desired prediction or result. It is a matter of trial and error: the more iterations are made, the more training there will be and the more the network will learn.
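To give a feel for gradient descent, here is a minimal sketch on the smallest possible case: one neuron with one weight "w" and one bias "b", no activation function, trained to minimise the halved mean squared error. The training data, the learning rate of 0.1 and the 500 iterations are assumptions chosen for the illustration; real networks apply the same idea to every weight and bias at once.

```python
def train(samples, learning_rate=0.1, iterations=500):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the error."""
    w, b = 0.0, 0.0
    n = len(samples)
    history = []
    for _ in range(iterations):
        # Difference between what the neuron returns and what we expected.
        residuals = [(w * x + b) - a for x, a in samples]
        history.append(sum(r ** 2 for r in residuals) / (2 * n))
        # Partial derivatives of the error with respect to w and b.
        grad_w = sum(r * x for r, (x, _) in zip(residuals, samples)) / n
        grad_b = sum(residuals) / n
        # Move each parameter a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

# Hypothetical training pairs (input x, expected value a) that follow a = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b, history = train(samples)
print(f"w={w:.3f}, b={b:.3f}, first error={history[0]:.3f}, last error={history[-1]:.6f}")
```

The learning rate controls the size of each step: too large and the error can grow instead of shrink, too small and training takes many more iterations to get anywhere.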

Once the neural network is adequately trained, it will give us accurate and reliable predictions and identifications. At this point we will have a network in which each neuron has a defined weight for each connection, a controlled bias, and a decision capacity that makes the system work.
