Standard deviation: what is it and what is this measure for?
The term standard deviation refers to a measure used to quantify the variation or dispersion of numerical data in a random variable, statistical population, data set, or probability distribution.
The world of research and statistics may seem complex and foreign to the general population, as if mathematical calculations happened before our eyes without our being able to understand the mechanisms behind them. Nothing could be further from the truth.
On this occasion we are going to explain, in a simple but thorough way, the context, foundation, and application of a term as essential to statistics as the standard deviation.
What is the standard deviation?
Statistics is the branch of mathematics responsible for recording variability, as well as the random process that generates it, following the laws of probability. That is easy to say, but statistical processes hold the answers to everything that we now regard as "dogma" in the world of nature and physics.
For example, suppose we toss a coin three times and it comes up heads twice and tails once. Simple coincidence, right? On the other hand, if we flip the same coin 700 times and 660 of them land on heads, there may well be a factor that favors this outcome beyond randomness (imagine, for example, that the coin only has time to make a limited number of turns in the air, so it almost always lands the same way up). Thus, observing patterns beyond mere coincidence prompts us to think about the underlying reasons for the trend.
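To put rough numbers on the coin example, here is a minimal sketch that assumes a perfectly fair coin and independent tosses. The function name prob_at_least_heads is only illustrative, and the first case counts "at least two heads", which is slightly broader than the "exactly two" of the text.

```python
from math import comb

def prob_at_least_heads(min_heads: int, flips: int) -> float:
    """Chance of at least `min_heads` heads in `flips` tosses of a fair coin."""
    # Count the head/tail sequences with enough heads, then divide by
    # the total number of equally likely sequences (2 ** flips).
    favourable = sum(comb(flips, k) for k in range(min_heads, flips + 1))
    return favourable / 2 ** flips

few_tosses = prob_at_least_heads(2, 3)       # 0.5: nothing surprising
many_tosses = prob_at_least_heads(660, 700)  # vanishingly small

print(f"at least 2 heads in 3 tosses:     {few_tosses}")
print(f"at least 660 heads in 700 tosses: {many_tosses:.3g}")
```

A fair coin gives two or more heads in three tosses half of the time, whereas 660 or more heads in 700 tosses is so improbable that looking for a cause other than chance becomes reasonable.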
What we want to show with this odd example is that statistics is an essential tool for any scientific process, because it lets us distinguish outcomes that are the result of chance from events governed by natural laws.
Thus, we can offer a hasty definition of the standard deviation and say that it is a statistical measure equal to the square root of the variance. This is like building the house from the roof down, because for someone who does not work with numbers every day, this definition is little better than knowing nothing about the term. So let's take a moment to dissect the basics of statistical measures.
Measures of position and variability
Measures of position are indicators that tell us what percentage of the data in a frequency distribution exceeds a given value. The two covered here take a value that represents the data at the center of the frequency distribution. Do not despair, because we can define them quickly:
- Mean: the numerical average of the sample.
- Median: the value in the central position of an ordered data set.
In rough terms, we could say that measures of position focus on dividing the data set into equal percentage parts, that is, “getting to the middle”.
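As a small illustration of the two measures just defined, the sketch below uses Python's standard statistics module on a made-up list of values (the data are hypothetical, chosen so that the mean and the median disagree).

```python
from statistics import mean, median

# Hypothetical data: four small values and one extreme one.
values = [1, 2, 3, 4, 100]

center_mean = mean(values)      # 22, pulled up by the extreme value
center_median = median(values)  # 3, the middle value once sorted

print("mean:", center_mean)
print("median:", center_median)
```

Both measures look for the middle of the data, but the mean is sensitive to extreme values while the median depends only on the ordering.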
On the other hand, measures of variability determine how close to or far from its average location (i.e., the mean) the values of a distribution lie. They are the following:
- Range: the width of the data, that is, the distance from the minimum to the maximum value.
- Variance: the expectation (the mean over the data series) of the squared deviation of the variable from its mean.
- Standard deviation: a numerical index of the dispersion of the data set, equal to the square root of the variance.
Of course, these are relatively complex terms for someone who does not work with mathematics every day. We do not want to go into other measures of variability; it is enough to know that the greater the values of these parameters, the less homogeneous the data set will be.
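The three measures of variability can be compared on two made-up data sets that share the same mean. This is only a sketch: the helper spread_summary and both lists are hypothetical, and pvariance and pstdev are the standard-library functions that divide by the number of values.

```python
from statistics import mean, pstdev, pvariance

def spread_summary(values):
    """Return the range, variance and standard deviation of `values`."""
    return {
        "range": max(values) - min(values),
        # pvariance and pstdev divide by the number of values, i.e. they
        # treat the data as the whole population.
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }

# Hypothetical data sets, both with a mean of 5.
homogeneous = [4, 5, 5, 5, 6]
dispersed = [1, 3, 5, 7, 9]

for name, values in (("homogeneous", homogeneous), ("dispersed", dispersed)):
    print(name, mean(values), spread_summary(values))
```

Although the mean is the same in both cases, all three parameters are larger for the second list, which is the less homogeneous of the two.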
“Mean of the Atypical”
Once we have cemented the knowledge of the measures of variability and their importance in data analysis, it is time to refocus our attention on the standard deviation.
Without going into complex concepts (and perhaps committing the sin of oversimplifying), we can say that this measure is the result of calculating the mean of the “atypical” values, that is, of how far the data stray from the mean. Let's give an example to clarify this definition:
We have a sample of six bitches of the same breed and age that have just given birth to their litters at the same time. Three of them have had 2 puppies each, while the other three have had 4 puppies each. Naturally, the mean is 3 puppies per female (the sum of all puppies, 18, divided by the total number of females, 6).
What would the standard deviation be in this example? First, we subtract the mean from each observed value and square the result (since we do not want negative numbers). For example, 4 - 3 = 1, which squared is 1, and 2 - 3 = -1, which squared is also 1.
The variance is then calculated as the mean of these squared deviations from the mean value (in this case, 3), which here gives 1. Because the variance is expressed in squared units, we take its square root to bring it back to the same numerical scale as the mean. The result is the standard deviation.
So what is the standard deviation in our example? One puppy. The average litter is estimated at three puppies, but it is normal for a mother to give birth to one more or one fewer.
This example may be a little confusing as far as the difference between variance and standard deviation is concerned (since the square root of 1 is 1), but if the variance were 4, the standard deviation would be 2 (remember, its square root).
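The calculation described above can be followed step by step in a few lines of Python. This is a sketch with illustrative variable names; like the text, it divides by the number of litters, so it treats the six litters as the whole population.

```python
from math import sqrt

# Litter sizes from the example: three litters of 2 and three of 4.
litters = [2, 2, 2, 4, 4, 4]

# Step 1: the mean number of puppies per litter.
mean_size = sum(litters) / len(litters)                      # 3.0

# Step 2: subtract the mean from each value and square the result.
squared_deviations = [(x - mean_size) ** 2 for x in litters]  # every entry is 1.0

# Step 3: the variance is the mean of the squared deviations.
variance = sum(squared_deviations) / len(litters)             # 1.0

# Step 4: the standard deviation is the square root of the variance.
std_dev = sqrt(variance)                                      # 1.0

print(mean_size, variance, std_dev)
```

Replacing the list with other values shows how the two measures separate: whenever the variance comes out as 4, the final step returns 2.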
What we wanted to show with this example is that variance and standard deviation are statistical measures that summarize how far, on average, the values lie from the mean. Remember: the greater the standard deviation, the greater the dispersion of the population.
Going back to the previous example, if all the bitches are of the same breed and have similar weights, it is normal for the deviation to be one puppy per litter. But if we take, for example, a mouse and an elephant, it is clear that the deviation in the number of offspring would be much greater than one. Again, the less the sampled groups have in common, the greater the deviations we can expect.
Even so, one thing is clear: with this parameter we are calculating the variability in the data of a sample, and that sample does not have to be representative of the entire population. In this example we sampled six bitches, but what if we had monitored seven and the seventh had a litter of 9 puppies?
Of course, the pattern of deviation would change. For this reason, taking sample size into account is essential when interpreting any data set. The more individual observations are collected and the more times an experiment is repeated, the closer we come to postulating a general truth.
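To see how much a single extra observation shifts the picture, the sketch below recomputes the figures with the hypothetical seventh litter of 9 puppies. It also prints the sample standard deviation, which divides by n - 1 instead of n. That correction is not part of the example in the text, but it is the usual choice when a sample is used to estimate the spread of a larger population.

```python
from statistics import mean, pstdev, stdev

six_litters = [2, 2, 2, 4, 4, 4]
seven_litters = six_litters + [9]  # the hypothetical seventh litter

# Population formula (divide by n), as in the worked example.
sd_six = pstdev(six_litters)       # 1.0
sd_seven = pstdev(seven_litters)   # roughly 2.29

# Sample formula (divide by n - 1), slightly larger for small samples.
sample_sd_six = stdev(six_litters)  # roughly 1.10

print("mean:", mean(six_litters), "->", round(mean(seven_litters), 2))
print("standard deviation:", sd_six, "->", round(sd_seven, 2))
print("sample standard deviation of the six litters:", round(sample_sd_six, 2))
```

One unusual litter raises the mean from 3 to about 3.86 and more than doubles the standard deviation, which is why a handful of observations should be interpreted with caution.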
Conclusions
As we have seen, the standard deviation is a measure of data dispersion. The greater the dispersion, the greater this value will be; if we were faced with a set of completely homogeneous results (that is, all equal to the mean), this parameter would be equal to 0.
This value is of enormous importance in statistics, since not everything comes down to finding common ground between figures and events: it is also essential to record the variability between sample groups in order to ask ourselves more questions and gain more knowledge in the long term.