5.5 Standard Deviation:
Introduction:
You must have read
newspapers comparing performances of two batsmen in cricket. What do they
compare? They say one is more consistent than the other and one is more stylish
than the other.
Stylishness is a quality that cannot be measured from the runs scored by a batsman.
However, which batsman is the better scorer, and which one is more consistent,
can both be judged from the runs scored over several innings.
Let us study how Statistics
can help us in this regard.
Standard deviation:
You must have heard people talking about deviations (deviation from rules, deviation in work,
deviation in results, etc.). A deviation is always measured with respect to a standard.
The standard can be thought of as an average (also called the arithmetic mean).
5.5 Example 1: Let a batsman’s scores in 6 innings be 48, 50, 54, 46, 48, 54.
Working:
Notations used:
X = Set of scores (48, 50, 54, 46, 48, 54)
N = Number of scores (= 6)
$\bar{X}$ = The arithmetic mean (AM) = $(\sum X)/N$
d = Deviation from the arithmetic mean = $X - \bar{X}$
Step 1: Find the arithmetic mean of his scores: AM = (48+50+54+46+48+54)/6 = 300/6 = 50
Step 2: Find d (= $X - \bar{X}$) and $d^2$ for each of the scores
Table of Calculation (with actual AM):
No | Runs (X) | Deviation d = $X - \bar{X}$ | $d^2$
1 | 48 | -2 | 4
2 | 50 | 0 | 0
3 | 54 | 4 | 16
4 | 46 | -4 | 16
5 | 48 | -2 | 4
6 | 54 | 4 | 16
Total | $\sum X$ = 300 | $\sum d$ = 0 | $\sum d^2$ = 56
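Step 3: Calculate Variance as $(\sum d^2)/N$ = 56/6 = 9.33
Step 4: Calculate Standard deviation (SD) as $\sqrt{\text{Variance}}$ = $\sqrt{(\sum d^2)/N}$ = $\sqrt{9.33}$ = 3.05
SD is denoted by the Greek letter $\sigma$ (sigma).
In the above example, $\sigma$ = 3.05.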
Definition: ‘Standard deviation’
is the square root of the arithmetic average of the squares of the deviations
from the mean.
Interpretation: In this example we say that, on average, the batsman’s scores deviate from the
arithmetic mean (= 50) by 3.05 runs (approximately 3).
We can therefore expect this batsman to score roughly 47 to 53 runs, that is (50-3) to (50+3), in future matches.
Note:
If the batsman’s scores had been 48, 100, 50, 10, 2, 80,
it would not have been possible to predict his score with reasonable accuracy. The prediction was possible
only because the batsman was consistent, with scores close to 50.
General Procedure:
Let X = {x1, x2, x3, ..., xn} be the scores
N = Number of scores
$\bar{X}$ = Arithmetic mean (AM) = $(x_1 + x_2 + x_3 + \dots + x_n)/N$ = $(\sum X)/N$
Step 1: Calculate the deviation from the AM, d (= $X - \bar{X}$), and $d^2$ for each of the scores
Step 2: Calculate Variance = $(\sum d^2)/N$
Step 3: Calculate Standard deviation (SD)
SD = $\sigma$ = $\sqrt{\text{Variance}}$ = $\sqrt{(\sum d^2)/N}$
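The general procedure can be sketched in Python as follows. This is only an illustration and not part of the original lesson; the function name `mean_and_sd` is an assumption chosen for this example. It applies the three steps to the scores of Example 1 and to the erratic scores mentioned in the note above.

```python
import math

def mean_and_sd(scores):
    """Return (AM, SD) using the steps above: deviations, variance, square root."""
    n = len(scores)
    am = sum(scores) / n                             # arithmetic mean
    squared_devs = [(x - am) ** 2 for x in scores]   # d^2 for each score
    variance = sum(squared_devs) / n                 # divide by N, as in Step 2
    return am, math.sqrt(variance)

consistent = [48, 50, 54, 46, 48, 54]   # scores of Example 1
erratic = [48, 100, 50, 10, 2, 80]      # scores from the note

for label, scores in (("consistent", consistent), ("erratic", erratic)):
    am, sd = mean_and_sd(scores)
    print(f"{label}: AM = {am:.2f}, SD = {sd:.2f}")
```

The two sets have similar means (50 and about 48.33), but the SD of the erratic scores comes out more than ten times larger, which is why no narrow range can be predicted for them. Python's standard library function `statistics.pstdev` computes the same quantity (it also divides by N).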
Alternate method of finding SD, when the AM is not a whole number:
In the above example the arithmetic mean (= 50) happened to be an integer, so our computations were
easy. If the arithmetic mean contains decimals, finding $d^2$ becomes tedious,
and in such cases we follow a different method.
To start with, we assume the arithmetic mean to be one of the scores itself.
Then we calculate d (= X - A, where A is the assumed AM) and $d^2$ for
each of the scores. The actual AM and SD are then derived as follows:
Actual AM = Assumed AM + $(\sum d)/N$
SD ($\sigma$) = $\sqrt{(\sum d^2)/N - ((\sum d)/N)^2}$
Let us take the above
example and find SD using this alternate method.
Let us assume AM to be 54
(A = 54.) Here N = 6.
Table of Calculation (with assumed AM):
No | Runs (X) | Deviation d = X - A | $d^2$
1 | 48 | -6 | 36
2 | 50 | -4 | 16
3 | 54 | 0 | 0
4 | 46 | -8 | 64
5 | 48 | -6 | 36
6 | 54 | 0 | 0
Total | | $\sum d$ = -24 | $\sum d^2$ = 152
Actual AM = Assumed AM + $(\sum d)/N$ = 54 + (-24/6) = 54 - 4 = 50
SD ($\sigma$) = $\sqrt{(\sum d^2)/N - ((\sum d)/N)^2}$
= $\sqrt{152/6 - (-24/6)^2}$ = $\sqrt{25.33 - 16}$ = $\sqrt{9.33}$ = 3.05
You will notice that both methods give the same SD. This is true
whichever value is chosen as the assumed AM.
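A short Python sketch of the assumed-mean method is given below, as an illustration only. The function name `sd_with_assumed_mean` is hypothetical. It repeats the calculation for the scores of Example 1 with three different assumed means to show that the choice of A does not affect the result.

```python
import math

def sd_with_assumed_mean(scores, assumed):
    """Return (actual AM, SD) from deviations measured against an assumed mean A."""
    n = len(scores)
    devs = [x - assumed for x in scores]          # d = X - A
    mean_dev = sum(devs) / n                      # (sum of d) / N
    mean_sq_dev = sum(d * d for d in devs) / n    # (sum of d^2) / N
    actual_am = assumed + mean_dev
    sd = math.sqrt(mean_sq_dev - mean_dev ** 2)
    return actual_am, sd

scores = [48, 50, 54, 46, 48, 54]
for assumed in (54, 48, 46):
    am, sd = sd_with_assumed_mean(scores, assumed)
    print(f"A = {assumed}: AM = {am:.2f}, SD = {sd:.2f}")
```

Each assumed mean leads to AM = 50 and SD of about 3.05, because the correction term $((\sum d)/N)^2$ removes exactly the extra spread introduced by measuring from A instead of from the true mean.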
When the same scores repeat many times in the data, listing the
individual scores and calculating the SD becomes tedious, so we follow a
slightly different method.
Standard Deviation for grouped data:
Let the scores and frequencies be
Scores (X) | x1 | x2 | x3 | ... | xn
Frequency (f) | f1 | f2 | f3 | ... | fn
N = Total of the frequencies = f1 + f2 + f3 + ... + fn = $\sum f$
Step 1: Find f*X for each of the scores
Step 2: Find the arithmetic mean $\bar{X}$ = $\sum (f*X)/N$
Step 3: Find the deviation for each of the scores, d = $X - \bar{X}$
Step 4: Find the variance of the distribution = $\sum (f*d^2)/N$
Step 5: Calculate SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N}$
5.5 Example 2: Marks obtained in a test by 60
students are given below. Find AM and SD.
Scores (X) | 10 | 20 | 30 | 40 | 50 | 60
Frequency (f) | 8 | 12 | 20 | 10 | 7 | 3
Workings:
N (total of the frequencies) = $\sum f$ = 8+12+20+10+7+3 = 60
Score (X) | Frequency (f) | f*X | Deviation d = $X - \bar{X}$ | $d^2$ | $f*d^2$
10 | 8 | 80 | -20.83 | 433.89 | 3471.11
20 | 12 | 240 | -10.83 | 117.29 | 1407.47
30 | 20 | 600 | -0.83 | 0.69 | 13.78
40 | 10 | 400 | 9.17 | 84.09 | 840.89
50 | 7 | 350 | 19.17 | 367.49 | 2572.42
60 | 3 | 180 | 29.17 | 850.89 | 2552.67
Total | N = $\sum f$ = 60 | $\sum (f*X)$ = 1850 | | | $\sum (f*d^2)$ = 10858.33
Arithmetic Mean = $\bar{X}$ = $\sum (f*X)/N$ = 1850/60 = 30.83
Variance = $\sum (f*d^2)/N$ = 10858.33/60 = 180.97
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N}$ = $\sqrt{180.97}$ = 13.45
Interpretation: The average mark of the students is 30.83.
The marks of the students deviate from the mean score by
about 13 marks.
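The five steps for scores with frequencies can be checked with a few lines of Python. This is a sketch for illustration; the function name `grouped_mean_and_sd` is an assumption, and the data are the scores and frequencies of Example 2.

```python
import math

def grouped_mean_and_sd(scores, freqs):
    """Return (AM, SD) for scores that occur with the given frequencies."""
    n = sum(freqs)                                            # N = sum of f
    am = sum(f * x for x, f in zip(scores, freqs)) / n        # sum(f*X) / N
    variance = sum(f * (x - am) ** 2
                   for x, f in zip(scores, freqs)) / n        # sum(f*d^2) / N
    return am, math.sqrt(variance)

scores = [10, 20, 30, 40, 50, 60]
freqs = [8, 12, 20, 10, 7, 3]
am, sd = grouped_mean_and_sd(scores, freqs)
print(f"AM = {am:.2f}, SD = {sd:.2f}")
```

The code keeps the deviations at full precision, so its individual $d^2$ values differ slightly from the table (which rounds d to two decimals), but the mean and SD agree with the worked values to two decimal places.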
In the above working you must have observed that the AM had decimals.
Because of this, d, $d^2$ and $f*d^2$ were all decimals and the
calculations were difficult.
In such cases we use an alternate method which is easier to work with.
Alternate Method
Step 1: Assume any one of the scores as the average (A)
Step 2: Find the deviation d from the assumed average for every score (d = X - A).
Step 3: Find f*d, $d^2$ and $f*d^2$ for each of the scores.
Step 4: Arrive at the AM and SD as given below.
Arithmetic Mean = $\bar{X}$ = A + $\sum (f*d)/N$, where N = $\sum f$
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$
In the above example let us assume 30 to be the average (A). By following steps 1 to 3 we get
Score (X) | Frequency (f) | Deviation d = X - A | f*d | $d^2$ | $f*d^2$
10 | 8 | -20 | -160 | 400 | 3200
20 | 12 | -10 | -120 | 100 | 1200
30 | 20 | 0 | 0 | 0 | 0
40 | 10 | 10 | 100 | 100 | 1000
50 | 7 | 20 | 140 | 400 | 2800
60 | 3 | 30 | 90 | 900 | 2700
Total | N = $\sum f$ = 60 | | $\sum (f*d)$ = 50 | | $\sum (f*d^2)$ = 10900
We note that AM = A + $\sum (f*d)/N$ = 30 + 50/60 = 30 + 0.83 = 30.83
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$
= $\sqrt{(10900/60) - (50/60)^2}$
= $\sqrt{181.67 - 0.69}$ = $\sqrt{180.97}$ = 13.45
The average mark of the students is 30.83. The marks of the students deviate
from the mean score by about 13 marks.
Observe that we got the same results with both methods.
We have seen earlier that data is often collected in class intervals and not as individual scores.
In such cases the AM cannot be calculated as a simple average of the scores; it is
calculated from the mid-points of the class intervals.
How do we find the SD and interpret the results when the data is grouped into class intervals?
Step 1: Find the mid-point (x) of each class interval.
Step 2: Find the product f*x for each class interval.
Step 3: Calculate the arithmetic mean $\bar{X}$ = $\sum (f*x)/N$, where N = $\sum f$.
Step 4: Find the deviation d from the arithmetic mean for each of the class intervals (d = $x - \bar{X}$).
Step 5: Find $d^2$ and $f*d^2$ for each class interval.
Step 6: Calculate the SD using the formula SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N}$
5.5 Example 3: Marks obtained in a test by students are given below.
Marks | Frequency (f) | Mid-point (x) | f*x | d = $x - \bar{X}$ | $d^2$ | $f*d^2$
25-30 | 5 | 28 | 140 | -9.2 | 84.64 | 423.2
30-35 | 10 | 33 | 330 | -4.2 | 17.64 | 176.4
35-40 | 25 | 38 | 950 | 0.8 | 0.64 | 16
40-45 | 8 | 43 | 344 | 5.8 | 33.64 | 269.12
45-50 | 2 | 48 | 96 | 10.8 | 116.64 | 233.28
Total | N = $\sum f$ = 50 | | $\sum (f*x)$ = 1860 | | | $\sum (f*d^2)$ = 1118
Working:
Arithmetic mean = $\bar{X}$ = $\sum (f*x)/N$ = 1860/50 = 37.2
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N}$ = $\sqrt{1118/50}$ = $\sqrt{22.36}$ = 4.728
Interpretation:
The average mark scored is 37.2. The marks of the students deviate from the mean
(average) score by about 5 marks.
In the above working you must have observed that the AM had decimals.
Because of this, d, $d^2$ and $f*d^2$ had decimals and the calculation was difficult.
In such cases we use an alternate method which is easier to work with.
Alternate Method (Step-Deviation Method)
Step 1: Assume one of the mid-points of the class intervals as the arithmetic mean (A).
Step 2: Find the ‘step-deviation’ (d) from the assumed mean, d = (x - A)/i,
where ‘i’ is the size of the class interval.
Step 3: Find $d^2$, f*d and $f*d^2$ for each of the class intervals.
Step 4: Compute the AM and SD as follows:
AM = $\bar{X}$ = A + [$\sum (f*d)/N$]*i
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
Let us work out the above example using this method.
Let us assume the mean (A) to be 43. Note that i = size of the class interval = 5.
By following steps 1 to 3 we have:
Marks | Frequency (f) | Mid-point (x) | d = (x - A)/i | f*d | $d^2$ | $f*d^2$
25-30 | 5 | 28 | -3 | -15 | 9 | 45
30-35 | 10 | 33 | -2 | -20 | 4 | 40
35-40 | 25 | 38 | -1 | -25 | 1 | 25
40-45 | 8 | 43 | 0 | 0 | 0 | 0
45-50 | 2 | 48 | 1 | 2 | 1 | 2
Total | N = $\sum f$ = 50 | | | $\sum (f*d)$ = -58 | | $\sum (f*d^2)$ = 112
We have
AM = $\bar{X}$ = A + [$\sum (f*d)/N$]*i = 43 + [(-58/50)*5] = 43 + (-1.16)*5 = 43 - 5.8 = 37.2
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
= $\sqrt{(112/50) - (-58/50)^2}$*5
= $\sqrt{2.24 - (-1.16)^2}$*5
= $\sqrt{2.24 - 1.3456}$*5
= $\sqrt{0.8944}$*5
= 0.9457*5
= 4.728
Interpretation:
The average mark scored is 37.2. The marks of the students deviate from the mean
(average) score by about 5 marks.
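The step-deviation method can be sketched in Python as shown below. This is an illustration under stated assumptions: the function name `step_deviation_stats` and its parameter names are hypothetical, and the mid-points are taken exactly as listed in the table of Example 3.

```python
import math

def step_deviation_stats(midpoints, freqs, assumed, width):
    """Return (AM, SD) using step-deviations d = (x - A) / i."""
    n = sum(freqs)
    steps = [(x - assumed) / width for x in midpoints]           # d for each class
    mean_fd = sum(f * d for f, d in zip(freqs, steps)) / n       # sum(f*d) / N
    mean_fd2 = sum(f * d * d for f, d in zip(freqs, steps)) / n  # sum(f*d^2) / N
    am = assumed + mean_fd * width
    sd = math.sqrt(mean_fd2 - mean_fd ** 2) * width
    return am, sd

midpoints = [28, 33, 38, 43, 48]   # mid-points as listed in Example 3
freqs = [5, 10, 25, 8, 2]
am, sd = step_deviation_stats(midpoints, freqs, assumed=43, width=5)
print(f"AM = {am:.1f}, SD = {sd:.2f}")
```

Calling the function with a different assumed mid-point, for example `assumed=38`, should return the same AM and SD, since the step-deviation method only shifts and rescales the scores before the calculation.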
Very often we use the word ‘consistency’ when comparing the performances of individuals, teams, etc.
How do we express this idea statistically?
We use the ‘Coefficient of Variation’ (CV) to measure consistency.
It is a relative measure of dispersion. It is calculated as
CV = SD*100/AM.
Thus CV is independent of units and is expressed as a percentage. The lower the percentage, the greater the
consistency (if the SD is small compared with the AM, the variation is small).
In the above example, CV = (4.728*100)/37.2 = 12.71%
5.5 Example 4: The runs scored by 2 batsmen A and B in six innings are as follows.
Batsman A | 48 | 50 | 54 | 46 | 48 | 54
Batsman B | 46 | 44 | 43 | 46 | 45 | 46
Determine who is the better scorer and who is more consistent.
Working:
To compare the consistency of these two batsmen we need to find the CV of each.
For Batsman A, we arrived at the following values of AM and SD
in 5.5 Example 1 (worked out earlier):
AM = 50
SD = 3.05
CV = SD*100/AM = 3.05*100/50 = 6.1%
Let us calculate these figures for Batsman B.
Table of Calculation (with actual AM): AM = 270/6 = 45
No | Runs (X) | Deviation d = $X - \bar{X}$ | $d^2$
1 | 46 | 1 | 1
2 | 44 | -1 | 1
3 | 43 | -2 | 4
4 | 46 | 1 | 1
5 | 45 | 0 | 0
6 | 46 | 1 | 1
Total | $\sum X$ = 270 | $\sum d$ = 0 | $\sum d^2$ = 8
SD = $\sigma$ = $\sqrt{(\sum d^2)/N}$ = $\sqrt{8/6}$ = $\sqrt{1.33}$ = 1.15
CV = SD*100/AM = 1.15*100/45 = 2.56%
Conclusion:
1. Since A’s AM is more than that of B (50 > 45), we conclude that A is the
better scorer.
2. Since B’s CV is less than A’s (2.56 < 6.1), we conclude that B is more
consistent.
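The comparison of the two batsmen can be reproduced with the short Python sketch below. It is illustrative only; the function name `am_sd_cv` and the dictionary layout are assumptions made for this example.

```python
import math

def am_sd_cv(scores):
    """Return (AM, SD, CV in percent) for a list of individual scores."""
    n = len(scores)
    am = sum(scores) / n
    sd = math.sqrt(sum((x - am) ** 2 for x in scores) / n)
    return am, sd, sd * 100 / am

batsmen = {
    "A": [48, 50, 54, 46, 48, 54],
    "B": [46, 44, 43, 46, 45, 46],
}
results = {name: am_sd_cv(runs) for name, runs in batsmen.items()}
for name, (am, sd, cv) in results.items():
    print(f"Batsman {name}: AM = {am:.2f}, SD = {sd:.2f}, CV = {cv:.2f}%")

better_scorer = max(results, key=lambda name: results[name][0])     # highest AM
more_consistent = min(results, key=lambda name: results[name][2])   # lowest CV
print("Better scorer:", better_scorer, "| More consistent:", more_consistent)
```

Because the code does not round the SD before computing the CV, its CV values can differ from the hand-worked figures in the second decimal place; the conclusions are unaffected.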
5.5 Example 5: Marks obtained in a test by Standard X students of 2 sections A and B are given below:
Marks | No of students in Section A | No of students in Section B
25-30 | 5 | 5
30-35 | 10 | 12
35-40 | 25 | 20
40-45 | 8 | 8
45-50 | 2 | 5
Which section’s performance is better, and which section’s performance is more variable (less consistent)?
We need to find the AM and CV to answer these questions.
For Section A’s marks, we arrived at the following values of AM and SD
in 5.5 Example 3 (worked out earlier):
AM = 37.2 and SD = 4.728
CV = SD*100/AM = 4.728*100/37.2 = 12.7%
Now let us arrive at the AM and SD for Section B using the Step-Deviation Method (with an assumed mean).
Step 1: Let us choose the assumed mean A = 38 (we could also assume A = 28, 33, 43 or 48).
Step 2: Find the step-deviation (d) from the assumed mean, d = (x - A)/i,
where ‘i’ is the size of the class interval = 5.
Step 3: Find $d^2$, f*d and $f*d^2$ for each of the class intervals.
Step 4: Compute the AM and SD as follows:
AM = $\bar{X}$ = A + [$\sum (f*d)/N$]*i
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
Marks | Frequency (f) | Mid-point (x) | d = (x - A)/i | f*d | $d^2$ | $f*d^2$
25-30 | 5 | 28 | -2 | -10 | 4 | 20
30-35 | 12 | 33 | -1 | -12 | 1 | 12
35-40 | 20 | 38 | 0 | 0 | 0 | 0
40-45 | 8 | 43 | 1 | 8 | 1 | 8
45-50 | 5 | 48 | 2 | 10 | 4 | 20
Total | N = $\sum f$ = 50 | | | $\sum (f*d)$ = -4 | | $\sum (f*d^2)$ = 60
We have
AM = $\bar{X}$ = A + [$\sum (f*d)/N$]*i = 38 + [(-4/50)*5] = 38 + (-0.08)*5 = 38 - 0.4 = 37.6
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
= $\sqrt{(60/50) - (-4/50)^2}$*5
= $\sqrt{1.2 - (-0.08)^2}$*5
= $\sqrt{1.2 - 0.0064}$*5
= $\sqrt{1.1936}$*5
= 1.0925*5 = 5.4625
CV = SD*100/AM = 5.4625*100/37.6 = 14.53%
Conclusion:
1. Since Section B’s AM is more than that of Section A (37.6 > 37.2), we conclude that
Section B’s performance is better than Section A’s.
2. Since B’s CV is more than A’s (14.53 > 12.7), we conclude that Section B’s performance is less
consistent (more variable) than Section A’s.
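As a cross-check on the step-deviation working, both sections can be computed directly from the mid-points in Python. This is a sketch for illustration; the function name `mean_sd_cv` is an assumption, and the mid-points are taken as listed in the tables above.

```python
import math

def mean_sd_cv(midpoints, freqs):
    """Return (AM, SD, CV in percent) for class-interval data given by mid-points."""
    n = sum(freqs)
    am = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sd = math.sqrt(sum(f * (x - am) ** 2 for x, f in zip(midpoints, freqs)) / n)
    return am, sd, sd * 100 / am

midpoints = [28, 33, 38, 43, 48]          # mid-points as listed in the tables
sections = {
    "A": [5, 10, 25, 8, 2],
    "B": [5, 12, 20, 8, 5],
}
stats = {name: mean_sd_cv(midpoints, freqs) for name, freqs in sections.items()}
for name, (am, sd, cv) in stats.items():
    print(f"Section {name}: AM = {am:.1f}, SD = {sd:.2f}, CV = {cv:.2f}%")
```

The direct method and the step-deviation method are algebraically equivalent, so the figures printed here should match the worked values up to rounding.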
5.5 Example 6: In 2 factories A and B, located in the same industrial area, the average weekly wages
in Rupees and their SD are
Factory | Average wage in Rs. | SD of wage in Rs.
A | 34.5 | 6.21
B | 28.5 | 4.56
Determine which factory has greater variability in wages.
Workings:
We need to find the CV of each factory.
CV of Factory A = SD*100/AM = 6.21*100/34.5 = 18%
CV of Factory B = SD*100/AM = 4.56*100/28.5 = 16%
Conclusion: Since Factory A’s CV > Factory B’s (18 > 16), A has more variability in wages.
(Note: Although Factory A pays higher wages on average, the wages differ more
from one employee to another.)
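5.5 Summary of learning
X = Set of scores
$\bar{X}$ = The arithmetic mean (AM)
d = Deviation from the arithmetic mean
f = Frequency of a score
i = Size of the class interval
x = Mid-point of the class interval
No | Cases | Options | N = | AM = | Deviation (d) | SD ($\sigma$)
1 | Individual scores | Actual AM | Number of scores | $\bar{X}$ = $(\sum X)/N$ | $X - \bar{X}$ | $\sqrt{(\sum d^2)/N}$
1 | Individual scores | A = Any score | Number of scores | $\bar{X}$ = A + $(\sum d)/N$ | X - A | $\sqrt{(\sum d^2)/N - ((\sum d)/N)^2}$
2 | Scores with frequency | Actual AM | $\sum f$ | $\bar{X}$ = $\sum (f*X)/N$ | $X - \bar{X}$ | $\sqrt{\sum (f*d^2)/N}$
2 | Scores with frequency | A = Any score | $\sum f$ | $\bar{X}$ = A + $\sum (f*d)/N$ | X - A | $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$
3 | Class interval with frequency | Actual AM | $\sum f$ | $\bar{X}$ = $\sum (f*x)/N$ | $x - \bar{X}$ | $\sqrt{\sum (f*d^2)/N}$
3 | Class interval with frequency | A = Any mid-point | $\sum f$ | $\bar{X}$ = A + [$\sum (f*d)/N$]*i | d = (x - A)/i | $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
Hint: For Standard Deviation always remember the common formula:
SD ($\sigma$) = $\sqrt{\sum (f*d^2)/N - (\sum (f*d)/N)^2}$*i
Depending on the option, you can substitute f = 1 and i = 1 to get the correct
formula as per the above table.
Also note that $\sum (f*d)$ = 0 when the deviations are taken from the actual AM
rather than from an assumed average.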
Additional Points:
Combined Standard deviation of two groups:
If the means and standard deviations of two series are known, then the mean and the standard deviation of
the combined series can be calculated without considering the actual values of
the data in the series.
Let two series containing $n_1$ and $n_2$ values have means $\bar{X}_1$ and $\bar{X}_2$ and
standard deviations $SD_1$ and $SD_2$ respectively.
Then:
1. The combined mean = $\bar{X}$ = $(n_1 \bar{X}_1 + n_2 \bar{X}_2)/(n_1 + n_2)$
2. The combined S.D. = $\sqrt{(n_1 SD_1^2 + n_2 SD_2^2 + n_1 d_1^2 + n_2 d_2^2)/(n_1 + n_2)}$,
where $d_1 = \bar{X}_1 - \bar{X}$ and $d_2 = \bar{X}_2 - \bar{X}$
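The combined formulas can be checked numerically with the Python sketch below. It is an illustration only: the function names are hypothetical, and the two short series are made-up numbers used solely to compare the formula with a direct calculation on the pooled data.

```python
import math

def mean_sd(values):
    """Return (AM, SD) of a list of values, dividing by N."""
    n = len(values)
    am = sum(values) / n
    return am, math.sqrt(sum((v - am) ** 2 for v in values) / n)

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Return (combined mean, combined SD) from the summary figures of two series."""
    total = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / total
    d1, d2 = mean1 - mean, mean2 - mean
    variance = (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / total
    return mean, math.sqrt(variance)

first = [2, 4, 6]            # made-up series, for checking only
second = [1, 3, 5, 7, 9]     # made-up series, for checking only
m1, s1 = mean_sd(first)
m2, s2 = mean_sd(second)
from_formula = combined_mean_sd(len(first), m1, s1, len(second), m2, s2)
from_pooled = mean_sd(first + second)
print(from_formula, from_pooled)
```

Both routes give the same mean and SD (up to floating-point rounding), which illustrates that the combined figures do not require the individual data values.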
5.5 Example 7: The first of two samples has 100 items with mean 15 and standard deviation 3.
The combined group has 250 items with mean 15.6 and standard deviation $\sqrt{13.44}$.
Find the mean and the standard deviation of the second group.
Solution:
Here $n_1$ = 100, $n_1 + n_2$ = 250, $\bar{X}_1$ = 15, $SD_1$ = 3, combined S.D. = $\sqrt{13.44}$, $\bar{X}$ = 15.6.
We need to find $\bar{X}_2$ and $SD_2$.
Note that $n_2$ = 150.
But $\bar{X}$ = $(n_1 \bar{X}_1 + n_2 \bar{X}_2)/(n_1 + n_2)$
15.6 = (100*15 + 150*$\bar{X}_2$)/250
i.e. 150*$\bar{X}_2$ = (15.6*250) - (100*15) = 3900 - 1500 = 2400
$\bar{X}_2$ = 2400/150 = 16
$d_1$ = $\bar{X}_1 - \bar{X}$ = 15 - 15.6 = -0.6, $d_2$ = $\bar{X}_2 - \bar{X}$ = 16 - 15.6 = 0.4
Combined S.D. = $\sqrt{(n_1 SD_1^2 + n_2 SD_2^2 + n_1 d_1^2 + n_2 d_2^2)/(n_1 + n_2)}$
$\sqrt{13.44}$ = $\sqrt{(100*9 + 150*SD_2^2 + 100*0.36 + 150*0.16)/250}$
Squaring both sides, 13.44 = (900 + 150*$SD_2^2$ + 36 + 24)/250
i.e. 150*$SD_2^2$ = 3360 - 960 = 2400
$SD_2^2$ = 2400/150 = 16
$SD_2$ = 4
Thus the mean of the second group ($\bar{X}_2$) is 16 and the standard deviation of the second group ($SD_2$)
is 4.
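The same solution can be sketched in Python by rearranging the two combined formulas for the unknown second group. The function name `second_group_stats` is hypothetical, and the combined variance of 13.44 is the value used in the working above.

```python
import math

def second_group_stats(n1, mean1, sd1, n_total, mean_total, var_total):
    """Recover (n2, mean2, SD2) of the second group from the combined figures."""
    n2 = n_total - n1
    # Rearranged combined-mean formula
    mean2 = (n_total * mean_total - n1 * mean1) / n2
    d1, d2 = mean1 - mean_total, mean2 - mean_total
    # Rearranged combined-variance formula
    var2 = (n_total * var_total - n1 * sd1 ** 2
            - n1 * d1 ** 2 - n2 * d2 ** 2) / n2
    return n2, mean2, math.sqrt(var2)

n2, mean2, sd2 = second_group_stats(100, 15, 3, 250, 15.6, 13.44)
print(f"n2 = {n2}, mean = {mean2:.2f}, SD = {sd2:.2f}")
```

The printed values are rounded to two decimals because floating-point arithmetic may leave tiny errors in the last digits.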