5.1. Introduction to Statistics:
Introduction:
1. What will be the population of
2. What is the literacy rate of
3. What is the % of kids not attending school?
What will be the status in the next 10/15 years?
4. What is the deviation in salary among people working in an organization?
Statistics, a branch of Mathematics, helps to find answers to these types of questions.
In our daily life we come across
news about the average rainfall in a place, the minimum and maximum temperatures in a
place, the average runs scored by a cricketer, average attendance and similar
terms. They are all calculated from data. They are useful to agencies such as
the Government for planning, for comparing the performance of people and for other
purposes.
You must have heard
people saying that a month of the current year has been very hot. This observation is normally
based on their feeling. However,
this feeling can be checked against correct data. The Meteorological Department has
many recording stations where the minimum and maximum temperatures are measured
daily.
Let us tabulate the maximum and
minimum temperatures of a city in north
Month | January | February | March | April | May | June | July | August | September | October | November | December
Maximum (Mid Day) | 15 | 14 | 20 | 18 | 35 | 36 | 40 | 41 | 35 | 30 | 25 | 22
Minimum (Early Morning) | 6 | 7 | 10 | 10 | 20 | 22 | 24 | 25 | 22 | 20 | 15 | -5
From the above data it is difficult to guess the temperature
in the middle of any month in a year. Let us see what happens if we represent the above
data in a graph:
Graphs
The above pictorial representation is a record of the
maximum and minimum temperatures of a place for the months of January to
December of a year (the highest and lowest among all days in each month). The blue
line represents the maximum temperature and the pink line represents
the minimum temperature. The plot is based on the following
data:
Table:
Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
Maximum (°C) | 15 | 14 | 20 | 18 | 35 | 36 | 40 | 41 | 35 | 30 | 25 | 22
Minimum (°C) | 6 | 7 | 10 | 10 | 20 | 22 | 24 | 25 | 22 | 20 | 15 | -5
Looking at the data in the above table, isn’t it difficult
to estimate the temperature in the middle of any month?
Don’t you agree that pictorial representation
(called Graph) is much easier to understand compared to the data given in the
above table?
Isn’t there a saying that a picture is worth
a thousand words?
Let us understand how this graph has been plotted.
On the horizontal line we see the names of the months. Consecutive
months are separated by a gap of about 1 cm, so we say that the horizontal
scale is 1 cm = 1 month. On the vertical line we see markings in steps of 10,
starting with -10 (i.e. -10, 0, 10, 20, 30, 40, 50). The distance
between two markings on the vertical line is approximately 1 cm, so we say that the
vertical scale is 1 cm = 10°C. Since no recorded temperature
exceeds 50°C, the markings stop at 50°C.
Since no recorded minimum temperature is below -10°C, no
markings are provided for -20°C and below. Though in
this example both axes use a spacing of 1 cm, the scales need not
always be the same. The scale is chosen in such a
way that all the data can be marked on the sheet.
Note that from the graph it is easy to estimate
the minimum and maximum temperature in the middle of any month, which is
not easy to arrive at by looking at the data in the table.
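Reading a value off the straight line that joins two plotted points amounts to linear interpolation. The following is only a minimal sketch in Python, assuming the months are numbered 1 to 12 and each month's value is plotted at its month number; the function name `read_from_graph` is hypothetical. The result is what the line suggests, not a measured temperature.

```python
MAX_TEMP = [15, 14, 20, 18, 35, 36, 40, 41, 35, 30, 25, 22]  # °C, Jan..Dec
MIN_TEMP = [6, 7, 10, 10, 20, 22, 24, 25, 22, 20, 15, -5]    # °C, Jan..Dec


def read_from_graph(values, x):
    """Estimate the value at position x (1 = January ... 12 = December)
    on the straight line joining the two nearest plotted points."""
    if not 1 <= x <= len(values):
        raise ValueError("x must lie between 1 and the number of months")
    left = int(x)                # month number of the point on the left
    if left == len(values):      # exactly at the last plotted point
        return float(values[-1])
    fraction = x - left          # how far x lies towards the next point
    y_left, y_right = values[left - 1], values[left]
    return y_left + fraction * (y_right - y_left)


# Halfway between the April and May points on the maximum-temperature line
print(read_from_graph(MAX_TEMP, 4.5))
```

The same function works for the minimum-temperature line by passing `MIN_TEMP` instead.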
On a geographical map, you must have observed
a distance scale such as 1 cm = 1000 km.
By convention we call the horizontal line the x axis
and the vertical line the y axis. Any point in a plane (surface)
is represented by coordinates (x, y).
5.1.1 Example 1: Draw a graph for maximum temperatures based on the
above table. The horizontal line (x axis) will represent months and the vertical line (y
axis) will represent maximum temperatures. The months January to December are
represented by the numbers 1 to 12. Then the coordinates are:
x → | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
y → | 15 | 14 | 20 | 18 | 35 | 36 | 40 | 41 | 35 | 30 | 25 | 22
(x, y) → | (1,15) | (2,14) | (3,20) | (4,18) | (5,35) | (6,36) | (7,40) | (8,41) | (9,35) | (10,30) | (11,25) | (12,22)
For marking temperatures we can use the scale 1 cm =
5°C and start marking from 0°C, in multiples of 5 (0, 5, 10, 15, ...). After marking the points (x, y) and joining them, we get a graph as shown below.
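The steps of Example 1 can be sketched in Python as a rough illustration: pair each maximum temperature with its month number, then convert each point into distances on the sheet. The vertical scale of 1 cm = 5°C comes from the example; the horizontal scale of 1 cm = 1 month is an assumption carried over from the earlier graph, and the function names are hypothetical.

```python
MAX_TEMP = [15, 14, 20, 18, 35, 36, 40, 41, 35, 30, 25, 22]  # °C, Jan..Dec

CM_PER_MONTH = 1.0     # assumed horizontal scale: 1 cm = 1 month
DEGREES_PER_CM = 5.0   # vertical scale from the example: 1 cm = 5 °C


def coordinates(values):
    """Pair each value with its month number: (1, 15), (2, 14), ..."""
    return [(month, value) for month, value in enumerate(values, start=1)]


def position_on_paper(point):
    """Convert a data point (x, y) into distances in cm from the origin."""
    x, y = point
    return (x * CM_PER_MONTH, y / DEGREES_PER_CM)


for point in coordinates(MAX_TEMP):
    print(point, "->", position_on_paper(point))
```

For instance, the May point (5, 35) is marked 5 cm to the right of the origin and 7 cm above it.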
5.1.1 Example 2: Assume that you have
collected the following data of the time taken to run the 100 metres race in your
school games for the years 2000, 2001, 2002, 2003 and 2004 (first 3 places only).
No | Name | Class | Year | Time taken to run 100 metres race
1 | Ram | 8 | 2000 | 15 sec
2 | John | 9 | 2000 | 16 sec
3 | Krish | 10 | 2000 | 17 sec
4 | Luis | 9 | 2001 | 12 sec
5 | Sham | 8 | 2001 | 17 sec
6 | Gopal | 9 | 2001 | 19 sec
7 | Ahmed M | 9 | 2002 | 13 sec
8 | Khan A K | 8 | 2002 | 16 sec
9 | Arun | 10 | 2002 | 17 sec
10 | Mohan | 10 | 2003 | 16 sec
11 | Philips | 8 | 2003 | 17 sec
12 | Ajay | 9 | 2003 | 18 sec
13 | Pramod | 9 | 2004 | 14 sec
14 | Raymond A | 8 | 2004 | 15 sec
15 | Gopi | 9 | 2004 | 15 sec
Let us consider only the data corresponding
to the time taken by students to run the race. We have
15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15 secs.
Since the above data is not in any
particular order, let us arrange it in ascending order. We get
12, 13, 14, 15, 15, 15, 16,
16, 16, 17, 17, 17, 17, 18, 19.
Counting how many times each time occurs, we get:
No | Time (sec) | Occurrence (Frequency)
1 | 12 | 1
2 | 13 | 1
3 | 14 | 1
4 | 15 | 3
5 | 16 | 3
6 | 17 | 4
7 | 18 | 1
8 | 19 | 1
Total | | 15 (Total No of Scores)
The above
representation of data is called an ungrouped frequency distribution table.
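The counting behind this table can be sketched in a few lines of Python. This is only an illustration using the standard library's `collections.Counter`; the function name `ungrouped_frequency` is hypothetical.

```python
from collections import Counter

# Times in seconds, in the order they were collected
TIMES = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]


def ungrouped_frequency(scores):
    """Return (score, frequency) pairs in ascending order of score."""
    counts = Counter(scores)       # score -> number of occurrences
    return sorted(counts.items())


table = ungrouped_frequency(TIMES)
for score, frequency in table:
    print(f"{score:>5} | {frequency}")
print("Total |", sum(frequency for _, frequency in table))
```

The frequencies always add up to the total number of scores, which is a quick check that no score was missed.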
From the above tabulation we
observe the following:
1. Lowest time taken is 12 Seconds
which happened in the year 2001.
2. Highest time taken is 19
Seconds (among first 3 winners) which happened in the year 2001.
3. The number 17 has the highest
occurrence (4), indicating that 17 seconds was the most common time
among the prize winners.
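These three observations can be checked with a short Python sketch. The (year, time) pairs are copied from the race table; the function names are hypothetical and the code is only one possible way to do the check.

```python
from collections import Counter

# (year, time in seconds) for the 15 prize winners in the race table
RESULTS = [
    (2000, 15), (2000, 16), (2000, 17),
    (2001, 12), (2001, 17), (2001, 19),
    (2002, 13), (2002, 16), (2002, 17),
    (2003, 16), (2003, 17), (2003, 18),
    (2004, 14), (2004, 15), (2004, 15),
]


def lowest_time(results):
    """Return the (year, time) pair with the smallest time."""
    return min(results, key=lambda result: result[1])


def highest_time(results):
    """Return the (year, time) pair with the largest time."""
    return max(results, key=lambda result: result[1])


def most_frequent_time(results):
    """Return (time, frequency) for the time that occurs most often."""
    counts = Counter(time for _, time in results)
    return counts.most_common(1)[0]


print(lowest_time(RESULTS))
print(highest_time(RESULTS))
print(most_frequent_time(RESULTS))
```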
Let us regroup the data
as follows:
No | Grouping (Class-Interval) | Occurrence (Frequency)
1 | 12sec-14sec | 3
2 | 15sec-17sec | 10
3 | 18sec-20sec | 2
Total | | 15 (Total No of Scores)
The above
representation of data is called a grouped frequency distribution table.
When the scores (data) are large in number, a grouped
frequency distribution table makes the analysis much easier.
If we group the students' times into 3-second
intervals {i.e. (12-14), (15-17), (18-20)},
we find that the interval 15sec-17sec has the highest occurrence (10), indicating that most of the
prize winners took between 15 and 17
seconds to run the distance. We also notice that if we group the results into
different time intervals, the conclusion may be different.
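The effect of the chosen interval can be seen in a small Python sketch. It assumes whole-second scores and intervals that include both end values, as in the table; the function name `grouped_frequency` is hypothetical, and the 4-second width is used here only for comparison.

```python
# Times in seconds for the 15 prize winners
TIMES = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]


def grouped_frequency(scores, start, width):
    """Count scores in consecutive class intervals of `width` whole seconds,
    beginning at `start`. Each interval (low, high) includes both ends."""
    table = {}
    low = start
    while low <= max(scores):
        high = low + width - 1
        table[(low, high)] = sum(1 for score in scores if low <= score <= high)
        low += width
    return table


print(grouped_frequency(TIMES, start=12, width=3))
print(grouped_frequency(TIMES, start=12, width=4))
```

With 3-second intervals the largest group is 15 to 17 seconds; with 4-second intervals the largest group becomes 16 to 19 seconds, so the grouping does affect what we conclude.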
5.1.2 Statistical terms
The numbers we have collected are
called ‘scores (observations)’. The number of times a particular score occurs
is called its ‘frequency’. Sometimes we group the scores into ranges
(intervals) for meaningful analysis, and such subgroups are called ‘class-intervals’. The
class interval is not fixed and can vary. Depending on the class interval chosen,
the conclusion could change. (In the above example we could instead choose class intervals of
4 seconds, e.g. 12sec-15sec and 16sec-19sec.) Once a class interval is chosen, all the data
has to be grouped using it (i.e. in the above example we cannot
have 2-second intervals and 3-second intervals at the same time). The difference
between the highest and the lowest values of the scores (data) is called the ‘range of data’. The difference between the lower limits (or the upper limits) of two consecutive classes is
called the ‘size
of the class’.
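As a quick worked check of the last two terms, here is a minimal Python sketch using the race data. It takes the size of the class to mean the difference between the lower limits of two consecutive classes, which is the usual reading and an assumption here; the function names are hypothetical.

```python
# Times in seconds for the 15 prize winners
TIMES = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]

# Class intervals used in the grouped frequency distribution table
CLASS_INTERVALS = [(12, 14), (15, 17), (18, 20)]


def data_range(scores):
    """Difference between the highest and the lowest score."""
    return max(scores) - min(scores)


def class_size(intervals):
    """Difference between the lower limits of the first two classes
    (assumed definition of the size of the class)."""
    (first_low, _), (second_low, _) = intervals[0], intervals[1]
    return second_low - first_low


print("Range of data:", data_range(TIMES), "seconds")
print("Size of the class:", class_size(CLASS_INTERVALS), "seconds")
```

Here the range is 19 - 12 = 7 seconds and the size of the class is 15 - 12 = 3 seconds, matching the 3-second grouping.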
Thus ‘Statistics’
could be defined as the science of collection, classification, analysis and
interpretation of basic numerical data. It finds applications in predicting the
economic growth of a country, the weather pattern of a region, etc. These scientific predictions help the Government
and agencies to plan for the future. Statistics is also used in Genetics, Biological Sciences,
Education, Medicine and Economics.
5.1 Summary of learning
No | Points to remember
1 | The numerical figures collected for analysis are called scores.
2 | The number of times a score repeats itself is called its frequency.
3 | The data arranged in the format of a table containing each score and its frequency is called a frequency distribution table.
4 | Each of the smaller groups into which the scores are grouped is called a class interval.