5.1. Introduction to Statistics:

 

 

Introduction:

 

1.  What will be the population of India in 2005 and in 2010?

2.  What is the literacy rate of India and of its states?

3.  What percentage of children do not attend school? What will the position be in the next 10 to 15 years?

4.  What is the deviation in salary among people working in an organization?

 

Statistics, a branch of Mathematics, helps us find answers to questions of this type.

 

In our daily life we come across news about the average rainfall in a place, the minimum and maximum temperatures in a place, the average runs scored by a cricketer, average attendance and similar terms. They are all calculated from data. They are useful to agencies such as the Government for planning, for comparing the performance of people and for other purposes.

 

You must have heard people say that a particular month of the current year has been very hot. This observation is normally based on their feeling. However, the feeling can be checked against recorded data. The Meteorological Department has many recording stations where the minimum and maximum temperatures are measured daily.

Let us tabulate the maximum and minimum temperatures of a city in north India.

 

Month       Maximum (mid-day, °C)   Minimum (early morning, °C)
January     15                      6
February    14                      7
March       20                      10
April       18                      10
May         35                      20
June        36                      22
July        40                      24
August      41                      25
September   35                      22
October     30                      20
November    25                      15
December    22                      -5

 

From the above data it is difficult to estimate the temperature in the middle of any month of the year. Let us see what happens if we represent the above data in a graph:

[Graph: maximum and minimum temperatures, January to December]

 

The above pictorial representation records the maximum and minimum temperatures of a place for the months January to December of a year (the highest and lowest among all the days in each month). The blue line represents the maximum temperature and the pink line represents the minimum temperature. The graph has been plotted from the following data:

 

Table:

Months →      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Maximum (°C)  15   14   20   18   35   36   40   41   35   30   25   22
Minimum (°C)  6    7    10   10   20   22   24   25   22   20   15   -5

 

Looking at the data in the above table, isn’t it difficult to estimate the temperature in the middle of any month?

Don’t you agree that a pictorial representation (called a graph) is much easier to understand than the data given in the above table?

Isn’t there a saying that a picture is worth a thousand words?

 

Let us understand how this graph has been plotted.

 

On the horizontal line we see the names of the months. Consecutive months are separated by a gap of about 1 cm, so we say that the horizontal scale is 1 cm = 1 month. On the vertical line we see markings in steps of 10, starting with -10 (i.e. -10, 0, 10, 20, 30, 40, 50). The distance between two markings on the vertical line is approximately 1 cm, so we say that the vertical scale is 1 cm = 10°C. Since no recorded temperature exceeds 50°C, the markings stop at 50°C. Since no recorded minimum temperature is below -10°C, no markings are provided for -20°C and below. In this example both the horizontal and the vertical line use 1 cm per marking, but the two scales need not always be the same. The scale is chosen in such a way that all the data can be marked on the sheet.

 

Note that from the graph it is easy to estimate the minimum and maximum temperatures in the middle of any month, which is not easy to do by looking at the data in the table.
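
Reading a value off the line between two plotted points amounts to linear interpolation. The sketch below assumes, for illustration only, that each monthly value is plotted at the month's mark (1 for January to 12 for December) and that neighbouring points are joined by straight lines. The function name `read_from_graph` is hypothetical.

```python
# Maximum temperatures (°C) from the table, January to December.
maximum = [15, 14, 20, 18, 35, 36, 40, 41, 35, 30, 25, 22]

def read_from_graph(values, x):
    """Estimate the value at position x (1 = January mark, 12 = December mark)
    by following the straight line joining the two nearest plotted points."""
    if not 1 <= x <= len(values):
        raise ValueError("x must lie between the first and last plotted points")
    left = int(x)                 # month mark at or just before x
    if left == len(values):       # x is exactly the last mark
        return float(values[-1])
    fraction = x - left           # how far x lies towards the next mark
    y0, y1 = values[left - 1], values[left]
    return y0 + fraction * (y1 - y0)

# Halfway between the April and May marks: halfway between 18 and 35.
print(read_from_graph(maximum, 4.5))
```

Such a reading is only an estimate, since the real temperature need not change along a straight line between two recorded values.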

 

On a geographical map, you must have observed a scale for distance such as 1 cm = 1000 km.

 

By convention we call the horizontal line the x axis and the vertical line the y axis. Any point in a plane (surface) is represented by coordinates (x, y).

 

5.1.1 Example 1: Draw a graph of the maximum temperatures based on the above table. The horizontal line (x axis) will represent the months and the vertical line (y axis) will represent the maximum temperatures. The months January to December are represented by the numbers 1 to 12. Then the coordinates are:

 

x →       1        2        3        4        5        6        7        8        9        10       11       12
y →       15       14       20       18       35       36       40       41       35       30       25       22
(x, y) →  (1,15)   (2,14)   (3,20)   (4,18)   (5,35)   (6,36)   (7,40)   (8,41)   (9,35)   (10,30)  (11,25)  (12,22)

 

For marking temperatures we can use the scale 1 cm = 5°C and start marking from 0°C, in multiples of 5 (0, 5, 10, 15, ...). After marking the points (x, y) and joining them, we get a graph as shown below.
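
The pairing of month numbers with temperatures, and the use of the scale, can be sketched in a few lines of Python. This is only an illustration: the helper name `height_cm` and its parameters are hypothetical, and it assumes the vertical axis starts at 0°C with a scale of 1 cm = 5°C.

```python
# Maximum temperatures (°C) for January to December, from the table above.
maximum = [15, 14, 20, 18, 35, 36, 40, 41, 35, 30, 25, 22]

# Months are numbered 1 to 12, so each point is (month number, temperature).
points = list(enumerate(maximum, start=1))

def height_cm(temperature, degrees_per_cm=5, start=0):
    """Distance in cm above the x axis at which a temperature is marked,
    for a scale of 1 cm = degrees_per_cm, with the axis starting at `start`."""
    return (temperature - start) / degrees_per_cm

for month, temperature in points:
    print(f"({month},{temperature}) is marked {height_cm(temperature)} cm above the x axis")
```

The same helper covers a scale of 1 cm = 10°C starting at -10°C by calling `height_cm(t, degrees_per_cm=10, start=-10)`.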

 

 

 

5.1.1 Example 2: Assume that you have collected the following data on the time taken to run the 100 metres race in your school games for the years 2000, 2001, 2002, 2003 and 2004 (first 3 places only).

 

No   Name        Class   Year   Time taken to run 100 metres race
1    Ram         8       2000   15 sec
2    John        9       2000   16 sec
3    Krish       10      2000   17 sec
4    Luis        9       2001   12 sec
5    Sham        8       2001   17 sec
6    Gopal       9       2001   19 sec
7    Ahmed M     9       2002   13 sec
8    Khan A K    8       2002   16 sec
9    Arun        10      2002   17 sec
10   Mohan       10      2003   16 sec
11   Philips     8       2003   17 sec
12   Ajay        9       2003   18 sec
13   Pramod      9       2004   14 sec
14   Raymond A   8       2004   15 sec
15   Gopi        9       2004   15 sec

 

Let us consider only the data on the time taken by the students to run the race. We have 15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15 seconds.

 

Since the above data is not in any particular order, let us arrange it in ascending order. We get

12, 13, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 17, 18, 19.

 

No      Time (sec)   Occurrence (Frequency)
1       12           1
2       13           1
3       14           1
4       15           3
5       16           3
6       17           4
7       18           1
8       19           1
Total                15 (total number of scores)

The above representation of data is called an ungrouped frequency distribution table.
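
The tallying can also be done by a short program. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative only.

```python
from collections import Counter

# Times in seconds, in the order they appear in the results table.
times = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]

# Counter tallies how many times each score occurs (its frequency).
frequency = Counter(times)

print("Time (sec)  Frequency")
for score in sorted(frequency):          # ascending order of time
    print(f"{score:<11} {frequency[score]}")
print(f"Total       {sum(frequency.values())}")
```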

 

From the above tabulation we observe the following:

1. The lowest time taken is 12 seconds, which was recorded in the year 2001.

2. The highest time taken (among the first 3 winners) is 19 seconds, which was also recorded in the year 2001.

3. The score 17 has the highest occurrence of 4, indicating that more prize winners took 17 seconds to run the distance than any other time.
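
These three observations can be checked with a few lines of Python. The sketch below keeps each result as a (year, time) pair taken from the race table; the variable names are illustrative.

```python
from collections import Counter

# (year, time in seconds) for the 15 prize winners, in table order.
records = [
    (2000, 15), (2000, 16), (2000, 17),
    (2001, 12), (2001, 17), (2001, 19),
    (2002, 13), (2002, 16), (2002, 17),
    (2003, 16), (2003, 17), (2003, 18),
    (2004, 14), (2004, 15), (2004, 15),
]

fastest = min(records, key=lambda record: record[1])   # lowest time
slowest = max(records, key=lambda record: record[1])   # highest time

# most_common(1) returns the score with the highest frequency and its count.
score, count = Counter(time for _, time in records).most_common(1)[0]

print("Lowest time:", fastest[1], "sec in", fastest[0])
print("Highest time:", slowest[1], "sec in", slowest[0])
print("Most frequent time:", score, "sec, occurring", count, "times")
```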

 

Let us regroup the data as follows:

 

No      Grouping (Class-Interval)   Occurrence (Frequency)
1       12 sec - 14 sec             3
2       15 sec - 17 sec             10
3       18 sec - 20 sec             2
Total                               15 (total number of scores)


The above representation of data is called a grouped frequency distribution table.

When the scores (data) are large in number, a grouped frequency distribution table makes analysis much easier.

 

 

If we group the times into 3-second intervals, i.e. (12-14), (15-17), (18-20), we find that the interval 15 sec - 17 sec has the highest occurrence of 10, indicating that most of the prize winners took between 15 and 17 seconds to run the distance. We also notice that if we group the results in different time intervals, the conclusion can be different.
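
The effect of the interval chosen can be seen by grouping the same times in two ways. The sketch below is illustrative: the function `group` and its parameters `start` and `width` are hypothetical names, and it assumes whole-second scores with both class limits included.

```python
from collections import Counter

times = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]

def group(scores, start, width):
    """Count scores per class interval. The first interval runs from `start`
    to start + width - 1, the next covers the following `width` seconds, etc."""
    counts = Counter()
    for score in scores:
        index = (score - start) // width      # which interval the score falls in
        lower = start + index * width         # lower limit of that interval
        counts[(lower, lower + width - 1)] += 1
    return dict(sorted(counts.items()))

print(group(times, start=12, width=3))   # 3-second intervals
print(group(times, start=12, width=4))   # 4-second intervals
```

With 3-second intervals the interval with the highest frequency is 15-17 seconds (10 scores); with 4-second intervals it is 16-19 seconds (9 scores).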

 

5.1.2 Statistical terms

 

The numbers we have collected are called ‘scores’ (observations). The number of times a particular score occurs is called its ‘frequency’. Sometimes we group the scores into ranges (intervals) for meaningful analysis, and such subgroups are called ‘class-intervals’. The class interval is not fixed and can vary, and the conclusion could change with the class interval chosen. (In the above example we could instead choose 4-second class intervals, e.g. 12 sec - 15 sec and 16 sec - 19 sec.) Once a class interval is chosen, all the data has to be grouped accordingly (i.e. in the above example we cannot have 2-second intervals and 3-second intervals at the same time). The difference between the highest and the lowest values of the scores (data) is called the ‘range of data’. The difference between the lower limits (or between the upper limits) of two consecutive classes is called the ‘size of the class’.
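
For the race data, the range and the class size work out as follows. This is a small sketch that takes the class size as the difference between the lower limits of two consecutive classes, which is an assumption about how the definition is applied here.

```python
times = [15, 16, 17, 12, 17, 19, 13, 16, 17, 16, 17, 18, 14, 15, 15]

# Range of data: highest score minus lowest score (19 - 12).
data_range = max(times) - min(times)

# Class intervals from the grouped table, as (lower limit, upper limit).
classes = [(12, 14), (15, 17), (18, 20)]

# Class size: difference between the lower limits of consecutive classes.
sizes = [nxt[0] - cur[0] for cur, nxt in zip(classes, classes[1:])]

print("Range of data:", data_range, "seconds")
print("Size of the class:", sizes[0], "seconds")
```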

 

 

Thus ‘Statistics’ could be defined as the science of collection, classification, analysis and interpretation of numerical data. It finds applications in predicting the economic growth of a country, the weather pattern of a region, etc. These scientific predictions help the Government and other agencies to plan for the future. Statistics is also used in Genetics, Biological Sciences, Education, Medicine and Economics.

 

5.1 Summary of learning

 

 

No   Points to remember
1    The numerical figures collected for analysis are called scores.
2    The number of times a score repeats itself is called its frequency.
3    Data arranged in a table containing each score and its frequency is called a frequency distribution table.
4    Each of the smaller groups (ranges) into which the scores are grouped is called a class interval.