Data Collecting

Summary
The student investigates basic methods of data collection in the lab using a made-up data set. The data collection method is restricted to mimic some of the problems astronomers face.

Background
Suppose you read a study in a magazine stating that in the general population:
20-30% of people have ear piercings, and
1-3% of people have body piercings.

Scientists would write this as:
25 +/- 5% of people have ear piercings, and
2 +/- 1% of people have body piercings.

The first number is the result, and the second number is the random error in the result. Random error indicates how much the result might change if you did the survey many times. Usually, if you repeat the survey, your new result will agree with the quoted result to within the error. For example, in 13 different surveys, you might find 13 different percentages of people with pierced ears: 25, 26, 27, 20, 24, 24, 25, 29, 36, 23, 26, 25, 24. Occasionally the variation will be greater, as with the 36 in this list. This lab shows you how to find the random error and what it means.
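
A minimal Python sketch, assuming the 13 example values above are the whole data set, computes their average and their spread (the sample standard deviation) and counts how many surveys agree with 25 +/- 5% to within the quoted error:

```python
from statistics import mean, stdev

# The 13 example survey results (percent of people with pierced ears).
results = [25, 26, 27, 20, 24, 24, 25, 29, 36, 23, 26, 25, 24]

# The published value and its random error: "25 +/- 5%".
quoted_value, quoted_error = 25, 5

avg = mean(results)      # average over the repeated surveys
spread = stdev(results)  # sample standard deviation: typical scatter about the average

# Surveys that agree with the published value to within the published error.
agree = [r for r in results if abs(r - quoted_value) <= quoted_error]

print(f"mean = {avg:.1f}%, standard deviation = {spread:.1f}%")
print(f"{len(agree)} of {len(results)} surveys lie within "
      f"{quoted_value} +/- {quoted_error}%")
```

With these values, 12 of the 13 surveys fall between 20% and 30%; only the 36% result falls outside, which is the occasional larger variation mentioned above.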

A second kind of error is more difficult to quantify. This is the error that results from biases in the experiment, and is called 'systematic error'. You will also explore this kind of error.

Procedure
Print out the answer sheet.
  1. Suppose you want to know whether these numbers are correct for the population of students at your university. Devise a method to collect a data set that tests this hypothesis. You are limited to about 2 hours to collect the information, but you can take as long as you like to process it.
    This simulates the fact that astronomers must compete for telescope time and so get to use telescopes only for short periods.

  2. Devise an alternate plan in case the observations you want to make are impossible for some reason (e.g., your equipment breaks, or you are rendered mute or locked in a high tower).
    Astronomy is like the high-tower situation: you cannot do anything but look from a distance. Sometimes, too, the observation you want to make simply doesn't work, for example because of poor weather.

  3. Why can't you include only a small number of people in your sample?
    This is known as a "small sample problem" in astronomy. An example would be describing all stars after looking only at the Sun.

  4. What problems might occur if you only chose to look for subjects in a trendy cafe?
    In astronomy this is called a "selection effect": by limiting the location, you are choosing in advance the types of objects you look at, and they might be very different from objects elsewhere. This effect is nearly always present in astronomy. For example, we can observe only objects that are close or bright.

  5. How are the errors caused by the biases in questions three and four different? Are these systematic errors or random errors?

  6. Suppose you collect data on 523 students. If the central values given above (25% and 2%) are correct, how many students with each type of piercing do you expect?

  7. Here are the results from your new survey of university students:
    523 students in total,
    138 students with ear piercings, and
    34 students with body piercings.

    Compare your results with the general-population figures reported in the magazine study. Is the percentage of students with ear piercings significantly different from the magazine's value, given the variation (error) the magazine quotes?

  8. Is the percentage of students with body piercings significantly different from the magazine's value, given the variation (error) the magazine quotes?

  9. Explain any apparent differences between your new study and the magazine study in terms of possible differences between students and the general population.

  10. What new observations could you make that would confirm or rule out the differences you proposed in the previous question?

  11. Statistically, a study like this has an error (formally called the standard deviation) given by:

    error = sqrt[(total - number) x (number/total)].

    As a percentage, the error is:

    percent error = 100 x (error/total).

    where number is the number of students with a particular kind of piercing and total is the total number of students in the survey.
    (Aside: This formula applies only to situations like coin tossing, where each outcome is one of exactly two possibilities, such as heads or tails, with nothing in between. Here the two possibilities are having a particular piercing or not having it.)
    Find the random error in your results. There is roughly a 2 in 3 chance that your measurement plus or minus this error includes the true value for all students.
    Astronomers must always remember that errors indicate the correct range only "most of the time". Occasionally the data is just plain bad. For example, suppose that many of the people in your survey lied about their piercings. That data is of poor quality and doesn't represent reality, however small the calculated error.

  12. Did your study have larger or smaller errors than the one the magazine published?

  13. Did the magazine's survey use a larger or smaller sample than yours? (This is a quantitative question, and the answer can be derived from your solutions to question 11.)
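
Questions 3 to 5 contrast a small sample with a biased one. A minimal Python simulation can make the difference visible. It assumes hypothetical rates, invented purely for illustration: 25% of all students have ear piercings, but 40% of the students found in the trendy cafe do. The helper name survey is also hypothetical.

```python
import random

random.seed(1)  # fixed seed so every run gives the same numbers

# Hypothetical rates, invented for illustration only (not survey data):
TRUE_RATE = 0.25  # fraction of ALL students with ear piercings
CAFE_RATE = 0.40  # fraction among students found in the trendy cafe

def survey(rate, n):
    """Hypothetical helper: ask n people, each of whom has the piercing
    with probability `rate`, and return the measured fraction."""
    yes = sum(1 for _ in range(n) if random.random() < rate)
    return yes / n

for n in (10, 100, 1000, 100_000):
    fair = survey(TRUE_RATE, n)  # random sample of the whole student body
    cafe = survey(CAFE_RATE, n)  # sample drawn only from the cafe
    print(f"n = {n:>7}: random sample {fair:.3f}, cafe-only sample {cafe:.3f}")
```

As n grows, both columns scatter less, but the cafe-only column settles near 0.40 rather than the true 0.25: a larger sample reduces random error but does nothing about a bias built into how the sample is chosen.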

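The arithmetic for questions 6, 7, 8, 11 and 13 can be checked with a short Python sketch. It uses the question 11 formula directly. For question 13 it rearranges that formula: with p = number/total, the percent error equals 100 x sqrt[p(1 - p)/total], so total = p(1 - p) x (100/percent error)^2. This assumes the magazine's quoted errors are one standard deviation computed the same way, which the magazine study does not state. The helper names binomial_error and implied_sample_size are hypothetical, chosen for this example.

```python
from math import sqrt

TOTAL = 523                                  # students surveyed (question 7)
observed = {"ear": 138, "body": 34}          # students with each piercing
magazine = {"ear": (25, 5), "body": (2, 1)}  # (percent, error) from the magazine

def binomial_error(number, total):
    """Question 11: error = sqrt[(total - number) x (number/total)]."""
    return sqrt((total - number) * (number / total))

def implied_sample_size(percent, percent_error):
    """Question 11 rearranged: the survey size that would give this
    percent error for this percentage (assumes the same formula)."""
    p = percent / 100
    return p * (1 - p) * (100 / percent_error) ** 2

results = {}
for kind, number in observed.items():
    mag_pct, mag_err = magazine[kind]
    expected = TOTAL * mag_pct / 100                        # question 6
    pct = 100 * number / TOTAL                              # questions 7 and 8
    pct_err = 100 * binomial_error(number, TOTAL) / TOTAL   # question 11
    n_errors = (pct - mag_pct) / mag_err   # gap measured in magazine errors
    n_mag = implied_sample_size(mag_pct, mag_err)           # question 13
    results[kind] = (expected, pct, pct_err, n_errors, n_mag)
    print(f"{kind}: expect {expected:.1f} students; measured {pct:.1f} "
          f"+/- {pct_err:.1f}%; {n_errors:.1f} magazine errors from "
          f"{mag_pct}%; magazine sample about {n_mag:.0f}")
```

Because the magazine quotes rounded ranges, treat the implied sample size as a rough estimate rather than an exact count.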

© 2003 Weber State University
Revised: 24 April, 2003