Class Notes: Thursday
9/19/02
- Check new material on student pages (from Class Home Page)
- Histograms, and presentations?
Fun example: (to illustrate issues of "random sampling")
Estimate the proportion of male students at UNC
(reflects "your chance of getting a date")
Question:
How to get the data?
Approach:
draw a sample, and use sample proportion as an "estimate"
Sample Size:
25 ---> reasonably large, but not too tedious
Deep exploration:
try both "dumb" and "smart" sampling methods
Method 1: Take the 25 people "sitting nearest you in class"
Method 2: Stand at a doorway, and "tally the first 25 people to walk through"
(allowed choice between "intelligent" or "crazy", e.g. restroom door)
Method 3: Write down the first 25 names that come into your head
(can know them, or else "famous people", e.g. athletes)
Method 4: Choose a "random sample"
(based on student telephone directory)
(sampling is important, because too many to count!)
Expectation:
1st three are "dumb" (but for different reasons), last is "smart"
Data Analysis by Excel, from an earlier class project
(students actually drew samples using above methods)
(we won't do this because our class is too small)
- Data as counts in 1st 4 columns
- Convert to proportions in next 4 columns
- Bin Grid for Histograms in Column J
- Histograms
* All look "mound" (bell) shaped
* But different "centerpoints"
* And different "spreads"
- Intuitive ideas:
* Q1 "moved to left": since more females in class than at UNC
* Q1 "less spread": since "less variation from smaller population"
(extreme case: sample size = population size)
* Q2 "much more spread": reflects wide choice of doors
* Q3 "moved to right": bias towards males when "thinking up names"?
* Q3 "very spread": different people have different biases?
* Q4 "maybe about right"?
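The spreadsheet steps above (counts, then proportions, then a bin grid, then histograms) can be sketched in Python as well. This is only an illustration: the counts in male_counts are made up (they are NOT the class data), and the names to_proportions, bin_counts and bin_grid are hypothetical helpers. Bins are assumed to be closed on the left, with the top edge included in the last bin.

```python
from bisect import bisect_right
from statistics import mean, pstdev

SAMPLE_SIZE = 25  # sample size used in the notes

# Hypothetical counts of males in ten samples of 25 (illustration only).
male_counts = [9, 11, 10, 12, 8, 10, 11, 13, 10, 9]


def to_proportions(counts, n=SAMPLE_SIZE):
    """Convert counts of males into sample proportions."""
    return [c / n for c in counts]


def bin_counts(values, edges):
    """Tally values into bins [edges[i], edges[i+1]).

    The top edge is counted in the last bin.
    """
    tallies = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            raise ValueError(f"{v} is outside the bin grid")
        i = min(bisect_right(edges, v) - 1, len(tallies) - 1)
        tallies[i] += 1
    return tallies


proportions = to_proportions(male_counts)
bin_grid = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
histogram = bin_counts(proportions, bin_grid)

# Text histogram: one row per bin, one star per sample.
for left, count in zip(bin_grid, histogram):
    print(f"{left:.1f} | {'*' * count}")

# "Centerpoint" and "spread" of this set of sample proportions.
print("center:", round(mean(proportions), 3))
print("spread:", round(pstdev(proportions), 3))
```

Running the same function on each method's column of proportions gives four histograms whose centers and spreads can be compared directly.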
Deeper Look:
- Found "true proportion" = 0.43 (for that year)
- Can compare with sample means
* Q1: 0.39 < 0.43 (too small as indicated above)
* Q2: 0.47 > 0.43 (too big)
* Q3: 0.48 > 0.43 (too big, as indicated above)
* Q4: 0.42 Acceptably close?!?
(can analyze with more sophisticated statistical tools)
- Could do similar things with "standard deviation", and "spread"
- What "should the picture look like"?
Useful statistical model: "Binomial"
* Very smooth "mound shape"
* Looks like smooth version of others
* With "typical center" and "typical spread"
- See another statistics course for details (not done here)
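For a rough picture of what the Binomial model predicts, here is a minimal sketch. It assumes each of the 25 sampled students is independently male with probability 0.43 (the "true proportion" above), which is only an approximation to drawing names from a directory. The names binomial_pmf, center and spread are hypothetical.

```python
from math import comb, sqrt

N = 25         # sample size from the notes
P_TRUE = 0.43  # "true proportion" quoted in the notes


def binomial_pmf(k, n=N, p=P_TRUE):
    """Probability of exactly k males in a sample of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


pmf = [binomial_pmf(k) for k in range(N + 1)]

# On the proportion scale: center p, spread sqrt(p(1-p)/n).
center = P_TRUE
spread = sqrt(P_TRUE * (1 - P_TRUE) / N)

# Text version of the smooth "mound shape".
for k, prob in enumerate(pmf):
    print(f"{k / N:.2f} | {'*' * round(200 * prob)}")
print(f"typical center = {center:.2f}, typical spread = {spread:.3f}")
```

Under these assumptions the typical spread of a sample proportion is about 0.1, so the Q4 mean of 0.42 is well within the range the model allows.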
Ideas for better presentation?
- Put graphics closer to text discussion?
- Overlay histograms???
-
Others???
Class Assignment 4: Devise a practical random sampling scheme
- based on UNC student telephone directory
- for sample of size 25
- just describe method, don't actually gather sample
- put description of method on your web site
- Hint: use Excel, and consider:
random page ---> random column ---> random student
- But be careful, e.g. Page 1 with 10 students, Page 2 with 1 student....
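To see why the caution matters, here is a small sketch in Python (rather than Excel) on a made-up directory. It is one possible fix, not the expected answer to the assignment. naive_draw follows the hint literally. fair_draw picks a random (page, column, row) slot and redraws when the slot is empty, which gives every student the same chance. All names and the directory layout (a list of pages, each a list of columns, each a list of unique student names) are assumptions.

```python
import random


def naive_draw(directory, rng):
    """Random page -> random column -> random student (biased)."""
    page = rng.choice(directory)
    column = rng.choice(page)
    return rng.choice(column)


def fair_draw(directory, rng):
    """Pick a random (page, column, row) slot; redraw if it is empty."""
    max_cols = max(len(page) for page in directory)
    max_rows = max(len(col) for page in directory for col in page)
    while True:
        page = rng.choice(directory)
        c = rng.randrange(max_cols)
        r = rng.randrange(max_rows)
        if c < len(page) and r < len(page[c]):
            return page[c][r]


def random_sample(directory, size=25, seed=None):
    """Draw `size` distinct students, each equally likely."""
    rng = random.Random(seed)
    total = sum(len(col) for page in directory for col in page)
    if size > total:
        raise ValueError("sample size exceeds number of students")
    chosen = []
    while len(chosen) < size:
        student = fair_draw(directory, rng)
        if student not in chosen:  # skip repeats
            chosen.append(student)
    return chosen


# The caution from the notes: page 1 holds 10 students, page 2 holds 1.
tiny = [[[f"A{i}" for i in range(10)]], [["B0"]]]
rng = random.Random(1)
trials = 10_000
naive_share = sum(naive_draw(tiny, rng) == "B0" for _ in range(trials)) / trials
fair_share = sum(fair_draw(tiny, rng) == "B0" for _ in range(trials)) / trials
print(f"lone student chosen: naive {naive_share:.2f}, fair {fair_share:.2f}")

# Hypothetical 30-page directory, 3 columns per page, uneven column lengths.
gen = random.Random(0)
big = [[[f"p{p}c{c}r{r}" for r in range(gen.randint(1, 20))]
        for c in range(3)] for p in range(30)]
sample = random_sample(big, size=25, seed=42)
print(sample[:5])
```

With the naive method the lone student on page 2 is chosen about half the time, while a fair scheme should choose that student about 1 time in 11.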