Linear Regression

A WebQuest for Math 153 (Introduction to Statistical Methods)

Designed by

Donna Hiestand-Tupper
dtupper@ccbcmd.edu

[Image: scatter plot]




Introduction

I am going to start off this chapter with a little story. A statistician and his/her buddy went out one night. All night long, the friend drank vodka and water. The next morning, the friend woke up with a terrible headache. A few weeks later, the statistician and his/her buddy went out again. This time, the friend drank bourbon and water all night long. The next morning, the friend woke up with a terrible headache. A few weeks later, the statistician and his/her buddy went out again. This time, the friend drank scotch and water all night long. Again, the next morning, the friend woke up with a terrible headache. Finally, the friend told the statistician that they could not go out anymore because every time they did, he/she woke up with a terrible headache. The statistician said, "The correlation is obvious: don't drink the water."

We all know that the cause of the headache was the alcohol, not the water. The point of this story is that an apparent relationship between two variables does not mean that there is a cause-and-effect relationship between them.

In this chapter, we looked at two variables coming from one population and tried to determine whether some type of relationship exists between them. For example, is there a relationship between the height and weight of healthy men? If a relationship existed, we determined an equation that best represented it. By finding such an equation, we could make predictions based on the given data (e.g., we could predict a baseball player's weight given his height).
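As a quick reference, the standard formulas for the linear correlation coefficient and the least-squares regression line are sketched below. The notation is an assumption made for this illustration and does not come from this page: x is a player's height, y is his weight, n is the number of players in the sample, and b_0 and b_1 are the intercept and slope of the regression line.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}}

\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
           {n\sum x^2 - \left(\sum x\right)^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

Note that r and b_1 share the same numerator, so the slope of the regression line always has the same sign as the correlation coefficient.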

 


The Task

Go to the website http://mlb.mlb.com/NASApp/mlb/index.jsp. Go to the bottom of the page and click on the arrow next to "Jump To" and select the "roster" of your favorite team.

From the homepage of your favorite team.

 


The Process

 

  1. Using either a simple random or systematic approach, randomly select 10 baseball players from your favorite team;
  2. Record the team used for this assignment;
  3. Record the ordered pair consisting of their height and weight;

    Player's name | Height | Weight

     

  4. Use Statdisk to create a scatter plot of the above data.  Submit the Statdisk printout;
  5. Use Statdisk to determine the correlation coefficient, r;  Submit this printout as well;
  6. Compare the value of r to the critical value found in Table A-6 and determine if there is a linear relationship between the heights and weights of players on your favorite team;
  7. Determine a regression equation;
  8. Can the regression equation be used to predict the weight of a baseball player who is 72"?  Why or why not?
  9. Predict the weight of a baseball player who is 72";
  10. Assuming the regression equation could be used for a player who is 72" tall, could it also be used to predict the weight of a baseball player who is 60"?  Why or why not?
  11. Predict the weight of a baseball player who is 60".
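The assignment asks for Statdisk printouts, but steps 1 through 7 can also be cross-checked by hand or in code. The Python sketch below is one possible way to do that, not the required method. Several things in it are assumptions for illustration: the roster values are made up (they are not real player data), the function names are hypothetical, and the critical value 0.632 is the usual critical value of r for n = 10 at a 0.05 significance level. The assignment does not state a significance level, so confirm the value against Table A-6 before relying on it.

```python
import math
import random


def simple_random_sample(roster, k, seed=None):
    """Simple random sample: every group of k players is equally likely."""
    return random.Random(seed).sample(roster, k)


def systematic_sample(roster, k, seed=None):
    """Systematic sample: random starting point, then every step-th player."""
    step = len(roster) // k  # assumes the roster has at least k players
    start = random.Random(seed).randrange(step)
    return [roster[start + i * step] for i in range(k)]


def correlation_and_line(heights, weights):
    """Return (r, slope, intercept) for the least-squares line of weight on height."""
    n = len(heights)
    sum_x, sum_y = sum(heights), sum(weights)
    sum_xy = sum(x * y for x, y in zip(heights, weights))
    sum_xx = sum(x * x for x in heights)
    sum_yy = sum(y * y for y in weights)
    numerator = n * sum_xy - sum_x * sum_y
    r = numerator / math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    slope = numerator / (n * sum_xx - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return r, slope, intercept


# Made-up roster of (name, height in inches, weight in pounds); NOT real data.
roster = [(f"Player {i}", 68 + i % 10, 170 + 5 * (i % 10) + 4 * (i % 3))
          for i in range(1, 26)]

# Assumed critical value of r for n = 10 at the 0.05 level; check Table A-6.
CRITICAL_R = 0.632

sample = simple_random_sample(roster, 10, seed=1)
heights = [h for _, h, _ in sample]
weights = [w for _, _, w in sample]
r, slope, intercept = correlation_and_line(heights, weights)

print(f"r = {r:.3f}")
if abs(r) > CRITICAL_R:
    print("The sample supports a linear correlation.")
else:
    print("The sample does not support a linear correlation.")
print(f"predicted weight = {intercept:.1f} + {slope:.2f} * height")
```

Swapping `simple_random_sample` for `systematic_sample` shows the other sampling approach named in step 1. Either way, the comparison of |r| with the critical value is what decides whether the regression equation should be used for prediction at all.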

Evaluation

You are required to answer each of the above questions. When grading your web assignment, I will be using the rubric below. Each category (calculations, use of technology, theory comprehension, and written responses) is worth 25 points. The number of points you get per category is based on the ratings excellent, good, adequate, poor, or blank. Use the rubric below as a self-check before turning in your assignment.

Rating scale: Excellent (1) = 25 points; Good (2) = 20 points; Adequate (3) = 15 points; Poor (4) = 10 points; Blank (5) = 0 points.

Calculations

  Excellent: No calculation or roundoff errors are present.
  Good: Roundoff errors are present.
  Adequate: Calculation errors are present.
  Poor: Calculation and roundoff errors were made.
  Blank: No calculations are shown.
  Score:

Use of Technology

  Excellent: Shows complete and appropriate use of Statdisk and TI.
  Good: Shows appropriate use of Statdisk and TI, but a calculation error is present.
  Adequate: Limited or inappropriate use of Statdisk or TI.
  Poor: No use of Statdisk, but TI-83 was used.
  Blank: No use of technology is evident.
  Score:

Theory Comprehension

  Excellent: Shows complete comprehension of the statistical theory.
  Good: Understands most of the theory; however, minor errors were made.
  Adequate: Shows some understanding of the theory; however, explanations are unclear.
  Poor: Shows limited understanding of the theory.
  Blank: No understanding of the theory is evident.
  Score:

Written Responses

  Excellent: Well written. Neat, typed. No grammatical or spelling errors.
  Good: Ideas clearly presented, but spelling or grammatical errors are present.
  Adequate: Poorly written response. Many spelling or grammatical errors present.
  Poor: Poorly written response. Many spelling or grammatical errors present. Assignment is handwritten.
  Blank: Questions left blank.
  Score:

Conclusion

Although technology will always give us a regression equation, it is our job to determine whether or not the regression equation is the appropriate tool to make predictions. Not all data is linear. Even if a data set has a linear relationship, the regression equation is only valid for data values in the domain used to create the regression equation. Scatter plots can be used to give us an idea about the relationship of our data. However, a formal hypothesis test is still required.
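The point about the domain can be made concrete with a small check before predicting. The Python sketch below is only an illustration under stated assumptions: the function name `predict_weight`, the list of sampled heights, and the slope and intercept are all hypothetical and are not results from any real roster.

```python
def predict_weight(height, slope, intercept, sampled_heights):
    """Return (predicted weight, within_range).

    within_range is True only when the height lies between the smallest
    and largest heights that were used to build the regression equation.
    """
    within_range = min(sampled_heights) <= height <= max(sampled_heights)
    return intercept + slope * height, within_range


# Hypothetical sample heights (inches) and a hypothetical regression equation.
sampled_heights = [69, 70, 71, 72, 73, 74, 75, 76, 77, 78]
slope, intercept = 5.0, -170.0

for height in (72, 60):
    weight, within_range = predict_weight(height, slope, intercept, sampled_heights)
    note = "inside the sampled range" if within_range else "outside the sampled range, not reliable"
    print(f"{height} in. -> {weight:.0f} lb ({note})")
```

The equation still produces a number for a height of 60", which is exactly the trap: the arithmetic works, but no player that short was in the data, so the prediction has nothing to support it.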


Last updated on August 14, 2005. Based on a template from The WebQuest Page