This module focuses on a statistical analysis of home-field advantage in American baseball. It guides the reader through two-way ANOVAs and suggests possible binomial probability calculations. It presents a dataset with multiple variables and a graphic of side-by-side boxplots of winning percentage by league (American League vs. National League) and site (home vs. away).
For those not familiar with Major League Baseball, the World Series in the United States is a best-of-seven series: the first team to win four games is the victor. To try to make the series fair, the first two games are played at one team's home park, the next three games are played at the other team's park (just two may be played if one team wins four in a row), and the final two games (if needed) are scheduled at the first team's park. It is therefore possible for one team to play four games in its home park while the other team plays only three at home. It is generally assumed that playing at home gives some advantage. Each ballpark is different, so a team may benefit from playing on a familiar field; the hometown fans also provide emotional support for the team, and so on.
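To make the home-game counts concrete, here is a minimal Python sketch that tallies how many home games each team plays under the 2-3-2 schedule described above. It assumes, purely for illustration, that the team hosting games 1 and 2 (called team A here) wins every game independently with the same probability p, which deliberately ignores any home-field effect. The function names `series_length_probs` and `home_game_distribution` are hypothetical. A series ends at game L when the winner takes its fourth game there, so P(L) = C(L-1, 3) * (p^4 q^(L-4) + q^4 p^(L-4)), with q = 1 - p.

```python
from fractions import Fraction
from math import comb

# Assumed 2-3-2 schedule: True means the game is played at team A's park.
SCHEDULE = [True, True, False, False, False, True, True]


def series_length_probs(p):
    """P(series ends after exactly L games) for L = 4..7, assuming team A
    wins each game independently with probability p (no home effect)."""
    q = 1 - p
    # The winner takes game L plus exactly 3 of the first L - 1 games.
    return {
        L: comb(L - 1, 3) * (p**4 * q**(L - 4) + q**4 * p**(L - 4))
        for L in range(4, 8)
    }


def home_game_distribution(p):
    """Distribution of the number of home games played by team A and team B."""
    dist_a, dist_b = {}, {}
    for L, prob in series_length_probs(p).items():
        a_home = sum(SCHEDULE[:L])  # games at A's park among the first L
        b_home = L - a_home
        dist_a[a_home] = dist_a.get(a_home, 0) + prob
        dist_b[b_home] = dist_b.get(b_home, 0) + prob
    return dist_a, dist_b


half = Fraction(1, 2)
print(series_length_probs(half))
# {4: Fraction(1, 8), 5: Fraction(1, 4), 6: Fraction(5, 16), 7: Fraction(5, 16)}
print(home_game_distribution(half))
```

With p = 1/2, team A plays two home games with probability 3/8 (a four- or five-game series), three with probability 5/16, and four with probability 5/16, while team B plays three home games with probability 7/8. Exact fractions are used so the probabilities sum to exactly 1; a float such as 0.55 can be passed instead to explore other values of p.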
Type of Material:
Data set/story set
Could be used in an introductory statistics class on two-way ANOVA, as a lab, class activity, lecture example, or homework assignment.
A web browser. Google Chrome, Internet Explorer, and Firefox all work; Safari was not tested.
Identify Major Learning Goals:
-The learner will gain a deeper understanding of statistical analysis by analyzing the data and writing conclusions in terms of the story.
-The learner will develop skills to “clean” datasets and transfer them between software packages.
-The learner will practice the statistical concepts of two-way ANOVA, interaction, boxplots, probability, and independence.
Target Student Population:
Introductory statistics class (a non-calculus course is fine)
Prerequisite Knowledge or Skills:
Students should already be comfortable working with data and able to copy and paste data from different sources. They should also understand two-way ANOVA conceptually and know how their software (Minitab, SPSS, StatCrunch, etc.) expects the data to be organized.
Evaluation and Observation
The dataset is realistic; it comes from World Series data between 1922 and 1992. The columns in the dataset are well organized and explained, and most audiences will have no trouble understanding the data.
The lesson could give more specific instructions on how to use it in the classroom; it does not appear designed to be handed to students as-is. The dataset and the accompanying boxplots should be updated, since they are about 15 years old. Concepts are presented only briefly and are not explained well, so this is not a stand-alone tool.
Potential Effectiveness as a Teaching Tool
Learning goals are stated, the scenario and dataset are provided, and the resulting graphs are presented compactly. The assignment fosters critical thinking: students must combine a couple of different types of statistical analysis to reach an overall conclusion.
Follow-up questions (with possible solutions) on computing the probabilities of playing n home games (n = 1, ..., 4) would have been useful, as would an ANOVA table for instructors. The module offers little guidance for the learner; it is not even clear what the learner should accomplish. Two-way ANOVA is mentioned only briefly and deserves more explanation, since it is a very difficult concept. That said, the module is set up as a dataset resource, not as a teaching tool for students to use directly.
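As a rough illustration of the missing ANOVA table (not the module's own method), the following plain-Python sketch computes the classical two-way ANOVA table with interaction. It assumes a balanced layout, meaning the same number of observations (at least two) in every cell; the function name `two_way_anova` and the dictionary input format are hypothetical, since the dataset's exact layout is not reproduced here. It reports F statistics but not p-values, which come from the F distribution with (df, error df) degrees of freedom and are printed directly by Minitab, SPSS, or StatCrunch (or by `scipy.stats.f.sf` in Python).

```python
from statistics import mean


def two_way_anova(data):
    """Two-way ANOVA with interaction for a complete, balanced design.

    data maps (level_of_A, level_of_B) -> list of observations.
    Returns rows of (source, SS, df, MS, F); F is None for Error and Total,
    and MS is None for Total.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    I, J = len(a_levels), len(b_levels)
    n = len(next(iter(data.values())))
    # This sketch only handles every cell filled with the same n >= 2.
    if I < 2 or J < 2 or len(data) != I * J or n < 2 or any(len(v) != n for v in data.values()):
        raise ValueError("need a complete, balanced design with >= 2 observations per cell")

    all_obs = [y for obs in data.values() for y in obs]
    grand = mean(all_obs)
    cell = {k: mean(v) for k, v in data.items()}
    # In a balanced design, marginal means equal the means of the cell means.
    a_mean = {a: mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[(a, b)] for a in a_levels) for b in b_levels}

    ss_a = n * J * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * I * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_e = sum((y - cell[k]) ** 2 for k, obs in data.items() for y in obs)
    ss_t = sum((y - grand) ** 2 for y in all_obs)

    df_e = I * J * (n - 1)
    ms_e = ss_e / df_e
    rows = []
    for name, ss, df in [("Factor A", ss_a, I - 1),
                         ("Factor B", ss_b, J - 1),
                         ("Interaction", ss_ab, (I - 1) * (J - 1))]:
        ms = ss / df
        rows.append((name, ss, df, ms, ms / ms_e if ms_e > 0 else float("nan")))
    rows.append(("Error", ss_e, df_e, ms_e, None))
    rows.append(("Total", ss_t, I * J * n - 1, None, None))
    return rows
```

For the boxplot layout in this module, factor A could be league and factor B site, for example `two_way_anova({("AL", "home"): [...], ("AL", "away"): [...], ("NL", "home"): [...], ("NL", "away"): [...]})` with the winning percentages filled in from the dataset. In a balanced design the four sums of squares add up to the total, which gives a quick check against software output.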
Ease of Use for Both Students and Faculty
The language and presentation are very clear. No special tools (other than statistical analysis software) are required.
A casual user may not know where to click to get the actual dataset (data file name: World Series).
The data are hard to read on screen, and the column labels do not always line up with the values beneath them. Even after pasting the data into a spreadsheet (e.g., Excel), the user must still spend some time parsing them correctly.
This module could not be assigned to students on their own without the instructor explaining how to manipulate the data.
It would have helped to provide the data in several formats (e.g., Excel, SPSS).