Definition
A contingency table, also known as a cross-tabulation or crosstab, is a table used in statistics to display the joint frequency distribution of two or more categorical variables. It typically arranges survey or observational data in a matrix, with the categories of one variable as rows and the categories of another as columns. Each cell shows the count of observations that fall into a specific combination of categories, which makes it easier to analyze relationships between the variables.
Examples
- Gender and Age Groups: A condominium project surveys the gender and age range of its homeowners. The contingency table might look like this:

| | 20-30 | 31-40 | 41 and above |
|---|---|---|---|
| Male | 15 | 20 | 12 |
| Female | 18 | 22 | 10 |
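
A table like the one above is usually built by tallying raw records. The following is a minimal Python sketch using only the standard library. The sample records and the `age_group` helper are hypothetical, chosen for illustration, and do not reproduce the survey counts shown above.

```python
from collections import Counter

# Hypothetical raw survey records: (gender, age in years).
records = [
    ("Male", 24), ("Female", 35), ("Male", 52), ("Female", 29),
    ("Male", 38), ("Female", 41), ("Female", 33), ("Male", 27),
]

AGE_GROUPS = ["20-30", "31-40", "41 and above"]
GENDERS = ["Male", "Female"]


def age_group(age):
    """Map an age to one of the table's column categories.

    Assumes every respondent is at least 20 years old.
    """
    if age <= 30:
        return "20-30"
    if age <= 40:
        return "31-40"
    return "41 and above"


# Count how often each (gender, age group) pair occurs.
counts = Counter((gender, age_group(age)) for gender, age in records)

# Lay the counts out as rows (gender) by columns (age group).
table = {g: [counts[(g, a)] for a in AGE_GROUPS] for g in GENDERS}

for gender, row in table.items():
    print(gender, row)
```

Grouping the ages into ranges first is what turns a continuous variable into a categorical one that can be cross-tabulated.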
- Smoking Status and Exercise Frequency: A health study might categorize people by whether they smoke and how frequently they exercise:

| | Rarely/Never Exercise | Sometimes Exercise | Regularly Exercise |
|---|---|---|---|
| Smoker | 30 | 20 | 5 |
| Non-Smoker | 40 | 30 | 25 |
Frequently Asked Questions (FAQ)
What is the purpose of a contingency table?
Contingency tables are used to study the relationship between categorical variables and to perform a variety of statistical tests, including chi-square tests for independence or association.
How do you interpret a contingency table?
Interpreting a contingency table involves examining the frequencies in each cell, row, and column to identify patterns and relationships between the variables. It can also involve computing marginal totals and relative frequencies.
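
As an illustration, the marginal totals and row-relative frequencies for the smoking and exercise example can be computed in a few lines. This is a sketch in plain Python; the variable names are illustrative only.

```python
# Counts from the smoking and exercise example.
table = {
    "Smoker": [30, 20, 5],
    "Non-Smoker": [40, 30, 25],
}
columns = ["Rarely/Never", "Sometimes", "Regularly"]

# Marginal totals: sum each row, each column, and the whole table.
row_totals = {label: sum(row) for label, row in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Row-relative frequencies: each cell as a share of its row total.
row_shares = {
    label: [count / row_totals[label] for count in row]
    for label, row in table.items()
}

print(row_totals)
print(dict(zip(columns, col_totals)))
print(grand_total)
for label, shares in row_shares.items():
    print(label, [f"{s:.1%}" for s in shares])
```

Comparing shares rather than raw counts makes the pattern easier to see: about 9.1% of smokers in this example exercise regularly, against about 26.3% of non-smokers.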
Can contingency tables be used for more than two variables?
Yes, while two-dimensional (2D) tables are most common, contingency tables can be expanded to three or more dimensions, though they become harder to represent visually.
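
One common way to handle a third variable is to split the table into separate two-way "layers", one per category of the extra variable. The sketch below shows the idea in plain Python; the records are hypothetical and exist only to demonstrate the structure.

```python
from collections import Counter

# Hypothetical records: (smoking status, exercise frequency, age group).
records = [
    ("Smoker", "Rarely/Never", "20-30"),
    ("Smoker", "Rarely/Never", "31-40"),
    ("Non-Smoker", "Regularly", "20-30"),
    ("Non-Smoker", "Regularly", "20-30"),
    ("Non-Smoker", "Sometimes", "31-40"),
    ("Smoker", "Sometimes", "20-30"),
]

# A three-way table holds a count for every combination of three categories.
counts = Counter(records)

# Display it as one two-way layer per age group.
layers = {}
for (smoking, exercise, age), n in counts.items():
    layers.setdefault(age, Counter())[(smoking, exercise)] += n

for age, layer in sorted(layers.items()):
    print(age)
    for (smoking, exercise), n in sorted(layer.items()):
        print(f"  {smoking:<10} {exercise:<12} {n}")
```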
What statistical tests are used with contingency tables?
Common tests include the chi-square test for independence and Fisher’s exact test (for small sample sizes). Measures of association such as Cramer’s V and the phi coefficient are often reported alongside them to describe the strength of a relationship.
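
The chi-square test for independence can be worked through by hand on the smoking and exercise example. The sketch below uses only the Python standard library. It relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability equals exp(-x/2); for other table shapes a statistics library would be needed to obtain the p-value.

```python
import math

# Observed counts from the smoking and exercise example (2 rows x 3 columns).
observed = [
    [30, 20, 5],    # Smoker
    [40, 30, 25],   # Non-Smoker
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form p-value, valid only when dof == 2 (true for a 2x3 table).
p_value = math.exp(-chi2 / 2) if dof == 2 else None

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
```

For these counts the statistic is roughly 6.56 with a p-value of about 0.038, so at the conventional 5% level the data would suggest that smoking status and exercise frequency are associated. Every expected count here is at least 11, so the chi-square approximation is reasonable.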
What is a marginal total in a contingency table?
A marginal total is the sum of frequencies along any row or column, representing the total occurrences of that characteristic across all categories of the other variable.
Related Terms
- Cross-tabulation: The process of creating a contingency table.
- Marginal Distribution: The distribution of a single variable on its own, given by the row totals or the column totals of a contingency table.
- Chi-square Test: A test used to determine if there is a significant association between categorical variables in a contingency table.
- Cramer’s V: A measure of association between two nominal variables, ranging between 0 (no association) and 1 (perfect association).
- Phi Coefficient: A measure of association between two binary variables, computed from a 2x2 contingency table.
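
Both measures of association can be computed directly from the cell counts. The following sketch uses the standard formulas, V = sqrt(chi2 / (n * min(rows - 1, columns - 1))) and phi = (ad - bc) / sqrt of the product of the four marginal totals. The function names are illustrative, and the 2x2 table is an assumption made here by merging two exercise columns of the smoking example.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )


def cramers_v(table):
    """Cramer's V: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi_square(table) / (n * k))


def phi(table):
    """Signed phi coefficient for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Smoking and exercise counts: rows are Smoker, Non-Smoker.
smoking = [[30, 20, 5], [40, 30, 25]]
# Assumed 2x2 version: "Rarely/Never" and "Sometimes" merged vs "Regularly".
collapsed = [[50, 5], [70, 25]]

print(round(cramers_v(smoking), 3))
print(round(phi(collapsed), 3))
```

Both values come out near 0.21, which is usually read as a weak association. For a 2x2 table, Cramer's V equals the absolute value of phi.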
Suggested Books for Further Studies
- “Statistics for Business and Economics” by Paul Newbold, William L. Carlson, and Betty Thorne - Covers a comprehensive range of statistical methods, including the use of contingency tables.
- “An Introduction to Categorical Data Analysis” by Alan Agresti - A detailed guide focused on methods for categorical data analysis, including contingency tables.
- “Statistics in Plain English” by Timothy C. Urdan - Provides clear explanations of basic statistical concepts, suitable for beginners.
Fundamentals of Contingency Tables: Statistics Basics Quiz
### What does a contingency table display?
- [x] The frequency distribution of two or more categorical variables.
- [ ] The average values of numerical data.
- [ ] The median values of multiple datasets.
- [ ] The range of continuous data.
> **Explanation:** A contingency table displays the frequency distribution of two or more categorical variables, showing how often each combination of categories occurs.
### What statistical test is commonly used with contingency tables?
- [x] Chi-square test
- [ ] T-test
- [ ] ANOVA
- [ ] Regression analysis
> **Explanation:** The chi-square test is commonly used with contingency tables to assess whether there is a significant association between the variables.
### Can contingency tables be used for continuous data?
- [ ] Yes, but only for large datasets.
- [ ] Yes, they are primarily used for continuous data.
- [ ] No, they are rarely used in any statistical analysis.
- [x] No, they are designed for categorical data.
> **Explanation:** Contingency tables are designed to display the frequency distributions of categorical variables. Continuous data must first be grouped into categories, such as age ranges, before it can be tabulated.
### What are marginal totals in a contingency table?
- [x] The sums of the rows and columns.
- [ ] The cell frequencies multiplied by each other.
- [ ] The differences between the largest and smallest cell frequencies.
- [ ] The median frequency of each cell.
> **Explanation:** Marginal totals are the sums of the rows and/or columns in a contingency table.
### What does the phi coefficient measure?
- [ ] The distribution of numerical data.
- [ ] The variation within a single variable.
- [x] The association between two binary variables.
- [ ] The relative position of continuous variables.
> **Explanation:** The phi coefficient measures the strength of association between two binary variables in a contingency table.
### In which situation can Fisher's exact test be used instead of a chi-square test?
- [x] When sample sizes are small.
- [ ] When data is normally distributed.
- [ ] When more than two variables are being tested.
- [ ] When data is continuous.
> **Explanation:** Fisher's exact test is used instead of a chi-square test when sample sizes are small, because the chi-square approximation becomes unreliable when expected cell counts are low (a common rule of thumb is below 5).
### When visualizing three variables in a contingency table, what is a major downside?
- [x] The complexity and difficulty of representation.
- [ ] The inability to perform any statistical tests.
- [ ] The need for continuous data conversion.
- [ ] The loss of categorical data precision.
> **Explanation:** Adding a third dimension to a contingency table increases complexity and can make it difficult to visually represent and interpret the data.
### What aspect of a contingency table does Cramer's V measure?
- [ ] The total frequencies of occurrences.
- [x] The strength of association between categorical variables.
- [ ] The average cell frequency.
- [ ] The probability of marginal totals.
> **Explanation:** Cramer's V measures the strength of association between categorical variables in a contingency table.
### Why might one use a contingency table in business analytics?
- [x] To analyze the relationship between different categorical variables like customer demographics and purchasing habits.
- [ ] To compute the linear regression between sales and time.
- [ ] To determine the precise mean profit over a period.
- [ ] To project future data trends accurately.
> **Explanation:** Contingency tables in business analytics help in analyzing relationships between categorical variables, such as between customer demographics and purchasing habits.
### What does each cell in a contingency table represent?
- [x] The frequency of the occurrence of a specific combination of variable categories.
- [ ] The mean of all data points in that category.
- [ ] The sum of all values across the table.
- [ ] The variance of the data related to the category.
> **Explanation:** Each cell in a contingency table represents the frequency of the occurrence of a specific combination of category values for the variables being studied.