Part of a Series of Tutorials on using Google Sheets
Last updated: October 12, 2017
Comparing Across Categories
For this tutorial, we will use health data to create a bar graph and a scatter plot that reveal differences across categories; in this case, males versus females. For the bar graph, we will work with Household Income which, instead of exact numerical values, is reported as ranges in our sample data. For the scatterplot, we will show how to simultaneously compare height versus weight vs gender.
Getting the Sample Data
We’re going to use as our example data the National Health and Nutritional Examination Survey (NHANES) dataset. In the NHANES dataset, there are several thousand cases, each of which equates to a single person. NHANES data are highly encoded with numbers or abbreviations that stand in for readable words, and also come with a lot of metadata (data about data). These two factors make their datasets confusing for the general public. However, there is a useful data exploration interface for NHANES, which has been developed by Tim Erickson, a freelance science and math educator. The data have already been downloaded from NHANES and reformatted to be easily accessible.
Go to the EEPS NHANES data exploration system web page: http://www.eeps.com/zoo/nhanes/source/choose.php. There you will find a webform where you can select the variables you want to examine, including demographics, body measurements, and biochemistry bloodwork information. Next to each variable, it shows you what units the variable is in. For example, “Age” is in years, “Weight” is in kilograms, and “Height” is in centimeters.
For this tutorial, check the box for H_Income (Household Income) in the Demography section and keep the default variables already marked. In this interface you are required to look at a preview of your data before you get a large sample, so click on the button “Preview the Data” at the top. The data exploration system will load a table with a handful of sample cases in the web page.
If the data are what you expected, enter 2000 in the “Sample size” text box and click on “Get entire sample”. We chose a sample of 2000 to demonstrate how Google Sheets can help you deal with a large amount of data easily. The interface will give you 2000 cases, as requested.
Select all the contents of the table, then copy and paste the data into a new Google Sheet file. It should paste correctly into each spreadsheet cell. If it doesn’t, make sure you have selected only the contents of the table. If you select any text before the table, after the table or in the side panel, everything will get pasted into a single cell; this is a common error. Now that our data are in Google Sheets, we are ready for analysis.
In the following examples, we will compare males and females in two ways:
- To work with categorical data, we will use Household Income in this data set to compare reported Household Income ranges.
- To work with two sets of decimal numbers, we will use Height and Weight and see how their relationship to each other differ between males and females.
Reading the Data
We will create two bar graphs containing Household Income information, one for males, and the other for females. Here, each bar in the graph will represent an income bracket range.
The possible values for household income, in alphabetical order are:
- (blank)
- <$20 K >$20 K
- $0-5 K
- $10-15 K
- $15-20 K
- $20-25 K
- $25-35 K
- $35-45 K
- $45-55 K
- $5-10 K
- $55-65 K
- $65-75 K
- $75+ K
- DK (don’t know)
- refused (person chose not to give the information)
Notice that these are not just plain old numbers. There are also symbols, signs, and letters. To people, these represent a range of numerical amounts, but to computers it’s treated as text data and reads one text character at a time. Comparing “$5-10 K” against “$45-55 K”, it sees the “$” as the first character on both, then looks at the next one. Since “5” comes after “4”, the computer sorts “$5-10 K” after “$45-55K”. It is our job to make sure these are ordered correctly in our graphs.
Filtering for Female and Male Data
Turn on the Google Sheets Filter tool by clicking on it. It’s on the menu bar and looks like a funnel.
Turning on the Filter tool adds little blue down arrows next to each column header.
Clicking on any of these will give you options for filtering the different variables in each column. This means you can show or hide specific variables like female or male for gender. Let’s try it out. Click on the blue down triangle for the first column header “Sex” and click to uncheck the checkmark next to “Male.” Hit the OK button.
Now we will only be able to see the cases for female throughout the entire spreadsheet. Looking at the row numbers, you’ll notice that the numbers skip a few sometimes, indicating that the Male cases are only hidden, not deleted.
Click on the letter for the column for household income. Go to Insert-> Chart and it will create a bar graph automatically. Hover over each bar in the bar graph and you will see how many females are in each household income bracket.
Assembling and Organizing the Data for Chart-making
To show Household income differences for males versus females, we will need to reorganize the data in a different format. The bar graph above, showing household income for females, summarizes and reports on the data, giving the values for each bar that represents an income range. We will need to grab the numbers for females and males for each income range to show a different bar graph that can compare the two.
In a separate worksheet, in column A, put down the different values for Household income in increasing order, starting at cell A2. Put “Household Income Bracket” in A1, “Female” in B1, and “Male” in C1. Then list out the ranges below:
- $0-5 K
- $5-10 K
- $10-15 K
- $15-20 K
- $20-25 K
- $25-35 K
- $35-45 K
- $45-55 K
- $55-65 K
- $65-75 K
- $75+ K
- Under $20 K
- Over $20 K
- Don’t Know
- Refused
Hover your cursor over each bar of household income categories with the filter on to show only Females. Fill in the second worksheet with the numbers you see for each bar. Once you have all the numbers for females, change the filter on the gender column to show only males. The bar graph will automatically update. Add those numbers to the worksheet under the column for Male.
Note: We included the DK (don’t know), <$20 K, and >$20 K for completeness. There are also a large number of blanks (846 of them in our sample) where the information is not entered at all; this information is not captured in the graph. If you don’t want to include these numbers, it would be considerate for the inquisitive reader’s understanding to add a footnote to your graph mentioning this.
Making the Graph
When you have finished putting in all the values, select the three columns and go to Insert->Chart. It will automatically give you a chart like so:
This has the information we want but it’s difficult to compare across Male and Female household incomes when their bars are stacked on top of each other. Click on the white space on the chart, then click on the vertical three dots at the top right corner of the chart to go Chart Editor mode.
There, change the chart type from “Stacked column chart” to a regular “Column chart.” A column chart is a vertical bar graph.
Now we have a graph that is easier to visually compare the two.
For each income category, there is a bar for females and a bar for males. Notice that the income categories are displayed in the order in which we entered them; we chose to put “<$20K” and “>$20K” at the end with “Don’t Know,” and “refused” since these are the odd options.
What can you say about male versus female household income based on this dataset?
Making the Graph in an Infographic Canvas Tool
Select the option to make a column chart or a vertical bar graph. Then in the data table for the chart, paste in the data we assembled from Google.
Sheets that was used to make the column chart. Make sure you label your axes if the infographic canvas tool doesn’t do it automatically and use a fitting title such as “Female versus Male Household Income.”
Comparing the Relationship between Height and Weight Variables across Males and Females
Comparing male and female in a scatterplot of height vs. weight will be easier than comparing male and female household incomes because we won’t have to get the number of cases in each category. How do we do this? We can take weight and height to be the x and y axes, and then use different colors for males and females.
Use the Filter tool in Google Sheets on the gender column to show only the males. Select and copy all the data cells for weight and height.
Paste this into a new worksheet starting at cell A3. Do the same for Female by filtering the data for Females and pasting the weight and height columns into Columns C and D. Put “Female” in cell C1 to label that series.
For the axis labels, if you are not sure, the units can be found in the EEPS NHANES data explorer where you select variables to download.
Making the Scatter Plot in an Infographic Canvas Tool
Currently, Google Sheets does not support having multiple categories for scatter plots. (This may change later on. So we do not have a Google Sheets example for you. But you can make this in Venngage.) Find the scatter plot option in your infographic canvas tool of choice. In the place where you enter data for the scatter plot, copy-paste the data from your Google Sheets worksheet with the separated Male and Female Weight and Height data.
Using Venngage, it would look something like this:
Reflecting on the Data
Now that you’ve completed this walkthrough, here are some key points to take away:
- Weight and height increase together, but there’s a curve so it’s not a linear relationship.
- In the graph above, there is a outlier point that has a weight of 0 kg and a height of 150 kg. It means that the weight variable is missing for that person.
- On average, males are a bit taller than females.
- There are only 3 points of data around 200kg that show that males can weigh more than females, but this is not substantial enough to say that males generally weigh more than females.
You must be logged in to post a comment.