Summer of Stats with Sunnie

This summer, I explored a wide range of data sets and put together notes to introduce information designers to useful concepts and terms in statistics.

Final Project Guidelines, or A Brief Walkthrough of the Information Design Process

At Fathom, as I understand it, the information design process has roughly four stages: data collection, data cleaning, data analysis, and data representation. Depending on the project, one might be involved in only a few parts of the process. Nonetheless, it is important to understand the effort and expertise required at each stage.

The final project in Fathom’s MIT Information Design course has students work through all of the stages to produce their own data-driven piece. To make sure all the important points and questions are covered, I created Final Project Guidelines, or A Brief Walkthrough of the Information Design Process and did a case study with the so-called “tampon tax” to demonstrate how these guidelines can be used.

I collected data, mostly from articles, on how much sales tax is placed on feminine hygiene products such as tampons and sanitary napkins. I organized the data into a table and added notes on any interesting facts or stories regarding each state’s tax rate. Not much cleaning or analysis was needed because the piece was rather straightforward. For the last step of the process, data representation, I created a heat map with R’s ggplot2 package to show at a glance how much sales tax each state places on feminine hygiene products.

To date, 12 states don’t charge tax on feminine hygiene products. Five states (AK, DE, MT, NH, OR) don’t have a sales tax to begin with, and seven states (IL, MD, MA, MN, NJ, NY, PA) lifted the tax between 1975 and 2016. Two more states (CT, FL) have passed bills to eliminate the tampon tax, but they don’t go into effect until July 2018 and January 2018, respectively.

1hr Introduction to Statistics: What should information designers think about when they first see data?

I also put together material for an hour-long class, titled 1hr Introduction to Statistics: What should information designers think about when they first see data?, and shared it with the team during two sessions for practice and feedback.

In creating the material, I first brainstormed a list of topics that I believe are essential and most useful in exploratory data analysis, referencing several textbooks on statistics and data analysis. Then I searched for quality data sets with good documentation to walk through the selected topics. The data sets I settled on are as follows: Boston Airbnb, world GDP and population, Massachusetts crime estimates, significant earthquakes in the world, national wage estimates, Boston street names, popular baby names in the US, and global temperature changes. Most of them have comprehensive metadata and come from government agencies, research institutes, or data centers. Using R, I cleaned and analyzed all of the data sets, produced plots, and assembled the class material.

Below are some of the findings looking at Fathom members’ names in the baby names data:

New York Philharmonic Orchestra Performance Data

For the last stats session, I explored the New York Philharmonic Orchestra’s performance history database. Since Charlie was working on the next episode of Especially Big Data on the New York Philharmonic’s Open Data movement, I thought it would be fun to dig into the data together as a team.

The performance history database is massive and has detailed information on more than 20,000 performances since the birth of the orchestra in 1842. It took me a while to navigate through the data, and I had to frequently visit Wikipedia and other websites to read about the orchestra’s history. Below are some plots that show certain trends in the orchestra:

In addition to creating stats material, I spent some time learning p5.js through Getting Started with p5.js and The Coding Train, updating the First of Her Kind poster, and reading through Ben’s massive book collection with topics ranging from data mining to graphic design. I also got to meet wonderfully energetic Girls Who Code, eat the best homemade key lime pie, and see the eclipse with pinhole projectors made out of paper and granola boxes. Seven weeks was a quick sprint, but nonetheless full of exciting work and fun memories. Thank you Fathom for the amazing summer!

Weather Girls

Projects at Fathom are highly collaborative – so I enjoy the luxury of designing things far beyond my own technical limitations, because I am paired up with at least one other person with champion developer skills. We also have a few hybrids who are extremely qualified on both fronts – but my own background has been primarily in graphic design and illustration.

To help round out the knowledge of people whose experience in these two areas tends to be a bit lopsided, we’ve held Python classes, a typography lecture, and other small workshops to go over specific skills. As a more formal education effort, Ben Fry has also been teaching an information design class at MIT, now in its third semester and an official part of the D-minor curriculum.

My fellow designer (and co-alum from Wash-U) Paul and I took advantage of a small break between client projects to follow along with a few of the MIT assignments. It was a great opportunity to practice our technical skills and code everything ourselves from the ground up, using the p5.js version of Processing. The first project was to design and build custom apps that would pull weather data from the Dark Sky API. Before diving into any API data, I decided to start with some really basic weather sketches just using Processing. I began with a simple bar graph charting ten years of Boston snowfall totals. (For anyone who missed the record-shattering winter of 2015, here’s a photo of a nearly three-story snow pile near Back Bay for reference.)

I then built a second sketch that translated the accumulation totals into falling snow, with heavier years translating into larger, increasingly opaque snowflakes. Users could click through all ten years to get a sense of how intense the winter was. The severity of the 110-inch record is reflected nicely in the Processing sketch.
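The mapping from a season's total to a snowflake's look can be sketched roughly as below. This is a minimal illustration in Python rather than the actual p5.js sketch: the function names, the pixel range, and the opacity range are all assumptions, and only the 110-inch record comes from the text above.

```python
def linear_map(value, in_min, in_max, out_min, out_max):
    """Rescale value from one range to another, in the spirit of p5.js map()."""
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)


def snowflake_style(total_inches, max_inches=110.0):
    """Return (diameter_px, opacity) for one winter's snowfall total.

    The output ranges (2-12 px, 0.2-1.0 opacity) are hypothetical choices.
    """
    # Clamp so totals outside the expected range cannot break the mapping.
    clamped = max(0.0, min(total_inches, max_inches))
    diameter = linear_map(clamped, 0.0, max_inches, 2.0, 12.0)
    opacity = linear_map(clamped, 0.0, max_inches, 0.2, 1.0)
    return diameter, opacity
```

Heavier years land closer to the top of both ranges, so the record winter gets the largest and most opaque flakes.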

After doing a few smaller studies, I felt ready to attempt working with the weather API. Dark Sky (like many other weather APIs) offers current data on a huge range of specific weather variables. Using a location’s geocoordinates, you can easily track several dozen weather stats such as humidity, visibility, windspeed, type and intensity of precipitation, time of sunrise and sunset – and of course, the moon phase. I tried sketching a simple anemometer (the instrument that looks sort of like a weathervane, except that it measures windspeed rather than direction). The rate of rotation in the sketch is based on the current windspeed, so plugging in the coordinates of different cities changes how fast it spins. Now that I had one metric working, I was ready to build out more features. I had recently created a small arsenal of vector people, so I borrowed one of the girls and built her a wardrobe of outfits that would be selected according to the current forecast.

I then grabbed the latitude and longitude of about two dozen locations scattered all across the globe, and set up a sketch where people could click through and see not only the weather in these places, but also what to wear. I also built rain and snow into the program – these initial sketches were started in the dead of winter, and during several snowstorms in Boston, it was fun to check in and see equally intense weather reflected in the p5.js sketch.

I added an additional layer for extreme cold – as the temperature drops below zero, a transparent white rectangle becomes slowly visible (Note how cold Oslo looks at a bone-chilling minus 23°F).
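One way to think about that fade-in is as a function from temperature to overlay opacity. The sketch below is a guess at the idea, not the code behind Weather Girls: the function name, the temperature at which the overlay is fully visible, and the maximum alpha are all hypothetical.

```python
def frost_alpha(temp_f, full_at=-30.0, max_alpha=200):
    """Opacity (0-255 scale) of a white overlay that fades in below 0°F.

    full_at and max_alpha are assumed values chosen for illustration.
    """
    if temp_f >= 0:
        return 0  # no overlay at or above zero
    # Fraction of the way from 0°F down to the "fully frosted" temperature.
    depth = min(temp_f / full_at, 1.0)
    return round(depth * max_alpha)
```

With these assumed settings, a reading of minus 23°F would give an overlay that is most of the way to its maximum opacity.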

Once the clothing, ranges, and base list of cities were all set, Charlie and Olivia helped me clean up the code and set up Weather Girls as a website and Chrome extension. We are hoping to eventually add regional clothing features, including a complete set of all black outfits for New Yorkers and a Boston Terrier to accompany our local weather girl. Visit fathom.info/weathergirls for your local weather details as well as current weather conditions all across the globe!

To the Moon

Lunar Phases is a sketch that grew from a series of mini-projects I developed with the p5.js variant of Processing. Each sketch was an exercise to practice the language and explore programming concepts as I learned.

Inspired by the assignments from our information design course at MIT, I began with a weather app that uses an API call to interpret current wind conditions and temperature as rhythmic patterns.

Another sketch was an attempt to visualize the HSV color model and build a color picker that navigates its structure.

Creating a simple visual tool to explain a process or system is a good coding challenge, and weather APIs are an excellent place to start because the data is observable, easy to access and frequently updated. I realized that natural weather patterns also provide great visual forms to start sketching, which got me thinking about how the lunar cycle might be illustrated in an interactive way.

After working out some ideas on paper, I began a p5.js sketch to see how I could animate between different phases. The first step was finding a way to draw a curve that could smoothly bend between a crescent and full circle. Using the bezierVertex() function, I tried layering curved shadows on an ellipse to approximate the effect of sunlight passing over the moon’s face. The edge of that twilight zone has a surprisingly aggressive astronomical name: the “terminator.”

To test the smoothness of the transitions, I mapped the curve’s intermediate bezier handles and control points to the mouseX position. The first shadow I created had its handles placed halfway along each curve, which made it too clunky to match a circle. I drew sample ellipses in Adobe Illustrator to measure the ratio of a handle to its curve. Then I stored this fraction as a variable to smoothly bend the shape into a perfect curve at any phase.
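For reference, the handle-to-radius ratio commonly used to approximate a quarter circle with a cubic bezier is about 0.5523. The Python sketch below (not the p5.js code from the project, and with hypothetical function names) compares that ratio against a handle placed halfway along the curve by measuring how far each arc strays from a true circle.

```python
import math

# Standard handle length, as a fraction of the radius, for a quarter-circle arc.
KAPPA = 4.0 / 3.0 * math.tan(math.pi / 8)  # about 0.5523


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic bezier curve at parameter t (0 to 1)."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t**3 * p3[1]
    return x, y


def quarter_arc_point(radius, t, handle_ratio=KAPPA):
    """Point on a bezier arc running from (radius, 0) to (0, radius)."""
    h = radius * handle_ratio
    return cubic_bezier((radius, 0.0), (radius, h), (h, radius), (0.0, radius), t)


def max_radial_error(radius, handle_ratio, samples=100):
    """Largest distance between the bezier arc and the true circle."""
    worst = 0.0
    for i in range(samples + 1):
        x, y = quarter_arc_point(radius, i / samples, handle_ratio)
        worst = max(worst, abs(math.hypot(x, y) - radius))
    return worst
```

Calling max_radial_error(1.0, 0.5) and max_radial_error(1.0, KAPPA) shows the halfway handle flattening the arc noticeably, while the standard ratio stays within a small fraction of a percent of the radius.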

As the sketch developed, other elements became great exercises for wrapping my head around programming concepts, improving interactivity and thinking about the potential context of use. Mapping the lunar cycle directly to cursor position was a useful first step, but became a navigation obstacle as I added more pieces. In later iterations, I found that updating the cycle based on the distance dragged creates a much more natural interaction, especially with touch inputs on mobile devices.

Thinking through how to translate these pixel distances into meaningful expressions of time was also a satisfying challenge. I learned how to create arrays where I could store and update variables, such as a percentage of the lunation (that’s the full lunar cycle, or period between syzygies for the layperson). This number can be fed back in different forms: an input of 550 pixels might return “65% full”, “Waxing gibbous” and “Tomorrow” — all variables that I can then piece together to provide useful feedback.
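A rough sketch of that translation, written in Python for brevity rather than taken from the Lunar Phases code: a dragged distance updates a lunation fraction (0 for new moon, 0.5 for full), which is then turned into a percentage and a phase name. The pixels-per-cycle value, the width of the window around each principal phase, and all function names are assumptions.

```python
import math


def drag_to_fraction(start_fraction, dragged_px, px_per_cycle=800.0):
    """Advance the lunation fraction by a dragged distance, wrapping at 1."""
    return (start_fraction + dragged_px / px_per_cycle) % 1.0


def illuminated_percent(fraction):
    """Percent of the disc that is lit; 0 = new moon, 0.5 = full moon."""
    return round((1.0 - math.cos(2.0 * math.pi * fraction)) / 2.0 * 100)


def phase_name(fraction, window=0.02):
    """Name the phase, with a narrow (assumed) window around principal phases."""
    f = fraction % 1.0
    principal = [
        (0.0, "New moon"),
        (0.25, "First quarter"),
        (0.5, "Full moon"),
        (0.75, "Last quarter"),
        (1.0, "New moon"),
    ]
    for center, name in principal:
        if abs(f - center) <= window:
            return name
    if f < 0.25:
        return "Waxing crescent"
    if f < 0.5:
        return "Waxing gibbous"
    if f < 0.75:
        return "Waning gibbous"
    return "Waning crescent"
```

For a fraction of 0.3, illuminated_percent gives 65 and phase_name gives "Waxing gibbous", which are the kind of pieces that can be assembled into a line of feedback.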

The moon is tidally locked to the Earth, so whether you see a rabbit or a wrinkly old man, that side is always facing us. The phase, meanwhile, depends on where the moon sits relative to the sun and the Earth rather than on where the viewer is standing, so people see the same phase wherever they are, and this means the sketch can be updated accurately with a single API call per day. Kudos to Olivia and Ben for writing code to store those daily values, as well as a URL that accepts ‘moon’ with any number of o’s.

Take that, Google.

First of Her Kind

Amidst all the attention given to the 2016 presidential campaign, it was easy to miss an important date in the history of women in American government. One hundred years ago, on November 7, 1916, Jeannette Rankin of Montana became the first woman to be elected to federal office when she won a seat in the U.S. House of Representatives.

In 1916, Jeannette Rankin of Montana became the first woman elected to Congress. (Image credit: Sharon Sprung/Public Domain)

Incredibly, her achievement came several years before the passage of the Nineteenth Amendment, at a time when women’s suffrage in the U.S. was a patchwork of state and local laws.

As Rankin put it, she was “the only woman who ever voted to give women the right to vote.”

To date, 45 states have elected female representatives, 27 have elected female senators, and 24 female governors. To celebrate their accomplishments, we created a poster showing the first woman elected to serve as governor, senator, and representative from each state.

The poster is now for sale, and like all our poster projects, the proceeds will be donated to charity.

The poster includes portraits of every woman to serve as the first female governor, senator, or representative from her state.

The past century has seen 392 women serve in these positions, but we wanted to put the focus on the women who broke the initial barrier of being elected by the state communities in which they lived. This is also why we chose not to include women who were appointed to their positions—we wanted to highlight the importance of the democratic process and how it might be shaped by changing perceptions of gender. It may not even seem that surprising that women have been elected to these positions, until you’re reminded that this is just 100 years of our government’s 240-year history, and that dozens of states still haven’t had a woman serve in all three positions.

The colors used in the poster–purple, gold, and green–were inspired by the colors used by suffragists in the United States and United Kingdom. Purple represented loyalty and dedication to the cause of women’s suffrage, gold symbolized “the color of light and life,” and green stood for hope.

A “Votes for Women” pennant. (Image credit: Wendy Kaveney/Creative Commons)

Three women appear twice on the poster because they were both the first female representative and first female senator from their state. They are Margaret Chase Smith of Maine, Jeanne Shaheen of New Hampshire, and Tammy Baldwin of Wisconsin.

Dianne Feinstein and Barbara Boxer of California were elected on the same ballot, so they’re both included. (Feinstein began serving a few months earlier, because her seat was part of a special election.)

Illustrations from the poster of Senators Dianne Feinstein and Barbara Boxer of California.

A few more things we learned along the way:

  • Nine states have elected women to all three positions: Hawaii, Kansas, Louisiana, North Carolina, Nebraska, New Hampshire, Oregon, Texas, and Washington.
  • Mississippi is the only state to have not elected a woman to any of these positions.
  • Delaware just elected its first female representative to the House this year! Lisa Blunt Rochester will represent the state’s at-large district.

While we were compiling the lists of women to feature on the poster, we came across countless inspiring stories. Here are a few that we found particularly interesting:

Edith Nourse Rogers became the first female representative from Massachusetts in 1925, and still holds the record for the longest-serving woman in the U.S. House of Representatives. She served until 1960, and sponsored more than a thousand bills, many focusing on veterans’ issues.

Edith Nourse Rogers in the House chamber in 1926. (Image credit: U.S. House of Representatives)

After a career in filmmaking, Ruth Bryan Owen was elected in 1928 as Florida’s first female representative. She served two terms and was later appointed by President Franklin D. Roosevelt as the ambassador to Denmark–the first woman to be appointed a United States ambassador.

Ella T. Grasso of Connecticut had a decades-long career in politics. In 1955, she became the first woman to be elected Floor Leader of the Connecticut House of Representatives. In 1974, after serving two terms in the U.S. Congress, she opted to not run for reelection and instead ran for governor of Connecticut. She won, becoming the first female governor of Connecticut and the first female governor in the country who wasn’t a wife or widow of an ex-governor.

After a storied career including fifteen years in the Hawaii House of Representatives, eight years as Hawaii’s lieutenant governor, and six years in the U.S. House of Representatives, Mazie Hirono became Hawaii’s first female senator in 2012. She is also the first Asian-American woman in the Senate. Much of Hirono’s work throughout her political career focused on advocating for pre-kindergarten education.

Mazie Hirono shakes hands with Vice President Joe Biden after being sworn in to the U.S. Senate. (Image credit: Mazie Hirono)

We hope you’ll buy a print of the poster through the Fathom print shop and help support some of the worthy causes that receive the proceeds. And we look forward to updating the poster with more firsts in the coming years.



The Measure of a Nation

There’s a great scene in The Newsroom where a college student in the audience of a Q & A panel asks curmudgeonly TV news anchor Will McAvoy to give a reason why America is the greatest country in the world.

After a few facetious half-answers about the New York Jets, the panel moderator coaxes McAvoy into a profanity-laced rant. Why is America the best? “It’s not,” McAvoy snaps. “There is absolutely no evidence to support the statement that we’re the greatest country in the world.”

(Video contains strong language)

While the scene is an entertaining introduction, the issue deserves more attention than five minutes at the beginning of a TV show. Fortunately, Columbia statistician and economist Howard Friedman has already taken a close look at how the United States stacks up against other countries around the world.

In his book The Measure of a Nation, Friedman compares the United States to 13 similar industrial countries using a variety of data to indicate national well-being. He finds that among these competitor countries, the United States doesn’t do very well.


We have the highest homicide rates, highest incarceration rates, lowest voter turnout, and greatest income inequality. We lead in only a few areas, including military and healthcare spending (though we have the worst return on investment when healthcare spending is compared to life expectancy).

To learn more, view the piece and interact with the data yourself.

The data is interesting because it challenges some widely-held assumptions about America’s status as a global leader. “If America were a corporation, it would today be the equivalent of IBM in the 1990s–an industry giant that’s failing to keep up with the times,” Friedman writes.

That’s an angle he takes throughout the book—examining countries as if he were an auditor examining massive corporations, looking for areas of improvement rather than making arbitrary judgement calls.

While it might be tempting to take each country’s average ranking across all indicators and compare them on a grand scale, that isn’t really the point. Since these countries have different cultures, economies, and other factors, it’s hard to make a worthwhile across-the-board comparison. Plus, the indicators aren’t equally weighted. Some, like life expectancy, are probably more important than others, but it would be hard to agree on the correct weighting for each indicator.

But even if we can’t make an overall ranking, it is important to see where other countries are succeeding and try to learn from their success. As Friedman puts it, “companies that are willing to learn will grow and prosper while the manufacturers of black-and-white televisions and eight-track tapes become resigned to the pages of history books.”

Using the 14 countries from Friedman’s book, we collected data on 23 indicators of national well-being in Friedman’s five categories: health, safety, education, equality, and democracy.

Life expectancy: the United States ranks 14th

In order to make fair comparisons with the United States (which is difficult given our size and a host of other factors), Friedman selected countries that were relatively wealthy, with per-capita GDPs greater than $20,000, and relatively large, with populations at least the size of New York City. That eliminates small city-states like Luxembourg.

This chart makes it easy to get a quick idea of how a country is doing, and because the indicators are arranged by category, you can see exactly where a country is succeeding or struggling. Japan, for example, does well in health and safety, but not as well in equality and democracy.

Life expectancy: Japan in first place

The imperfections of rankings

This whole visualization is built around the idea of ranking countries. Nearly everything in the design, from the position of countries to the colors, is determined by rank.

But the truth is, it’s not always a good idea to rank things, and depending on the data, a ranking can give a faulty impression. For an example, let’s look at the “income mobility” indicator, which comes from a 2009 paper by London School of Economics researcher Jo Blanden.

In the paper, Blanden uses a complicated formula to calculate an “elasticity index” representing income mobility between generations. But since these figures are estimates, they come with standard errors, which in some cases are quite large.

Blanden gives the U.S. an elasticity index of 0.41 and the U.K. an index of 0.37 (lower is better). But the U.S. has a standard error of 0.09 and the U.K. has a standard error of 0.05, meaning it’s possible that the U.S.’s actual index is as low as 0.32 and the U.K.’s is as high as 0.42. That totally reverses the rank.
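The arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the estimates and standard errors are the ones quoted above, while the function names and the choice to look at plus or minus one standard error (and, optionally, the roughly 1.96 standard errors of a 95% interval) are assumptions made here.

```python
def interval(estimate, std_error, z=1.0):
    """Range covered by the estimate plus or minus z standard errors."""
    return estimate - z * std_error, estimate + z * std_error


def overlaps(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Figures quoted in the text (lower elasticity means more mobility).
us = interval(0.41, 0.09)  # roughly 0.32 to 0.50
uk = interval(0.37, 0.05)  # roughly 0.32 to 0.42

# Overlapping ranges mean the order of the two countries is not settled.
ranking_is_ambiguous = overlaps(us, uk)
```

Passing z=1.96 widens both ranges to approximate 95% confidence intervals, and they overlap even more.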

This chart shows the standard errors for the income elasticity indices of 12 countries. Notice that the 95% confidence intervals of many countries overlap, making them difficult to rank. (Image credit: Blanden, 2009)

In fact, Blanden notes this and explicitly warns against using this measure to rank countries. “While it is tempting to immediately form the estimates into a ‘league table,’ we must pay attention to the size of the standard errors,” she writes. “Large standard errors on the Australian, French, British and U.S. estimates [make] it unclear how these countries should be ranked.”

Since this visualization is just an introductory piece for public use, we were less concerned about the detail lost by using rankings (plus they’re used in the book, so we didn’t want to leave them out). But if we were to revisit the piece for an audience of policy-makers, for instance, we’d be using a very different representation that would de-emphasize rankings and do more to expose details like standard error to avoid any erroneous conclusions.

To learn more, view the piece and interact with the data yourself.

Founded in 2010 by Ben Fry, Fathom Information Design works with clients to understand complex data through interactive tools and software for mobile devices, the web, and large format installations. Out of its studio in Boston, Fathom partners with Fortune 500s and non-profit organizations across sectors, including health care, education, financial services, media, technology, and consumer products.

How can we help? hello@fathom.info.