November 07, 2014

Because of the immense popularity of All Streets, we expanded our product line and created All Streets maps for individual states. To accommodate our new selection of products, Terrence worked feverishly to design the Fathom Print Shop. The site officially launched yesterday, just in time for the holiday season.

The posters are available in two sizes, 16×20 inches and 24×36 inches. You can purchase the poster with (or without) a frame, and also select from a choice of warm, light, or dark background colors.

Showing solely streets unveils some interesting characteristics about population settlement, topography, and waterways.

Eastern California’s national parks are clearly visible, contrasting starkly with the bounty of roads to their west.
In our home state of Massachusetts, the heavy lines around the Boston metro area clearly track population density.
North Dakota’s network of roads is reminiscent of Manhattan’s street grid.
The country’s largest and most populous city, New York, has the densest road coverage near the metropolitan center, leaving a tiny rectangular speck open for Central Park.
Alaska is the only state missing from the collection we are offering. There are so few roads that little can be understood about the state.

For those interested in viewing All Streets for the U.S. territories, we added Puerto Rico and Guam into the mix.

Now that the Fathom Print Shop is live, we’re ready to take your orders!

October 27, 2014

Public data is increasingly available from multiple sources: governments, economists, and research communities, to name a few. Open access is a fundamental prerequisite for civic participation and transparency, but freely available and intuitive tools that allow users to extract meaningful narratives from the data are also crucial. That was our central motivation for developing the visualization tool Mirador, and also for the Mirador Data Competition we launched last month. The richness of public datasets is often extraordinary, and many of them are the result of the continued efforts of data collection teams, statisticians, and researchers over several years, sometimes decades. In this post, I would like to share some associations I found using Mirador on a large dataset of behavioral risk factors. These associations stand here simply as suggestive hints or directions that one can use to delve further into the data using more rigorous statistical analyses. This highlights the main purpose of Mirador as a visual exploratory tool.

Many others have recognized the importance of open data and public participation, and have organized similar data challenges or competitions in the past to spur the interest of various audiences in data visualization and analysis. Around the time we launched our own competition, I came across HHS VizRisk, an event organized by the U.S. Department of Health & Human Services that seeks visualizations of behavioral data to inform personal and policy decisions. The main dataset in VizRisk comes from the Behavioral Risk Factor Surveillance System (BRFSS), a nationwide phone survey that collects information about health risk behaviors.

I compiled the BRFSS data made available for VizRisk into Mirador’s format (which is basically a CSV table plus some additional metadata) and did some quick explorations of my own. The screen capture below shows Mirador after loading the 2011 BRFSS data, comprising around 500,000 respondents:


The happiness of the self-employed

A variable in the BRFSS dataset that I believe is a reasonable choice as a global indicator of well-being is “General Health”. Respondents are asked to characterize their health status using 5 options: poor, fair, good, very good, and excellent. So, it would be interesting to look at association patterns between General Health and other socio-behavioral indicators. One association that stood out for me is between General Health and Employment Status. This other variable records whether the respondent is employed for wages, self-employed, a student, unemployed, etc. I call this association the “happiness of the self-employed”, because for the entire sample of 500,000 respondents you can see that self-employment is associated with a slight increase in reported excellent general health:


We can conclude that self-employed people are more likely to respond that they have excellent General Health, although the difference is only a few percentage points. Before going any further, let’s first make clear what this plot (called an eikosogram) means: the percentage 25.43% is highlighted for the “self-employed” category in the column (corresponding to the Employment Status variable), and the “excellent” category in the row (corresponding to the General Health variable). This means that 25.43% of the self-employed respondents answer that they have excellent general health. In other words, it is a conditional probability that can be denoted in mathematical notation as:

P(excellent health|self-employed) = 0.2543
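
As a rough illustration of how such a conditional probability can be estimated from survey rows, here is a minimal Python sketch. The field names (`employment`, `health`) and the eight toy respondents are hypothetical and are not taken from BRFSS or from Mirador's code; the sketch only shows the counting behind one cell of an eikosogram column.

```python
def conditional_probability(rows, given_key, given_value, target_key, target_value):
    """Estimate P(target_key == target_value | given_key == given_value)."""
    # Restrict to the conditioning category (one column of the eikosogram).
    subset = [r for r in rows if r[given_key] == given_value]
    if not subset:
        raise ValueError("no rows match the conditioning value")
    # Count how many of those rows fall in the target category.
    hits = sum(1 for r in subset if r[target_key] == target_value)
    return hits / len(subset)

# Hypothetical toy respondents, NOT real BRFSS records.
respondents = [
    {"employment": "self-employed", "health": "excellent"},
    {"employment": "self-employed", "health": "good"},
    {"employment": "self-employed", "health": "very good"},
    {"employment": "self-employed", "health": "excellent"},
    {"employment": "employed for wages", "health": "good"},
    {"employment": "employed for wages", "health": "excellent"},
    {"employment": "employed for wages", "health": "fair"},
    {"employment": "employed for wages", "health": "good"},
]

p = conditional_probability(
    respondents, "employment", "self-employed", "health", "excellent"
)
print(f"P(excellent health | self-employed) = {p:.2f}")
```

In this toy sample, two of the four self-employed respondents report excellent health, so the estimate is 0.50. The real survey also carries sampling weights, which this unweighted count ignores.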

Mirador is designed for interactive visualization, so only the labels of the hovered items are shown in the interface. For clarity, I have saved the health-employment eikosogram and added the labels for some of the categories: employed (for wages), self-employed, homemaker, and student:


Next, we can explore what factors might affect this association. Variables such as sex, age, income, and ethnic group would probably have an impact on it. It is easy to check the effect of any of these factors with Mirador. For example, the percentage of self-employed women reporting excellent health is higher than that of self-employed men (27% versus 24%), while both report similar levels of excellent health when employed for wages:


Of course, correlation does not imply causation, but it is worth noting nonetheless. Since we can easily adjust for other socioeconomic factors, I searched for a combination of factors that maximizes the “happiness” among self-employed respondents.

Age and income have a large effect, with middle-aged individuals in higher income brackets reporting the highest levels of excellent health among the self-employed. After fixing the covariates in those ranges (age: 35-54, income: 35k+), I started looking at the association among different ethnicities, as classified in the BRFSS data: White, Black, Asian, Hispanic, Pacific Islander, Native American, and other/multiracial. What I found is that the highest levels of excellent health for self-employed respondents occur in the Asian group. The difference between employed for wages and self-employed is quite substantial for this group, approximately 25% versus 44%:


What can we conclude from this pattern in the data? Again, correlation is not causation, but we can wonder if this pattern is due to cultural or economic factors. It is not possible to say from the data, but at least we have a tentative hypothesis we can test further. We also have to be careful: when we control for several factors (age, income, ethnicity), the sample size decreases dramatically, which makes our conclusions weaker. For example, the number of respondents in the subgroup of Asian respondents aged 35-54 with income higher than 35k is only 2,086. For a visual illustration of the so-called “curse of dimensionality,” check this interactive web app.
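
To make the sample-size caveat concrete, the following Python sketch compares the approximate uncertainty of an estimated proportion at different sample sizes. It assumes simple random sampling and uses the textbook standard error of a proportion, which is a simplification because BRFSS is a weighted survey. The proportion 0.25 is purely illustrative, and the size 200 is a hypothetical stand-in for one employment category inside the 2,086-respondent subgroup.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of a sample proportion p estimated from n respondents."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)

p = 0.25  # illustrative proportion, not a BRFSS result
# 500,000 = full sample, 2,086 = subgroup size quoted above,
# 200 = hypothetical size of a single category inside that subgroup.
for n in (500_000, 2_086, 200):
    half_width = 1.96 * proportion_standard_error(p, n)
    print(f"n = {n:>7}: {p:.2f} +/- {half_width:.4f} (approx. 95% interval)")
```

The interval widens as each additional control splits the sample into smaller cells, which is why differences seen in small subgroups deserve more caution.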

Better growing old alone… if we have enough money?

Another factor that clearly affects “happiness” is the relationship status of individuals. BRFSS includes a Marital Status variable with several categories, but in order to keep the plots simple I restricted the visualization to Married, Divorced, Widowed, and Never Married. The eikosogram between General Health and Marital Status looks as follows:


The health levels are suspiciously high in the never-married category. However, this plot was generated using the entire population sample, which covers all ages starting at 18. By differentiating between age groups and also gender, we get a better representation of the change in health patterns among subpopulations with different marital status:


Some of the patterns are expected or known. For example, health levels decrease as people age, and the fraction of married women up to 34 years of age is higher than that of men. In addition, the fraction of non-married men in the 25-34 age bracket reporting excellent health is higher than for non-married women (in fact, it is similar to that of married men). Is this a manifestation of the social pressure acting on women to get married before their mid-thirties? Again, we cannot draw these causal conclusions from the data, but at least we can use the visual patterns as a guide for more detailed analyses.

It is also not surprising to find that income levels have a strong correlation with health. But perhaps more interesting is to see how the association between health and marital status dramatically changes its direction when discriminating between high and low earners. When we aggregated all the data for people 55 years or older, we saw in the previous animation a marked decrease in health among individuals who ended up single, whether due to divorce, death of a partner, or simply not getting married. But if we now restrict the analysis to people with income levels above $50,000, then there is no longer a decrease, especially among divorced individuals:



I think that these “non-rigorous” findings are a good illustration of the usefulness of exploratory data analysis as a first step. By quickly defining cross-sections of the data and controlling for multiple factors (always within the limits of what the sample size allows), we can use interactive visualization to guide our intuition and find new tentative hypotheses.

If you are interested in exploring the BRFSS and other similar datasets with Mirador, remember that the Mirador Data Competition is still open until next week, and you can win some prizes by submitting your findings!

Finally, over the past months I compiled a list of several publicly available datasets that I included in this public list. Feel free to add more links to the list.

October 21, 2014

In the last month, we built a tool that explores the global seismic activity occurring over a single year. The project integrates earthquake, population, and mortality risk data so that users can explore how the frequency and magnitude of earthquakes generate varying levels of risk around the world. Visit the site:

Explore a year of earthquakes with the web tool.

I had always thought of earthquakes as dramatic, irregular events, when in fact there are tens of thousands of earthquakes each year that people cannot even feel. Back when I was in school, I made a print piece that looked at the number and magnitude of earthquakes occurring throughout a full year. As you can see in our more recent project, there were over 6,000 earthquakes globally in 2013 (and that’s only looking at those with a magnitude of 4.5+!). In fact, there are over a million earthquakes each year, but most are too small to notice.

A year of earthquakes

Within the dataset pulled from the U.S. Geological Survey, we focused on the earthquakes’ magnitudes, locations, and dates of occurrence. The tool allows you to focus on a particular subset of earthquakes by limiting the timespan, range of magnitudes, and map extent. On the other hand, you can maximize the timeframe and magnitude range to get a picture of the whole year.

When comparing time intervals of the same length (for example 30 days), the ratio of magnitude 4.0-4.9 earthquakes to magnitude 5.5-5.9 to magnitude 6.0-6.5 earthquakes, etc. stays about the same. However, the total number of earthquakes can vary dramatically. A 31-day span starting in January had 449 earthquakes, while the 31-day span starting in February had 809.
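
The comparison above boils down to filtering a catalog by time window and magnitude range, then looking at both the total count and the share of events in each magnitude bin. The Python sketch below illustrates that idea with a handful of made-up records. The data, the function names, and the whole-number bins are all assumptions for illustration and do not reflect the tool's actual implementation or the USGS catalog.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical (date, magnitude) records, NOT taken from the USGS catalog.
quakes = [
    (date(2013, 1, 5), 4.6),
    (date(2013, 1, 20), 5.1),
    (date(2013, 2, 3), 4.8),
    (date(2013, 2, 6), 6.2),
    (date(2013, 2, 28), 4.5),
    (date(2013, 3, 10), 7.1),
]

def in_window(records, start, days, min_mag=4.5, max_mag=10.0):
    """Keep records dated in [start, start + days) within the magnitude range."""
    end = start + timedelta(days=days)
    return [(d, m) for d, m in records if start <= d < end and min_mag <= m <= max_mag]

def magnitude_shares(records):
    """Fraction of records falling in each whole-number magnitude bin."""
    counts = Counter(int(m) for _, m in records)
    total = sum(counts.values())
    return {f"{k}.0-{k}.9": v / total for k, v in sorted(counts.items())}

january = in_window(quakes, date(2013, 1, 1), 31)
february = in_window(quakes, date(2013, 2, 1), 31)
print(len(january), magnitude_shares(january))
print(len(february), magnitude_shares(february))
```

Comparing `len(...)` across windows shows how the totals differ, while `magnitude_shares` shows whether the mix of magnitudes stays similar.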

The differences and similarities between the seismic activity of two 31-day spans.

The tool revealed hot spots of seismic activity. By including population density as a layer, we could begin to see areas of high risk. For example, Japan and much of East Asia are densely populated along the coasts — an area that is also host to a large percentage of the world’s earthquakes.

There is a great amount of seismic activity in East Asia.

East Asia experiences some of the year’s largest earthquakes measuring over 7.0 on the Richter scale, putting many of these areas in the top mortality risk deciles.

Two 7.0+ earthquakes in 2013 happened near Japan.
The red sections show areas with the highest earthquake mortality risk.

The largest earthquake in 2013 actually occurred deep in the ocean off the coast of Russia. Luckily, its depth prevented massive damage, but the tremors could be felt thousands of miles away.

An earthquake in the Sea of Okhotsk measured 8.3 and was the largest earthquake in 2013.
Its tremors could be felt thousands of miles away in cities as far as Atyrau, Kazakhstan and Moscow, Russia.

Areas like East Asia, the Himalayas, and the U.S. West Coast experience a large number of earthquakes because they lie directly on top of tectonic plate fault lines. On the other hand, areas like the U.S. East Coast, Australia, and most of Africa experience little to no seismic activity throughout the year due to their locations in the centers of tectonic plates.

The white lines show the boundaries of the earth’s tectonic plates. Most earthquakes occur along these fault lines.

October 16, 2014

Daily diets vary considerably around the world—and the food we eat often mirrors the wider structural circumstances of the places we live in. Whether influenced by strained foreign relations, growing economies, fluctuating market prices, or shifting environmental conditions, the food we consume depends on where we live. What the World Eats, our latest piece for National Geographic’s Future of Food series, compares national diets and consumption patterns across a variety of countries over the last 50 years.

The caloric intake of the average person in 2011

The project breaks down the food items that fuel the daily diet of each country, and also shares a detailed view of national and per person meat intake. Adding the lens of meat consumption is important in that it sheds light on the larger agricultural, economic, and political systems in each nation. The project data was sourced from the Food and Agriculture Organization of the United Nations (FAO), which has collected a trove of global data on food production, consumption, trade, emissions, and other agricultural indicators.

We designed the information in two forms. The daily diets are represented by pie charts (or “donuts” as they’re now known around the office, and cited regularly to remind Terrence that he should bring in radial morning treats for the rest of us). The proportion of each food item (meat, dairy, produce, etc.) in the diet is represented by the amount of space it occupies in the circle. In developing countries, grains — which are often less expensive — make up a greater portion of the diet, whereas wealthier countries have more diverse breakdowns. Circle size reflects the average daily intake of calories or grams per person. Somalia, with the lowest per person calorie consumption in the world, has a chart that is half the area of the U.S. chart (where the average person consumes over twice the calories of the typical Somali).
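
When circle size encodes a quantity by area, the radius has to scale with the square root of the value, otherwise differences look exaggerated. The short Python sketch below shows that relationship. It is only an assumption about how such a chart could be sized, not a description of how the published graphic was built, and the relative values (1.0 and 0.5) are placeholders rather than FAO figures.

```python
import math

def radius_for_value(value, reference_value, reference_radius):
    """Radius of a circle whose AREA is proportional to value."""
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be >= 0 and reference_value must be > 0")
    # Area grows with radius squared, so scale the radius by the square root.
    return reference_radius * math.sqrt(value / reference_value)

# Placeholder relative intakes: a country at half the reference value.
reference_radius = 100.0
r = radius_for_value(0.5, 1.0, reference_radius)
area_ratio = (r / reference_radius) ** 2
print(f"radius = {r:.1f}, area ratio = {area_ratio:.2f}")
```

A chart with half the value gets roughly 71% of the reference radius, which yields half the area.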

In toggling between grams and calories, you can see that quantity of food consumption does not translate into caloric yields. For instance, over half of the typical Chinese diet is composed of produce, yet it accounts for only 15% of daily caloric intake.

The second section of the graphic, meat consumption, is composed of time series charts. Given the high cost and multitude of resources required to raise animals, national meat consumption is more susceptible than the overall diet to changing external circumstances.

The Gulf War had a drastic impact on the availability of meat in Kuwait from 1990 to 1991.

Raising animals for meat consumption is taxing to both agricultural and financial resources. Livestock-based food production accounts for about 20% of global greenhouse gas emissions. Further, raising animals for food demands far more water, feed, and land than eating crops directly would require (note that a single cow requires a lifespan’s worth of resources, whereas using a space for crop production can yield foodstuffs annually). To bring Thomas Malthus into the discussion, we have a limited quantity of natural resources needed to feed an exponentially increasing population. The average person today eats twice as much meat as 50 years ago. Yet eating meat, especially livestock, is an inefficient means of feeding the earth’s fast-growing population.

Often as countries acquire more wealth, the proportion of grains in the diet declines, and individuals are better able to diversify the contents of their plates with more expensive animal products like meat and dairy. Additionally, impacts of war, tense foreign relations, and even widespread religious practices are visible through a country’s meat consumption.


As a result, the diet and meat consumption of more developed places like the U.K. have remained relatively unchanged, while the growth of China’s population and economy has led to unrivaled growth in both national and per person meat consumption.

Visit the site to explore the data, compare consumption across countries, and learn about the factors that influence the way people eat around the world.

October 13, 2014

Just ran across this photo of Darcy Bowden, my high school “Production Art” teacher, during a brief visit to Fathom last summer. Her class was a two-hour studio that I was able to take both my junior and senior years. It was my first exposure to real graphic design exercises (creating black and white ink drawings of concepts like “contrast,” or making artifacts in the style of other eras of design, and so many others…), and it gave me a chance to build a portfolio that helped me get into design school. I’d wanted to take the class ever since reading about it in the course catalog as an eighth grader picking out courses for my first year of high school.

Ms. B also kept Philip Meggs’ A History of Graphic Design (which at the time had a different, though still fairly atrocious, cover) checked out of the school library for the whole year, so I could read it from cover to cover. Such a great book, and perhaps a small thing, but huge for me to get that exposure as a seventeen-year-old. And she helped with the bigger things too, like introductions for internships and letters of recommendation for schools, but sometimes it’s the small things (whether the design exercises, a great group of people for class crits, or history books) that really stick with you.

So thanks to Ms. Bowden and the many other great mentors I’ve had over the years, and here’s to my friends who are teaching this fall and having the same kind of impact on their own students.

October 10, 2014

Today, in collaboration with Sarah Rinaldi, we released a video documenting Open India, an interactive visualization we developed for the World Bank Group. The video was showcased at the Annual Meetings of the International Monetary Fund (IMF) and the World Bank Group (WBG). The Annual Meetings bring together central bankers, ministers of finance and development, private sector executives, and academics to discuss issues of global concern, including the world economic outlook, poverty eradication, economic development, and aid effectiveness.

We interviewed key stakeholders associated with the World Bank Group’s Country Partnership Strategy with India (CPS), which develops transformational solutions aimed at ending extreme poverty and promoting shared prosperity. The video frames the context around the app, and reflects the humanity behind the data. The interviews address why visualizing and interacting with the information improves the transparency, accountability, and opportunities for growth and progress with India’s CPS.

“This app is to generate ideas. People will drive India forward, and will drive the rest of the world forward with their aspirations, with their ideas, and with the incredible potential of this country.”
Onno Ruhl, India Country Director, World Bank

The video will help the status of India’s CPS reach a larger audience at the 2014 Annual Meetings of the IMF and WBG.

It was a pleasure working with the talented Sarah Rinaldi in documenting the project, and we look forward to future collaborations.

October 09, 2014

Last Wednesday afternoon we noticed a massive traffic surge on the Fathom website, with all visitors loading one specific image of the All Streets project from 2008. How random! A quick glance at the referring page data showed us the cause: someone had come across All Streets and shared it on Reddit’s dataisbeautiful subreddit. And it was blowing up on the front page.

All Streets plots 240 million road segments in the United States.

For those unfamiliar with Reddit, it’s a massively popular site that aggregates online social and news material. People go there to share and comment on articles, pictures or videos. They share just about anything and everything they find interesting, thought-provoking, or funny. Whether you love Reddit, hate it, or simply find it a waste of time, the fact remains that when a link to a website lands on the front page, that site is going to receive an overwhelming amount of traffic that can bog down web servers or even take them offline—what some people call the Reddit Hug of Death.

As you can see from the graph of our web traffic below, we went from double-digit numbers of visitors per hour before the post to a sustained peak of around 30,000 visitors per hour! Like most operations our size, we were caught off guard by this explosion in traffic at first. Fortunately, we were able to do some fast AWS whispering and got back online in minutes. By the end of the rush, we had served over 250,000 visitors, and the project had been picked up by Gizmodo.

Web visitors weren’t just looking at the piece; they wanted copies for themselves. Along with the flood of page views, there was also a big bump in poster sales for All Streets and the related Dencity poster, with all proceeds going to charity. Of course, this created a (very good) problem of handling all the shipments. Terrence stepped up to lead a team of conscripted volunteers to roll, pack, and mail the 130-odd posters over the course of a couple of afternoons.

Mark and Varounny are ready to roll!
Only 107 posters to go…
Our mailman was thrilled.

Things have returned to normal after the rush, and we’re dreading (hoping for?) another one. In the meantime, new hire Brian has taken to calling himself Fathom’s Ambassador to Reddit, and is cranking out Ben Fry memes at a somewhat disturbing pace.

No Ceilings: The Full Participation Project is committed to using data as comprehensive evidence to measure the status of gender equality around the world. Our latest video for the Clinton Global Initiative uses data to demonstrate the progress of women and girls since the UN World Conference on Women in 1995. While the video gives a high level summary of CGI-related topic areas, we found it important to share a more granular, interactive version of the findings that fed into the piece.


As the initiative is gathering data on the participation, completion, and performance of boys and girls in school, we looked at the indicator that preceded the rest of the success measures: access. In 1995, girls had less access to primary school than boys, and the disparity was most drastic in areas like Sub-Saharan Africa and South Asia. Over the last two decades though, net primary enrollment for girls has grown by 25% in both regions.

Download data (.csv)

The video features net enrollment ratios rather than gross enrollment ratios so that we can understand the proportion of children who should be in school but are not. Net ratios measure the number of students within the pertinent age group who are enrolled in school, while gross ratios measure the total number of children enrolled regardless of age, meaning that repeaters and students entering school at an early age can distort the actual disparities that exist between genders.
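
The difference between the two ratios is easiest to see with numbers. The Python sketch below uses the commonly used definitions (enrollment divided by the population of official school age, expressed as a percentage). The population and enrollment counts are hypothetical and are not drawn from the No Ceilings data.

```python
def gross_enrollment_ratio(enrolled_all_ages, official_age_population):
    """Everyone enrolled, regardless of age, per 100 children of official school age."""
    return 100.0 * enrolled_all_ages / official_age_population

def net_enrollment_ratio(enrolled_official_age, official_age_population):
    """Only enrolled children of official school age, per 100 children of that age."""
    return 100.0 * enrolled_official_age / official_age_population

# Hypothetical counts for one country and one level of schooling.
official_age_population = 1000  # children of official school age
enrolled_official_age = 700     # of those, the number enrolled
enrolled_other_ages = 250       # repeaters, early or late entrants

gross = gross_enrollment_ratio(
    enrolled_official_age + enrolled_other_ages, official_age_population
)
net = net_enrollment_ratio(enrolled_official_age, official_age_population)
print(f"gross = {gross:.1f}%, net = {net:.1f}%")
```

Here the gross ratio (95%) makes enrollment look nearly universal, while the net ratio (70%) shows that 30% of school-age children are not enrolled.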

The map above displays secondary enrollment rates for girls, revealing darker (lower) rates in places like Sub-Saharan Africa and South Asia. Built into a rotating globe in the presentation, the map populated countries with relevant time-series data throughout the course of the video. The globe acted as a constant seam that stitched together additional layers of country-level information.


Comparing income between genders is complicated because it often overlooks the larger structural factors that influence the imbalance. For instance, a disproportionately high number of women are employed in part-time labor compared with men, which affects the average monthly and annual salaries between genders. Women are also often employed in different occupations and sectors than men, so a gap in income may reflect the disparity of pay between jobs or economic sectors rather than a pay gap for men and women with the same position. All this being said, the contextual nuances raise a larger question: why don’t women have the same occupations as men?

Download data (.csv)

Even in OECD countries (generally considered to be developed nations), we see a disparity in average annual salaries between men and women. In Ireland, the OECD country with the most “equal” median annual wages, women still earn 3.5% less than men.


Unpaid labor

Another imbalance in the workforce is reflected in the amount of time spent on unpaid labor, defined as the number of minutes spent on routine domestic work, care for household members, care for non-household members, volunteering, and other unpaid tasks. There is data available for a smattering of the OECD countries along with a few others, and in every case, women spent significantly more time on unpaid labor than men. The trend influences the larger gender disparities that exist in the workplace.

There are very few countries worldwide that actually report this data. In fact, we have a full record for only 29 countries. Data represents the efforts of governments to report on the status of their citizens. The failure to collect and share information suggests either a shortage of resources to do so, a government’s lack of value for its citizens, or a hesitation to publish the reality of the results. The absence of data and transparency regarding gender disparities in the workforce points to the greater issue: why are there so few countries collecting and publishing gender-disaggregated information on the labor force?

Download data (.csv)



For an accurate understanding of labor force participation rates (LFPR), it’s important to learn the context behind the numbers. The indicator below measures the share of men and women aged 15+ who are part of the workforce. Countries with the highest LFPR for women (like Afghanistan, Albania, and Algeria, where female participation rates exceed 85%) often reflect the lack of freedom and agency of women to choose an alternate path. Extraordinarily high LFPRs suggest that women don’t have the liberty to complete secondary education, or to select a career with room for advancement.

At a global level and in the following countries, however, the longstanding gap between men and women’s participation in the workforce reflects the greater gender imbalance in economic participation.

Download data (.csv)

A recent study takes the gap in LFPR one step further, and measures the potential economic gains various countries would experience if they equalized employment between genders. While the available data only supports a story on national gains to GDP, the study also correlates women’s participation in the workforce to improvements in literacy rates, access to education, and infant mortality rates.

Download data (.csv)

At 34%, Egypt would experience the greatest percent growth in its GDP by equalizing LFPR. Ranking second, India would increase its GDP by 27%, yet because its economy is so much larger, it would undergo the greatest monetary increase of the countries involved in the study. We measured GDP in purchasing power parity as a metric to compare and normalize economic gains across countries over a single year, 2012.
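
The distinction between the largest percentage gain and the largest monetary gain comes down to simple arithmetic: the absolute gain is the percentage applied to the size of the economy. The Python sketch below uses the 34% and 27% figures quoted above, but the economy sizes are hypothetical relative units (1.0 and 5.0), not real GDP values for Egypt or India.

```python
def absolute_gain(gdp, percent_gain):
    """Absolute GDP increase implied by a percentage gain."""
    return gdp * percent_gain / 100.0

# Percent gains come from the text; the GDP sizes are made-up relative units.
economies = {
    "smaller economy": {"gdp": 1.0, "percent_gain": 34.0},
    "larger economy": {"gdp": 5.0, "percent_gain": 27.0},
}

gains = {
    name: absolute_gain(e["gdp"], e["percent_gain"]) for name, e in economies.items()
}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} units")
```

Even with a lower percentage gain, the larger economy ends up with the bigger absolute increase, which mirrors the point made about India.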

Download data (.csv)

We filtered a tremendous amount of data down to a handful of high-level talking points, yet it’s important to understand the context, nuances, gaps, and limitations that inform the global stories. Keep an eye out for additional insights on the data. There will be more trends, subtleties, and correlations coming your way.

See video Gains and Gaps: No Ceilings Data Visualization

Today we are announcing the Mirador Data Competition, the goal of which is to make discoveries in large and complex public datasets. The good news is that we have been developing a program to help you make these discoveries: it’s called Mirador.

The competition runs from September 28th to October 28th, so mark your calendars. During this time you can continue to upload findings to your user account from the app. Visit the competition page for complete instructions on how to get started.

The Sabeti Lab is offering cash prizes for the top three findings, which will be chosen by a jury of experts in the respective domains of each dataset.

Check out the official Mirador Data Competition video below:

We have chosen four public datasets in the areas of health, sports, and global development:

Each one of these datasets is very rich in complex relationships between literally thousands of variables, and even though some of them have been extensively studied by specialists, there is more to be discovered. We also want to highlight the importance of open data as an enabler for transparency and public participation in research, governance, journalism, and economics, just to name a few areas. Please visit the competition website, create an account, and start exploring correlations to win cash prizes!

Last, but not least, we would like to thank our summer interns at the Broad Institute for their work: Mahan Nekoui, who implemented the user submission system, and Tom Silver, who created the intro video.

We’re thrilled to announce our latest project for the Clinton Foundation’s No Ceilings: The Full Participation Project. The video, released this morning at the Clinton Global Initiative (CGI), outlines advancements and setbacks of women and girls over the last twenty years, with particular focus on their access to education and economic participation.

The No Ceilings project is a collaborative effort led by the Clinton Foundation and the Gates Foundation, and it’s committed to using data to evaluate the advancements and challenges facing women and girls since the 1995 UN World Conference.

Our initial data exploration for this year’s CGI centered on educational progress and economic disparities for women and girls. Investing in equal education across genders has positive implications for the health of individuals, communities, and nations as a whole. Further, giving women equal access to and participation in educational resources generates greater benefits for national economies. As stated at the conference this morning, “the value of sending your daughter to school is not rocket science.”

Global gap between boys and girls in primary and secondary school education

We’re excited about the project’s commitment to using data to give a comprehensive view of gender equality in the world today. In the words of Melinda Gates, “behind all of these data points are real lives.”

Stay tuned for more on our continued work with No Ceilings: The Full Participation Project.

See process post