If you’re in New York City between now and May 10, stop by the TriBeCa art space apexart to see the latest Coding the Body exhibit. Organized by friend and former M.I.T. Media Lab professor Leah Buechley, “Coding the Body interrogates the relationship between human and code. It explores how code is being used to understand, control, decorate, and replicate us.” At a time when our lives are increasingly defined by codes, whether written by genetics, religion, or software, Leah’s exhibit explores the fascinating, enchanting, and occasionally unnerving relationships that develop between humans and code.

When Leah first came to Ben in search of a few contributions, we were (coincidentally) pretty deep in exploring human movement patterns through Nike+ FuelBand data. We contributed a collection of 2013 Year in NikeFuel posters, which visualize physical activity patterns for individuals over the course of a year, in addition to sharing the Dencity map. The map illustrates the world’s most densely populated places by representing people with circles of varying sizes and hues. Ben also contributed his Chromosome 21 piece, which captures 13 million letters—just a quarter of the 50 million letters—of genetic code from chromosome 21.

For more information on the exhibit and other nearby galleries, check out the latest review from The New York Times. They’ve conveniently outlined the locations of a perfect Saturday afternoon, if you ask us.

The exhibit features exciting work from Cait and Casey Reas, among other friends. Take a peek if you can, and if your moral code so advises.

When Robin Hood first came to us with the intent of visually representing poverty and disadvantage in New York City, we were fascinated to learn that our understanding of poverty was based on an outdated and inaccurate federal measurement. The Poverty Tracker, Robin Hood’s latest initiative, measures financial poverty, material hardship, and health challenges to give a more accurate depiction of what it means to be poor in New York City. In the analysis, construction, and design of this project, we felt it particularly important to remind ourselves throughout the process that we were looking at people, not numbers.

The Official Poverty Measure (OPM) was constructed in the early 1960s, and does not account for the changing allocation of expenses (Americans today spend more money on housing, for instance), the geographic variation in cost of living, or the range of income sources families use to meet their daily needs. In short, the earning and spending patterns of the typical American family have changed dramatically in the last 50 years, and the poverty line hasn’t kept up.

The Census Bureau recently developed a revised measure of poverty, the Supplemental Poverty Measure (SPM), which accounts for the array of goods purchased by the modern family (food, clothing, shelter, utilities, etc.), the variation in cost of living across the country, and the range of pre- and post-tax sources of income. The revised poverty line is $32,516 for a family of four in New York City. Using the more recent SPM, nearly one in four people are living in poverty in New York City, and even that doesn’t capture the total number of people who suffer from different forms of disadvantage like severe material hardships and health challenges.

The numbers in the Poverty Tracker emerged from a survey developed in partnership by Columbia University’s Population Research Center and Robin Hood, which gathered information from about 2,300 residents from the five boroughs, many of whom were living in poverty. As we dug further into the survey data, it became increasingly clear why the Census Bureau created a new metric: poverty and distress in New York City affect significantly more people than the official measure suggests.

We quickly learned that different demographic groups are disproportionately affected by poverty, severe material hardship, and severe health challenges, and not always how you might expect. More than two million New Yorkers faced severe food, financial, utilities, housing, or medical hardships, regardless of whether they had received a full college education, had a family with children, or even earned incomes above the poverty line. By examining the data in different cohorts (income, age, race, gender, education level, and household structure), we were able to convey which groups suffered the most from different types of disadvantage. It was eye-opening to look at some of the specific questions brought up in the survey, so we incorporated some of them into the site to give people a more tangible idea of what it means to be living in poverty in New York City.

After we had figured out which slices of the data to include in the Poverty Tracker, we explored various design representations and layouts to give the numbers a more human feel.

Jose Luis experimented with different grid densities to see if we could better embody the number of affected New Yorkers, but we found in the end that larger dots were the most readable (we came to this realization only after Jose had been seeing dots for days).

We decided to use color to define groups both within and between the cohorts, making it clear that each toggle revealed a different narrative for each demographic. We designed the 10 x 10 grid to remain static as users scroll down the site, so that no matter how the data is divided, the user can see different breakdowns of the same pool of people (New York City residents) throughout the entire narration of the site.

We were glad of the opportunity to communicate the realities of poverty in New York City, a place sometimes more associated with advantage and economic prosperity. In the US, we have a tendency to focus on aiding the poor in faraway countries, at the expense of ignoring the disadvantaged in our own backyards.

My friend Wombi asked me if I wanted to go ice fishing, so naturally I said yes. The last time I went on a weekend trip with him, I wound up swimming in the Charles River with a lifejacket and a Pabst Blue Ribbon. This time, however, the stakes were higher: five figures higher.

We planned to compete in the 35th Annual Meredith Fishing Derby, hosted by the Meredith Rotary Club in Meredith, New Hampshire. The main festivities took place in Meredith Bay, which is on Lake Winnipesaukee, New Hampshire’s largest lake. The lake spans 71 square miles and has a maximum depth of 212 feet (think huge lake trout).

$900 – Insulated shelter (a.k.a. Moonbase Alpha): tape + insulation + materials + contingency for remaining parts

$300 – Five Permits: fishing licenses + derby fees (two holes per license allowed us 10 holes at all times)

$300 – Provisions: $200 beer + $100 food

$1800 – total divided by 18 people so $100 per person

Forecast

Outdoors: highs in the upper 20°s with lows reaching 8°

Moonbase Alpha: 80° if and when the fire was well-tended

Moral of the Story

It was Saturday evening, the sun was just going down, and we had only had one nibble on our lines. But that’s when it hit me: it wasn’t all about the Benjamins.

It was about meeting the locals…

…and watching this hovercraft that couldn’t steer well.

It was about imagining what my parents wore in the 80s…

…and taking core samples.

It was definitely not about fishing or drinking…

…but may have been about doing both at the same time.

It was a surreal experience being at Moonbase Alpha. Considering we were at a competition, there was a positive and friendly atmosphere out on the ice. Thanks to Meredith, NH, for its hospitality; and thank you Wombi, friends from the Webb Institute and MIT, and everyone else who put their time towards making Moonbase Alpha possible.

As I mentioned in my previous post, our collaboration with the Sabeti Lab is aimed at creating new visual exploration tools to help researchers, doctors, and clinicians discover patterns and associations in large health and epidemiological datasets. These tools will be the first step in a hypothesis-generation process, combining intuition from expert users with visualization techniques and automated algorithms, allowing users to quickly test hypotheses that are “suggested” by the data itself. Researchers and doctors have a deep familiarity with their data and often can tell immediately when a new pattern is potentially interesting or simply the result of noise. Visualization techniques will help articulate their knowledge to a wider audience. This time around I will describe a quantitative measure of statistical dependence called mutual information, which is used to rank associations in the data.

In the last post, I went into some detail about the difficulties that arise when representing pairwise associations in a dataset that contains a mixture of numerical and categorical variables. This is often the case for health and disease data, where researchers are interested in finding relationships between demographic parameters (age, gender, income, etc.), prevalence of specific illnesses (expressed as the percentage of the population affected), various quantitative lab measurements (such as hormone levels and blood cell counts), and genetic markers (for example Single-Nucleotide Polymorphisms or SNPs). I described a visual representation of pairwise associations called eikosograms, which are very effective at depicting statistical dependency between two variables. However, as our datasets could contain up to several thousand variables (as is the case with NHANES), the number of pairwise plots would be on the order of millions. For an exhaustive examination of pairwise relationships, eikosograms are best suited for smaller datasets composed of a few variables.

At the same time, most of these plots are of little interest since only a small fraction of them are likely to represent related pairs of variables. Our visualization tools need some kind of numerical index or score of statistical dependency, which can be evaluated automatically by the computer to show the user those with the highest significance. These plots should reveal (if our score of dependency is good) the most interesting associations in addition to exposing new ones. The user can then focus her attention on the highest ranking associations, and use visual representations such as scatter plots or eikosograms to determine their nature, and whether or not they are worth looking further into with more specialized statistical software like R, Stata, or SUDAAN.

The mutual information quantity I mentioned earlier seemed like a good starting point for creating a general index, or score of statistical dependency. In order to see why, we need to go step by step and first look at a fundamental concept from information theory called the Shannon entropy. This concept was introduced by American mathematician Claude E. Shannon in a paper from 1948 entitled, “A Mathematical Theory of Communication.” This paper was the starting point of the entire field of Information Theory and had major repercussions in understanding the limits of digital data transmission and compression.

There are many online materials on information theory and Shannon entropy, starting with the obligatory Wikipedia article. However, these references can get very mathematical and abstract fairly quickly. What I would like to do here is to write down my own “self-conversation” that developed while I read about and tried to understand Shannon entropy and mutual information better, with some interactive plots at the end to help visualize a few fundamental relationships in information theory.

One thing I found striking about the Shannon entropy is that one can describe it with very intuitive words, and these words actually do help make sense of the definitions and relationships without having to rely too heavily on mathematical notation. The simplest “textual” definition of Shannon entropy I found was: “the amount of information we gain when making some measurement,” or alternatively, the “amount of surprise we should feel upon reading the result of the measurement” (as suggested by Andrew Fraser and Harry Swinney in a physics paper on strange attractors from 1986).

The Shannon entropy makes the intuitive concepts of “information” and (perhaps less seriously but more effectively) “surprise” precise and measurable. Concretely, if a measurement x has a probability p(x) of actually occurring, Shannon defined -log p(x) as the amount of information we gain from observing x, where log is the natural (base e) logarithm function. We could use a logarithm in any other base, but that simply represents a change in the unit of measure. The logarithm function matches our intuition well: if an outcome x has probability p(x) = 1 of occurring – which means it always happens – we won’t gain any information (or feel any surprise) from observing it, and in accordance with this intuition, -log 1 = 0. Conversely, an outcome x with a low chance of occurring, say one in a thousand times or p(x) = 1/1000, will surprise us if we see it happening, consistent with the logarithm formula:

-log 1/1000 = log 1000 = 6.9

If the outcome is even less likely, for example once every million or p(x) = 1/1000000, then its observation would surprise us even more:

-log 1/1000000 = log 1000000 = 13.82

Because of the logarithm function, the information only doubles (from about 6.9 to about 13.8) even though the probability decreases 1000-fold.
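The two figures above are easy to check numerically. The following is a minimal sketch, not code from the original project; the function name `surprisal` is just a label chosen here for -log p(x).

```python
import math

def surprisal(p):
    """Information gained (in nats) from observing an outcome of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    # log(1/p) is the same quantity as -log(p), written to avoid printing -0.0
    return math.log(1 / p)

print(round(surprisal(1 / 1000), 2))      # 6.91
print(round(surprisal(1 / 1000000), 2))   # 13.82
print(surprisal(1.0))                     # 0.0, a certain outcome is no surprise
```

Dividing the result by `math.log(2)` would express the same quantity in bits instead of nats.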

If we held a survey to record the gender of randomly selected individuals, we would have p(female) = p(male) = 1/2, and the amount of information (or surprise) is

-log 1/2 = log 2 = 0.69

Using a logarithm in base 2, we would gain exactly 1 bit of information by knowing the gender of a randomly selected person, since log₂ 2 = 1. From Shannon’s original 1948 paper:

The logarithmic measure is more convenient for various reasons:
1. It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities [...]
2. It is nearer to our intuitive feeling as to the proper measure [...] One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information.
3. It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities.

But we still don’t have the Shannon entropy. From the individual -log p(x) contributions for each possible outcome x = a, b, … (for example, in the case of X = gender, a = female and b = male), the total Shannon entropy is the expected information we would gain from making a measurement of the variable X, meaning that each -log p(x) term is weighted by the probability p(x):

H(X) = -p(a) log p(a) - p(b) log p(b)

You can drag the edge of the circles to change the magnitudes of the probabilities, and this can help us get a sense of the amount of information we can obtain from a variable X.

In the equation above, we have only two possible generic outcomes a and b, and by changing the magnitudes of p(a) and p(b), we can see how H(X) changes accordingly. As it turns out, the maximum value of H(X) happens when each event is equally likely. A very unlikely event makes a large -log p(x) contribution, but that contribution is weighted down by the low probability p(x). So on average, it is better to have a uniform distribution over the outcomes in order to maximize the information content.
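The claim that H(X) peaks when both outcomes are equally likely can be checked by sweeping p(a) from nearly 0 to nearly 1. This is an illustrative sketch; the helper name `entropy` is an assumption of this example, not part of the original interactive plots.

```python
import math

def entropy(probs):
    """Shannon entropy in nats: the sum of -p log p over all outcomes."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with zero probability contribute nothing to the sum
    return sum(-p * math.log(p) for p in probs if p > 0)

# Two outcomes a and b, with p(b) = 1 - p(a)
for p_a in (0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"p(a) = {p_a:.2f}   H(X) = {entropy([p_a, 1 - p_a]):.3f}")
```

The printed values rise toward log 2 (about 0.693) at p(a) = 0.5 and fall off symmetrically on either side.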

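This definition can be applied to a measurement where we observe two variables X and Y at once. This case is similar to the previous one; however, we are now looking at joint probabilities p(x, y):

H(X, Y) = -p(a, c) log p(a, c) - p(a, d) log p(a, d) - p(b, c) log p(b, c) - p(b, d) log p(b, d)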

Now we have four probabilities, p(a, c), p(a, d), p(b, c), p(b, d). Like the previous equation, we can manipulate their values by dragging the circles.

We are finally getting closer to the mutual information! If we think about the meanings of H(X), H(Y), and H(X, Y), we have, respectively, the average information we gain from measuring X and Y alone, and the average information of measuring X and Y together. Since we should always gain at least as much information from measuring two variables together as from measuring only one of them, then regardless of the case:

H(X, Y) ≥ H(X) and H(X, Y) ≥ H(Y)

And what do we obtain if we subtract H(Y) from H(X, Y)? In H(X, Y) we have the joint information from observing X and Y together, so by subtracting H(Y) we would have removed the information that is due solely to Y. This is called the conditional entropy of X given Y. The conditional entropy of Y given X is defined similarly:

H(X|Y) = H(X, Y) - H(Y)

H(Y|X) = H(X, Y) - H(X)

If the variables X and Y are unrelated to each other, we could expect the conditional entropies to equal the entropies we first defined (typically called marginal entropies):

H(X|Y) = H(X) and H(Y|X) = H(Y)

But these two equalities are equivalent to having the joint entropy H(X, Y) equal to the sum of the two separate marginal entropies:

H(X, Y) = H(X) + H(Y)

If the variables are related, we cannot split the joint entropy in this way. If we continue with the survey example and measure X = age and Y = education level (which are clearly related), observing an individual over the age of 20 will make it less of a “surprise” to additionally learn that the person has a college education. If we measure X and Y separately, some of the information in H(X) + H(Y) would be counted twice, so H(X, Y) < H(X) + H(Y). It sounds reasonable then to define the difference between H(X) + H(Y) and H(X, Y) as the information that is “shared” between X and Y, that is, the mutual information of X and Y:

I(X, Y) = H(X) + H(Y) - H(X, Y)

If X and Y are unrelated, there is no shared information between X and Y (meaning that knowing one doesn’t help to predict the other) and I(X, Y) = 0.

We could use the mutual information I(X, Y) as our measure of statistical dependency: starting at 0 when X and Y are independent, and increasing as the level of dependency between them grows. However, it misses a nice property: normalization. In principle, the mutual information can be as large as H(X, Y), since it represents the part of the joint entropy that is common to both X and Y. From our interactive equations, we can find combinations of probabilities that result in H(X, Y) larger than 1. It can be much larger than 1 if we have variables X and Y with many possible outcomes. Hence, the following calculation gives a number between 0 and 1 that we can take as our similarity score for X and Y:
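To make the quantities above concrete, here is a small sketch that computes the marginal and joint entropies, the mutual information, and a normalized score from a table of joint probabilities. The original equation for the score is not reproduced in this text, so dividing by H(X, Y) is an assumption inferred from the remark that the mutual information can be as large as H(X, Y). The function names and the three example tables are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def dependency_score(joint):
    """joint[i][j] holds p(x_i, y_j). Returns (I(X, Y), I(X, Y) / H(X, Y))."""
    p_x = [sum(row) for row in joint]          # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]    # marginal distribution of Y
    h_xy = entropy(p for row in joint for p in row)
    mutual = entropy(p_x) + entropy(p_y) - h_xy
    # Assumed normalization: divide by the joint entropy to land in [0, 1]
    score = mutual / h_xy if h_xy > 0 else 0.0
    return mutual, score

examples = {
    "independent": [[0.25, 0.25], [0.25, 0.25]],
    "related":     [[0.45, 0.05], [0.05, 0.45]],
    "identical":   [[0.50, 0.00], [0.00, 0.50]],
}
for name, joint in examples.items():
    mutual, score = dependency_score(joint)
    print(f"{name:12s} I = {mutual:.3f}   score = {score:.3f}")
```

With a score like this, every pair of variables in a dataset could be evaluated and sorted so that only the highest-ranking pairs are plotted for closer inspection.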

How can we make all these relationships a bit easier to visualize? We already discussed the eikosogram plot, where statistical independence is visually encoded by a horizontal pattern indicating that the conditional probabilities are independent of x, that is, p(y|x) = p(y). Below we have an interactive eikosogram (which is linked to all the plots and equations we’ve had so far):

By increasing the step in the eikosogram plot, we can make the dependency between X and Y more pronounced. Alternatively, we can make the variables independent. And how do all the entropies and mutual information quantities change as we do that? The next graph shows the overlap between H(X) and H(Y) becoming larger or smaller depending on our probability selections:

As with the previous post, I’m using this space in the Fathom blog to explore better ways of illustrating or visualizing concepts in statistics and math by combining interactivity and “traditional” text. A good example of this approach is the visual explanation of conditional probability by Victor Powell. Another important reference for the idea of making mathematical arguments more interactive and visual –using the web as the publishing and sharing platform– is the IPython notebook, which combines text, math, plots and calculations in a single webpage, like in this notebook about simulations of economic marketplaces.

For the interactive snippets in this post, I used p5.js, an experimental version of Processing designed for use with JavaScript. The last plot showing the relationship between all the information theory quantities is based on figure 2 from this article by Rethnakaran Pulikkoonattu.

Today is March 20, the official first day of spring, and it almost feels like it. This winter in New England has seemed like one of the toughest in years. The cold and snow have been absolutely relentless, and I think everyone is sick of hearing about the Polar Vortex.

In a quest to find some signs of spring, I took a walk yesterday after work to see if there were any flowers or trees budding — I came up completely empty.

I tried again today at lunch. I knew I needed to look a lot closer if I was going to find anything, so I took out a Canon EF-S 60mm f/2.8 Macro USM lens. I finally found some life budding up, but it wasn’t easy.

In continuation of a post we did a while back, It’s too gray outside, I decided to see if I could extract some interesting color palettes for inspiration. Using some sample code from generative-gestaltung.de, I modified the Processing sketch that breaks down color by hue, saturation, and brightness, and applied it to the following images.
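For readers who want to try the same idea without Processing, sorting a set of sampled colors by hue, saturation, or brightness can be sketched with Python’s standard `colorsys` module. This is not the generative-gestaltung.de sketch itself; the function name `sort_palette` and the four sample colors are made up for illustration.

```python
import colorsys

CHANNELS = {"hue": 0, "saturation": 1, "brightness": 2}

def sort_palette(colors, key=None):
    """Sort (r, g, b) tuples with 0-255 channels; key=None keeps the sampled order."""
    if key is None:
        return list(colors)
    if key not in CHANNELS:
        raise ValueError(f"key must be one of {sorted(CHANNELS)} or None")

    def hsb(color):
        r, g, b = (channel / 255 for channel in color)
        return colorsys.rgb_to_hsv(r, g, b)  # (hue, saturation, value), each 0..1

    return sorted(colors, key=lambda color: hsb(color)[CHANNELS[key]])

# Hypothetical colors, standing in for pixels sampled from a photo
sampled = [(34, 139, 34), (220, 20, 60), (255, 215, 0), (120, 120, 120)]
for mode in (None, "hue", "saturation", "brightness"):
    print(mode, sort_palette(sampled, mode))
```

In practice the `sampled` list would come from reading pixels at regular intervals across an image, which is the part that depends on the imaging library used.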

Sample 1: flora unknown; no color sort

Sample 2: flora unknown; sort by hue

Sample 3: flora unknown; sort by saturation

Sample 4: flora unknown; no color sort

Sample 5: flora unknown; sort by hue

Sample 6: flora unknown; sort by brightness

Here’s to spring and uncovering life, color and data samples which are all around us.

Data, in its multiple forms, can range from the very abstract to the most tangible. We tend to be type-agnostic, but recently a particularly clear set of data caught our eye: real-time position tracking for sports events.

Technical development has brought physical tracking of sports events to a high degree of reliability, enabling real-time data collection and processing for statistical and analytical purposes, in addition to enhancing the spectators’ experience of the event through integration with the broadcasting system. There are currently many examples in football, soccer and even NASCAR racing.

Since 2010, the NBA has embraced these technologies, and recently announced a major agreement to implement live player tracking for all 30 teams in the league. The idea is actually rather simple: by populating the venue with sets of cameras and running computer-vision algorithms on the footage, the system is able to produce sets of X, Y, Z positions for each player and the ball for every frame, in this case 25 times per second. The system then couples that positioning with actual game event markings.

What? Our favorite sport coupled with lots of data? We had to give this a try!

Back in 2011 we got our hands on one of these sets of data, thanks to Brian Kopp, for the Oklahoma City Thunder vs. San Antonio Spurs game of February 23, 2011. Using Processing, we built a sketch to parse through the game’s data, and help us dynamically navigate through the gameplay and statistics.

Beyond letting us ‘watch’ the game over and over, the tool gave us the opportunity to delve a bit deeper into players’ behaviors and patterns. We were particularly interested in figuring out how each player’s intervention affected the game, whether positively or negatively.

In the game, Antonio McDyess was not a starter, scored only six points, and wasn’t even mentioned in the game’s recap. But as it turns out, his presence on the court was crucial for San Antonio’s big score turnaround, and his exit at 8:50 of the second period was closely followed by Oklahoma’s momentary comeback. Whether these events are related or not, we’ll leave to your own judgment.

Furthermore, we wanted to explore game behavior related to the vast collection of location data collected for this game. Accumulating the ball movement throughout the complete game revealed interesting patterns, such as movement concentration around the 3-point mark, preferred shooting spots, or that players tend to transition into offense along the sides of the court.
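The accumulation step can be sketched as a simple grid count. This is only an illustration of the idea, not the Processing sketch we built: it assumes the tracking samples have already been parsed into (x, y) positions in feet on a regulation 94 x 50 foot court, and the sample values below are invented.

```python
from collections import Counter

COURT_LENGTH_FT = 94  # regulation NBA court, baseline to baseline
COURT_WIDTH_FT = 50   # sideline to sideline

def accumulate(samples, cell_ft=2.0):
    """Count how many tracked samples fall inside each square cell of the court."""
    counts = Counter()
    for x, y in samples:
        # Ignore samples that fall outside the court boundaries
        if 0 <= x <= COURT_LENGTH_FT and 0 <= y <= COURT_WIDTH_FT:
            counts[(int(x // cell_ft), int(y // cell_ft))] += 1
    return counts

# Made-up ball positions, not taken from the actual game data
ball = [(23.5, 25.0), (23.9, 25.4), (70.2, 3.1), (23.0, 24.2), (120.0, 10.0)]
heat = accumulate(ball)
print(heat.most_common(1))  # the busiest cell and its sample count
```

Drawing each cell with an opacity proportional to its count is one way to make concentrations such as the 3-point arc stand out.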

Recursively stacking players’ movement over the game also shows some relevant patterns emerging, exposing each player’s recurrent locations, standings, and alignments according to their team position.

Stay tuned for next iterations in the third dimension. Until then, here’s a very interesting article on the subject matter.

This is a process post about 2013 Year in NikeFuel, but it also serves as an example of our collaborative process at Fathom. You can read the announcement of the project here.

The FuelBand is one of a couple of Nike+ products where NikeFuel is earned by moving, whether it be running, walking, biking, or dancing. The FuelBand stores minute-to-minute data, so with 1440 minutes in a day and 365 days in a year, we had to make a number of design choices to paint a clear, unique, and beautiful picture of every user.

Before I started designing this poster, Ben had written a Processing sketch to plot his daily fuel from 12am to 12am. When Ben traveled to NYC for the announcement of the FuelBand SE, James used that sketch to create a narrative of the trip.

We played with the idea of using gradient to distinguish multiple layers of lines, each of which represented a single day. The Nike+ branding guide contains a gradient shift from red to yellow to green as users approach their daily goals.

Instead of adjusting the gradient from left to right, we rotated it 90 degrees to highlight the best moments—or peaks—of the day in green. This idea eventually made its way to the vertical color shift used in the final poster.

I developed the code with some help from Ben and Mark so that I could try out different ways to visualize the minute-to-minute NikeFuel data. Processing allowed me to iterate on the design more naturally while staying true to the dataset. The following snapshots capture some decisions I made throughout the design process.

With 1440 minutes in a day, drawing a point at each minute would have been too granular a level of information. We sampled different groupings from one to 120 minutes, and eventually decided 15-minute buckets worked best. We tend to talk about and schedule our time in 15-minute increments, and the grouping allowed for a detailed representation that left out unnecessary chatter.

1 minute

5 minute

15 minute

30 minute

60 minute

120 minute
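The grouping step behind these snapshots can be sketched in a few lines. This is a simplified illustration rather than the poster code: it assumes one day of data is a list of 1440 per-minute values, and it sums each bucket, although averaging would work just as well for plotting. The name `bucket` is hypothetical.

```python
def bucket(minute_values, size=15):
    """Sum per-minute values into consecutive buckets of `size` minutes."""
    if size <= 0:
        raise ValueError("size must be a positive number of minutes")
    return [
        sum(minute_values[start:start + size])
        for start in range(0, len(minute_values), size)
    ]

# A placeholder day with one unit of activity per minute
day = [1] * 1440
for size in (1, 5, 15, 30, 60, 120):
    print(f"{size:3d}-minute buckets -> {len(bucket(day, size))} points per day")
```

At 15 minutes, a day collapses from 1440 points to 96, which is the trade-off between detail and chatter described above.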

As the sketch developed, I constantly tested changes on all of our datasets in order to highlight how each user is unique. When Katy and Teri started referring to it as a #dataselfie I knew something was working.

Log in to your Year in NikeFuel to see if you caught fire in 2013. Download a PDF for printing, show it off in the public gallery, and share it with your friends. You can also put it on your hearth to stay warm this winter.

Most of our team at Fathom began wearing Nike+ FuelBands last January. By the end of the year, we had accumulated enough data to start creating interesting code-based sketches of our activity. What emerged from each person’s data was a visually unique and telling story of their daily and weekly activity trends. As we fine-tuned the code to encapsulate more users’ information, it became increasingly clear that the portraits represented individualized routines, behaviors, and lifestyles. The exploration evolved into a detailed poster that depicted exercise trends, work routines, and even implications of sleep patterns. Today, those original sketches have graduated into the 2013 Year in NikeFuel, a site built for Nike that allows the entire FuelBand community to see a unique portrait of their personal activity.

We’ve been working on a series of projects for Nike throughout 2013 to understand the similarities and differences that define the movement of the Nike+ FuelBand community. Studying minute-by-minute activity of users around the world has been fascinating, and hopefully we’ll be able to share more of those results at some point. While Nike’s dataset is incredibly vast, we saw highly personal narratives begin to emerge. After wrapping our other projects, we began exploring our own individual data.

We used Nike’s public developer APIs to start poring over our own information, as well as that of our families and friends who could be convinced to share their activity data. We experimented with representation through a series of Processing sketches that were passed around the studio. This helped us confirm the various kinds of patterns we saw in the anonymous data, but also to identify how the working dad, the mountaineer, the gym-rat, and the city slicker all have distinct patterns, routines, and lifestyles. Different accomplishments, interactions, events, and even prototypical days emerged in the presentation of our data. We were also struck by how people responded to seeing an image of their activity, and were able to use it to describe their daily routine, a standout week, or staying up all night with the arrival of a new baby. The cadence and intensity of how each person moves is incredibly personal, yet we rarely have the opportunity to see those patterns for ourselves. What began as a series of indiscernible daily plots soon shaped into a reflection, or a portrait, of our own personal movement for the last year.

As 2013 wrapped up, we shared a poster we’d created with Jenny Campbell at Nike, who set in motion the idea of implementing this across the entire FuelBand community. And so, lo and behold, today we launched a platform to help Nike+ users commemorate the individual stories of their own year in NikeFuel.

The top of the image is meant to be striking but unique, and as a poster, suitable for viewing on the wall and at a distance: it should be evocative but also true to the data. For a closer look, the area at the bottom focuses on breaking down the actual numbers and the details of movement as recorded by the FuelBand.

Each day is represented by filled yet transparent shapes, so the places on the poster with higher concentrations of color represent more ingrained behavioral patterns. While the fire-like layers on the top portion of the poster depict the regular intensity and activity levels of each individual, the bottom section of the poster aggregates various metrics to give a more tangible summary of each person’s movement. In addition, the posters relate individual activity levels to the U.S. Department of Health and Human Services’ Physical Activity Guidelines, so users are provided information that translates the NikeFuel metric to the context of weekly exercise.

In order to scale the project up to the Nike+ community, we created YearInNikeFuel.com, where Nike+ FuelBand users can view their personalized NikeFuel poster, and download a PDF for print. They can also share their activity trends with the Nike+ community gallery and with their social networks. The gallery allows users to see how they stack up against the rest of their peers, and illustrates how uniquely personal each individual’s regular activity is.

We’re pleased to announce that we’ve been awarded a grant from the Knight Foundation! After working with Knight to navigate the civic tech landscape, we thought they would be a great resource to help us explore a more personal project. The Knight Prototype Fund helps media makers, technologists, and tinkerers take ideas from concept to demo, so it seemed the perfect starting point to bring our latest project on urban agriculture to life.

It all started when Terrence found a tweet from the former Boston Mayor, Tom Menino, advocating for the expansion of urban agriculture in Boston. The recently passed Article 89 permits various urban farming initiatives for commercial purposes within Boston city zoning code. Considering the number of gardening, farming, and local food enthusiasts we have in the office, finding a way to visually clarify and interact with the legislation of Article 89 posed a wonderful opportunity to get more involved with our local Boston community.

With help from the Boston Redevelopment Authority, the Mayor’s Office of Food Initiatives, a number of local food policy organizations, and of course, the Knight Foundation, we’ll be creating a tool that allows users to locate and identify the different urban farming activities that are permitted around the city. Whether it’s a ground level or rooftop farm, hydroponics, aquaponics, beekeeping, or hen keeping, the tool will provide a simple way for people to visualize and navigate otherwise complex legislative zoning code for urban agriculture activities.

Following the acceptance of our project, Knight held a workshop with our round of grantees at the Luma Institute. In addition to learning about all of the other exciting projects underway, we explored different practices, like stakeholder mapping, creative matrices, and affinity clustering, to better test our ideas and make sure we were asking the right questions of the right audience.

Stakeholder mapping

An importance-difficulty matrix to understand the viability and impact of different project ideas

So many ideas…

Thinking of getting these rounded whiteboard walls in our office…

We’re thrilled to have the support of the Knight Foundation, and we’re looking forward to continuing our work with the local food scene in Boston. Stay tuned for more progress on the project!

It should come as no surprise that we spend a lot of time geeking out over data. Unless we’re busy watching movies, you’ll find us exploring existing datasets, and working towards a clear and compelling visual representation of the stories we find inside. Reimagining the child health record as a part of last year’s Records for Life contest offered an exciting opportunity to apply those same design concepts to the input mechanisms themselves — both digital and analog — in order to increase the volume and accuracy of global health data.

The weight-for-age chart, for example, was not a required component of the contest. We were drawn to it for its ability to engage parents in the immunization process, making them more likely to both start and complete a vaccination program for their child. Attrition is a big problem in developing nations. In one study, only 39% of the children surveyed completed their immunization, if they started at all. We saw the weight-for-age chart as a key way to convert parents into participants in the medical process. A parent who contributes to the vaccination record is a parent who takes care of the card, knows when their child’s next appointment is, and brings the card with them. They also have it readily available should a national health surveyor pay them a visit.

We started by examining existing records, which are often dense, confusing, and assume a level of literacy that some populations just don’t have. Plotting single dots on top of a full color document can also make future scanning and digitization efforts less accurate.

While the visual design might vary widely, these records are built on top of a rigorous dataset provided by the World Health Organization. The dataset contains daily median weight for boys and girls from birth to five years, accurate to four decimal places, as well as four z-scores above and below the median.

The chart above demonstrates an instance where the needs of the parent may be different from those of a doctor or a policymaker. From a medical or research perspective, data gathered at this level of detail is essential. To a parent, however, the difference between a child weighing 15 kilograms and 15.1 kilograms is negligible. Their question is more fundamental: “Is my child healthy?”

We made a conscious choice to simplify the standard weight-for-age chart, enabling parents to use it as a tool to answer that one basic question. Our challenge lay in deciding what range to highlight, and how to calculate it. Since falling exactly inside or outside a certain range isn’t an absolute indication of health, and just getting close to the edge should be cause for alarm, we used an average of male and female values at two z-scores above and below the median to define “normal” — a range that statistically captures about 95% of the population. This allowed us to create a chart that worked for both genders, saving valuable real estate and cutting large-scale production costs. We also decided to use a single knockout color that can be filtered out easily by optical character recognition (OCR) software.
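For readers who want to see the arithmetic, here is a minimal sketch of how a shared “normal” band could be computed by averaging the male and female values at two z-scores below and above the median. It is not our production code. The function names, the table layout, and the weights themselves are assumptions made up for illustration; real values would come from the World Health Organization dataset.

```python
from statistics import NormalDist

# Placeholder weights for illustration only; these are NOT WHO figures.
# Each table maps age in months to (kg at -2 z-scores, kg at +2 z-scores).
boys = {0: (2.5, 4.5), 12: (8.0, 12.0)}
girls = {0: (2.0, 4.0), 12: (7.0, 11.0)}


def normal_band(boys, girls):
    """Average the male and female bounds for every age present in both tables."""
    band = {}
    for age in sorted(boys.keys() & girls.keys()):
        boy_low, boy_high = boys[age]
        girl_low, girl_high = girls[age]
        band[age] = ((boy_low + girl_low) / 2, (boy_high + girl_high) / 2)
    return band


def in_normal_range(band, age, weight_kg):
    """Return True when the weight falls inside the shared band for that age."""
    low, high = band[age]
    return low <= weight_kg <= high


band = normal_band(boys, girls)

# Share of a normal distribution that lies within two standard deviations
# of the mean, which is where the "about 95%" figure comes from.
coverage = NormalDist().cdf(2) - NormalDist().cdf(-2)
```

Because the band is an average, a value near either edge can still sit inside one sex’s range and outside the other’s, which is part of why we treat getting close to the edge as a prompt for attention rather than a verdict.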

Asking parents to fill in an exact square for each month proved tedious and error-prone. We ultimately shifted to values designed to be circled, with structured inputs for more exact decimal weights only at key milestones. Circles only appear around the values in the “normal” range, highlighting when a parent should be concerned.