Oscar {data} Party

Fathom is a team of diverse interests – and among our crew, there are several of us who are really into movies. During the weeks leading up to the 90th Academy Awards that aired this past Sunday, we looked through twenty years of data on Best Picture nominees, and have put together a collection of some of the results.

Oscar data is interesting to work with because in some ways it’s highly organized and straightforward. These are the categories, these are the nominees, these are the winners. Here is what each film cost to produce, what its box office returns were, what its aggregate score is after being rated by thousands of people on IMDb. Yet if you watched the Oscars this weekend, you saw actors and cinematographers and writers and directors all attempt to articulate how profoundly these movies speak to the human condition. Each of these films is steeped in narrative – not only the plot of the film itself, but every part of the process where a spark of an idea was (over many years and with great labor) transformed into a feature film. Layers like these lend great depth to flat numbers.

I was particularly interested in production budgets – the range spans several hundred million dollars, but nearly all of the Best Picture winners from recent years have cost only around $20 million. The two outliers are Argo ($45 million) and Moonlight, which at $1.5 million has the lowest production cost of any Best Picture winner in 90 years (taking inflation into account).

Flowchart noting recurring themes among Best Picture nominees. Don’t worry, Tom Hanks will fix everything.

From the list of nominees, I started pulling films together by subject matter and found some interesting pairings. It seemed that certain types of films tended to run long. Certain wars feature more prominently than others, and movies about racism are frequently set within specific eras of the past, rather than the present day. The flowchart above maps some of these findings. It also turns out that the movie posters for Seabiscuit and War Horse share strikingly similar compositions.

Initial sketch and final motion version of visualization of film lengths 

It was also interesting to take a look at the top-grossing films of each year, and compare this set to the films that won Best Picture. The Return of the King is the only sequel in the last two decades to win Best Picture, whereas most of the top-grossing films are many incarnations deep into their respective worlds. Captain America: Civil War is the 13th film set in that particular Marvel universe, and Star Wars: The Last Jedi and Harry Potter and the Deathly Hallows Part 2 are the eighth installments in their respective franchises. Audiences are clearly willing to turn out in larger numbers for something they’re familiar with, and the ticket sales of the Best Picture winners look pretty diminutive by comparison.

2018 Best Picture nominees, with the number of nominations orbiting a central ellipse that correlates in size with production costs.

This group of studies includes interactive sketches, motion studies, digital studies, and cut paper. With the end of the awards comes the end of the project – for now – but there are countless stories and methods of representation for the data. 

She Should Run

In November 2016, we created First of Her Kind, a poster celebrating the 100th anniversary of the first woman to be elected to Congress and the first 94 women to be elected as Senators, Governors, and Representatives for each state.

Through organizations like She Should Run, we’ve seen an incredible increase in women running and being elected to office since the 2016 presidential election.

We’re honored to be included in this year’s She Should Run holiday gift guide. To support their mission of providing networks and resources for women to organize campaigns and run for office, we’ll be donating the proceeds of our poster to them.

You can read more about the inspiring women featured in First of Her Kind on our blog. We look forward to more women in office and to needing to update our poster as frequently as possible in the years to come.

Guest Lecturing at Harvard CS 171

Last week Mark and I had the opportunity to be guest lecturers at Harvard’s CS171 Data Visualization course. It’s always great to be able to get out of the office to talk with people about the work we’re doing.

The CS171 course focuses on data visualization theory and covers a lot of ground in terms of learning to program and the basics of interactive visualization. We like being able to guest lecture for this course because we always hear from students afterwards that they enjoy being able to see how what they’re learning can be used outside of the classroom. After the lecture, students still had some common questions, so we thought we could answer some of those here.

Where do you find data?

This is a question we get a lot, and unfortunately there isn’t an easy answer. With our client work, the clients are generally the ones providing the data, but with our curiosity pieces, we often source the data ourselves or even generate it. For Scaled in Miles, Mark was interested in Miles Davis’ career and found a database of all of his recording sessions and collaborators, which became the main data source for that project. For Rocky Morphology, James watched all of the Rocky movies and recorded – by hand! – what type of scene was going on at every one-minute interval. (He did this without telling us; otherwise we probably would have built a simple tool!) The USGS and NASA are great resources if you’re interested in earth science data. The UN and World Bank provide lots of interesting global data (which is where most of the No Ceilings data comes from), and the U.S. Census is obviously a key resource for U.S. data. Don’t be discouraged if you’re having a hard time finding data you want to work with. Unless someone is providing you with data, it can take some digging to find an interesting data set to explore.

You don’t seem to use D3. What tools do you use?

For our initial explorations and desktop applications, Processing is our go-to tool. (Ben is still its primary developer, and many of its features exist because he wanted them for his own projects.) Once we’re done sketching with Processing, we might port to JavaScript (if it’s a web piece) or deploy the application to another environment. For instance, we might build an installation here using mostly Macs, and then run it on a Windows machine for the actual installation.

For the web, we stay away from visualization libraries because our job is to provide bespoke ways of looking at data, which often makes a library more of a hindrance than a help. (Another way of thinking about it is that visualization libraries rarely solve the things that actually take the most time for us.) Primarily, we write our projects from scratch in JavaScript and include the typical UI libraries (jQuery/jQuery UI) when useful.

For analysis and data-wrangling, we often turn to Python. We also do a fair amount of server-side work with Python, sometimes Node, and even a little Java.

We try to use whatever tool is best suited to the job and makes us most efficient, often employing many different tools at various stages. The No Ceilings project, for instance, started with a custom-built tool created with Processing. That tool helped us get a sense of what was in the quarter million rows of data, and once we found something interesting, we exported it to Excel, which we used to generate charts and short write-ups that were delivered as PDF documents. The final site uses a lot of JavaScript, but also D3 for a piece or two, and even three.js for a 3D globe for mobile devices.

What are the differences between using D3, p5.js, and Processing?

D3 is primarily a data visualization library. It is great for getting a quick view of your data and we’ve used it here for just that on many occasions.

Processing is part language, part library, part environment, and is often used for creative coding. The main version is Java-based (as a “language” it’s sort of a dialect of Java), so the work you create with it is not readily usable on the web. However, it has a lot more juice than your browser, so when you’re working with large datasets, it can make those initial iterations happen a lot faster. It can also export to PDF, which makes it great for creating the data-driven pieces of any print work you may want to do.

p5.js is a project that builds on the ideas of Processing but rethinks some of its base decisions for the web. Primarily, it can help simplify the process of writing HTML5 canvas applications. It’s a great starting place for learning to code in general (it’s what’s used by Khan Academy, for instance), but it’s also helpful when you find yourself editing a D3 example so much that you’re writing more code to change it than the example itself. When you get to that level of customization, p5 and the HTML5 canvas can be your friend.

Are you hiring? What do you look for in a candidate?

Yes! We are always looking for curious, driven candidates. We are not looking for someone who can “do it all,” but people who can think creatively about how to apply their skills – whether that be coding, design, writing, or managing – to the work we do. We also have internships throughout the year. If you want to learn more, check out our careers page.

Processing Community Day

Just a few weeks ago was the first ever Processing Community Day. As volunteers and attendees, we were lucky enough to be able to watch inspiring community talks, see new and old faces, and present our own work. In this post, Danielle and Olivia reflect on their experience.

After having a wonderful time volunteering at Scratch Day last year, Olivia suggested we plan a similar event to celebrate the Processing community. The Processing user base has grown rapidly since the project began in 2001, but there had never been an officially organized in-person event. It was fitting, then, for the theme of the inaugural Processing Community Day to be “convening for the first time.”

Organized by Taeyoon Choi, co-founder of the School for Poetic Computation in New York, the event brought over two hundred attendees to the MIT Media Lab on October 21st to hear talks, demo projects, and participate in workshops by the Processing Foundation and Fellows, students, teachers, artists, and members of the Processing community. It was exciting to see people who share an affection for Processing meet, and to learn about the different ways it has been a part of their work – from engineering to art, from music to teaching.

One of our favorite talks was by Claire Kearney-Volpe and Mathura Govindarajan from the NYU Ability Project. They spoke about their work using p5.js to create code “readers” and other programming tools for people who are visually impaired. Although p5/Processing is primarily a coding language for visuals, this work shows how it can be used in non-traditional applications as well. Because our work at Fathom focuses on accessibility by means of visual presentation, it was a good reminder of how we might think about making data and information accessible in other forms, and what those tools might look like.

Another one of our favorites was by Sharon De La Cruz, artist and educator at Princeton University. She spoke on taking ownership of feeling vulnerable in her work and art practice; in other words, being comfortable with being uncomfortable.

There were also tons of great cross-disciplinary speakers during the community lightning talks. Ari Melenciano is a multi-disciplinary artist who spoke about the importance of representation, and how she combines creative coding, music, and building her own musical instruments. Rosa Weinberg showed projects from her work at Nuvu studio and how she pushes students to think outside the traditional forms of engineering. Since we can’t highlight everyone, you should check out the post here to learn more about all the community speakers we had.

Another highlight was seeing demos of work in person that we had only seen online. Freeliner, a program made in Processing by Montreal-based artist Maxime Damecour, traces drawn lines and shapes with light projections in real-time, creating an interactive light installation on any blank surface with a marker and projector.

We’re thrilled to have seen many old and new faces at the first Processing Community Day, and we’re looking forward to the next one!

— Danielle and Olivia

Summer of Stats with Sunnie

This summer, I explored a wide range of data sets and put together notes to introduce information designers to useful concepts and terms in statistics.

Final Project Guidelines, or A Brief Walkthrough of the Information Design Process

At Fathom, as I understand it, the information design process has roughly four stages: data collection, data cleaning, data analysis, and data representation. Depending on the project, one might be involved in only a few parts of the process. Nonetheless, it is important to understand the effort and expertise required at each stage.

The final project in Fathom’s MIT Information Design course has students work through all of the stages to produce their own data-driven piece. To make sure all the important points and questions are covered, I created Final Project Guidelines, or A Brief Walkthrough of the Information Design Process and did a case study with the so-called “tampon tax” to demonstrate how these guidelines can be used.

I collected data, mostly from articles, on how much sales tax is placed on feminine hygiene products such as tampons and sanitary napkins. I organized the data into a table and added notes if there were any interesting facts or stories regarding each state’s tax rate. Not much cleaning or analysis was needed because the piece was rather straightforward. For the last step of the process, data representation, I created a heat map with R’s ggplot2 package to show at a glance how much sales tax each state places on feminine hygiene products.

To date, 12 states don’t charge tax on feminine hygiene products. Five states (AK, DE, MT, NH, OR) don’t have sales tax to begin with, and seven states (IL, MD, MA, MN, NJ, NY, PA) lifted the tax between 1975 and 2016. Two more states (CT, FL) passed bills to eliminate the tampon tax, but those don’t go into effect until July 2018 and January 2018, respectively.
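
The tallies above are easy to check once the states are organized into a small table. Here is a minimal sketch in Python rather than the R used for the actual map. The category labels and function names are made up for illustration, and only the states named above are included.

```python
# States grouped by status, following the paragraph above.
# The category labels are hypothetical names chosen for this sketch.
TAMPON_TAX_STATUS = {
    # Alaska, Delaware, Montana, New Hampshire, Oregon: no statewide sales tax
    "no_sales_tax": ["AK", "DE", "MT", "NH", "OR"],
    # Lifted the tax on these products between 1975 and 2016
    "tax_lifted": ["IL", "MD", "MA", "MN", "NJ", "NY", "PA"],
    # Repeal passed but not yet in effect
    "repeal_pending": ["CT", "FL"],
}


def status_by_state(groups):
    """Invert the grouping into a state -> status lookup table."""
    lookup = {}
    for status, states in groups.items():
        for state in states:
            if state in lookup:
                raise ValueError(f"{state} is listed twice")
            lookup[state] = status
    return lookup


def count_exempt(groups):
    """Count states that currently charge no tax on these products."""
    return len(groups["no_sales_tax"]) + len(groups["tax_lifted"])


lookup = status_by_state(TAMPON_TAX_STATUS)
exempt = count_exempt(TAMPON_TAX_STATUS)
print(f"{exempt} states currently charge no tax on these products")
```

A lookup keyed by state is the shape a mapping tool typically wants, since each state's status can then be joined to its geometry and translated into a fill color.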

1hr Introduction to Statistics: What should information designers think about when they first see data?

I also put together material for an hour-long class, titled 1hr Introduction to Statistics: What should information designers think about when they first see data?, and shared it with the team during two sessions for practice and feedback.

In creating the material, I first brainstormed a list of topics that I believe are essential and most useful in exploratory data analysis, referencing several textbooks on statistics and data analysis. Then I searched for quality data sets with good documentation to walk through the selected topics. The data sets I settled on are as follows: Boston Airbnb, world GDP and population, Massachusetts crime estimates, significant earthquakes in the world, national wage estimates, Boston street names, popular baby names in the US, and global temperature changes. Most of them have comprehensive metadata and come from government agencies, research institutes, or data centers. Using R, I cleaned and analyzed all of the data sets, produced plots, and created a set of materials.

Below are some of the findings looking at Fathom members’ names in the baby names data:

New York Philharmonic Orchestra Performance Data

For the last stats session, I explored the New York Philharmonic Orchestra’s performance history database. Since Charlie was working on the next episode of Especially Big Data on the New York Philharmonic’s Open Data movement, I thought it would be fun to dig into the data together as a team.

The performance history database is massive and has detailed information on more than 20,000 performances since the birth of the orchestra in 1842. It took me a while to navigate through the data, and I had to frequently visit Wikipedia and other websites to read about the orchestra’s history. Below are some plots that show certain trends in the orchestra:

In addition to creating stats material, I spent some time learning p5.js through Getting Started with p5.js and The Coding Train, updating the First of Her Kind poster, and reading through Ben’s massive book collection with topics ranging from data mining to graphic design. I also got to meet wonderfully energetic Girls Who Code, eat the best homemade key lime pie, and see the eclipse with pinhole projectors made out of paper and granola boxes. Seven weeks was a quick sprint, but nonetheless full of exciting work and fun memories. Thank you Fathom for the amazing summer!

Founded in 2010 by Ben Fry, Fathom Information Design works with clients to understand complex data through interactive tools and software for mobile devices, the web, and large format installations. Out of its studio in Boston, Fathom partners with Fortune 500s and non-profit organizations across sectors, including health care, education, financial services, media, technology, and consumer products.

How can we help? hello@fathom.info.