Guest Lecturing at Harvard CS 171
Last week Mark and I had the opportunity to be guest lecturers at Harvard's CS171 Data Visualization course. It's always great to be able to get out of the office to talk with people about the work we're doing.

The CS171 course focuses on data visualization theory while covering a lot of ground on programming fundamentals and the basics of interactive visualization. We like guest lecturing for this course because we always hear from students that they enjoy seeing how what they're learning can be used outside the classroom. Even so, a few questions came up again and again after the lecture, so we thought we'd answer some of them here.

Where do you find data?

This is a question we get a lot, and unfortunately there isn't an easy answer. With our client work, the client generally provides the data, but for our curiosity pieces we often source the data ourselves, or even generate it ourselves. For Scaled in Miles, Mark was interested in Miles Davis' career and found a database of all of his recording sessions and collaborators, which became the main data source for that project. For Rocky Morphology, James watched all of the Rocky movies and recorded (by hand!) what type of scene was happening at one-minute intervals. (He did this without telling us, otherwise we probably would have built a simple tool!) The USGS and NASA are great resources if you're interested in earth science data. The UN and World Bank provide lots of interesting global data (which is where most of the No Ceilings data comes from), and the U.S. Census is obviously a key resource for U.S. data. Don't be discouraged if you're having a hard time finding data you want to work with. Unless someone hands you a data set, it can take some digging to find one that's interesting to explore.
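If you want to see what that digging looks like in practice, many of these sources publish their data over plain HTTP. Here's a rough sketch (our own toy example, not something from the lecture) of pulling the USGS real-time earthquake feed in the browser; the feed URL and field names come from the USGS's public GeoJSON feeds, so check their documentation before leaning on them.

```javascript
// The USGS publishes real-time earthquake feeds as GeoJSON.
// Feed URL and field names assumed from the public USGS feed; verify before use.
const FEED = 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson';

fetch(FEED)
  .then(response => response.json())
  .then(data => {
    // Each feature is one earthquake; keep just the fields we care about.
    const quakes = data.features.map(f => ({
      magnitude: f.properties.mag,
      place: f.properties.place,
      time: new Date(f.properties.time),
      // GeoJSON coordinates are [longitude, latitude, depth]
      coords: f.geometry.coordinates
    }));
    console.log(`Fetched ${quakes.length} earthquakes from the last 24 hours`);
  })
  .catch(err => console.error('Could not reach the USGS feed:', err));
```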

You don't seem to use D3. What tools do you use?

For our initial explorations and desktop applications, Processing is our go-to tool. (Ben is still its primary developer, and many of its features exist because he wanted them for his own projects.) Once we're done sketching in Processing, we might port the piece to JavaScript (if it's headed for the web) or deploy the application to another environment. For instance, we might build an installation here using mostly Macs, and then run it on a Windows machine for the actual installation.

For the web, we stay away from visualization libraries because our job is to provide bespoke ways of looking at data, which often makes using a library more of a hindrance than a help. (Another way of thinking about it is that visualization libraries rarely solve the things that actually take the most time for us.) Primarily, we write our projects from scratch in JavaScript and pull in the typical UI libraries (jQuery/jQuery UI) when useful.
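To make "from scratch" a little more concrete, here's a hypothetical example of the kind of hand-rolled code we mean: a handful of bars drawn directly with the HTML5 canvas API, no visualization library in sight. The data, sizing, and the chart element's id are all made up for illustration.

```javascript
// Draw a simple bar chart straight onto a <canvas id="chart"> element,
// using nothing but the built-in 2D context (no visualization library).
const data = [4, 8, 15, 16, 23, 42];          // made-up values for illustration
const canvas = document.getElementById('chart');
const ctx = canvas.getContext('2d');

const width = canvas.width;
const height = canvas.height;
const barWidth = width / data.length;
const maxValue = Math.max(...data);

ctx.clearRect(0, 0, width, height);
data.forEach((value, i) => {
  const barHeight = (value / maxValue) * height;
  ctx.fillStyle = '#4a90d9';
  // Canvas y runs downward, so bars grow up from the bottom edge.
  ctx.fillRect(i * barWidth + 2, height - barHeight, barWidth - 4, barHeight);
});
```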

For analysis and data-wrangling, we often turn to Python. We also do a fair amount of server-side work with Python, sometimes Node, and even a little Java.

We try to use whatever tool is best suited to the job and makes us most efficient, often employing several different tools at various stages of a project. The No Ceilings project, for instance, started with a custom tool built in Processing. That tool helped us get a sense of what was in the quarter-million rows of data; once we found something interesting, we exported it to Excel, which we used to generate charts and short write-ups delivered as PDF documents. The final site uses a lot of JavaScript, but also D3 for a piece or two, and even three.js for a 3D globe on mobile devices.

What are the differences between using D3, p5.js, and Processing?

D3 is primarily a data visualization library. It's great for getting a quick view of your data, and we've used it here for just that on many occasions.
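As a hypothetical example of that kind of quick look (using the D3 v4+ selection API and placeholder data, not anything from a real project), a handful of lines is enough to turn an array of numbers into bars you can eyeball:

```javascript
// A quick-look bar chart in D3: bind an array of numbers to <div> bars
// appended to <body>. Handy for getting a feel for a dataset in seconds.
const values = [30, 86, 168, 281, 303, 365];   // placeholder data

d3.select('body')
  .selectAll('div.bar')
  .data(values)
  .enter()
  .append('div')
  .attr('class', 'bar')
  .style('height', '18px')
  .style('margin', '2px')
  .style('background', 'steelblue')
  .style('width', d => `${d}px`)   // one pixel per unit, just to see the shape
  .text(d => d);
```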

Processing is part language, part library, part environment, and it's often used for creative coding. The main version is Java-based (as a “language” it's sort of a dialect of Java), so the work you create with it is not readily usable on the web. However, it has a lot more juice than your browser, so when you're working with large datasets, it can make those initial iterations happen a lot faster. It can also export to PDF, which makes it great for creating the data-driven pieces of any print work you may want to do.

p5.js is a project that builds on the ideas of Processing but rethinks some of its base decisions for the web. Primarily, it simplifies the process of writing HTML5 canvas applications. It's a great starting place for learning to code in general (it's what's used by Khan Academy, for instance), but it's also helpful when you find yourself editing a D3 example so heavily that you're writing more code to change it than there was in the example itself. When you get to that level of customization, p5 and the HTML5 canvas can be your friends.
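To give a sense of how little scaffolding that takes, here's a minimal p5.js sketch (our own toy example, not from a project): setup() runs once, draw() runs every frame, and everything lands on an HTML5 canvas.

```javascript
// A tiny p5.js sketch. Move the mouse to leave a trail of circles;
// it's the same draw-loop model Processing uses, but in the browser.
function setup() {
  createCanvas(600, 400);
  background(240);
  noStroke();
}

function draw() {
  fill(70, 130, 180, 80);          // translucent so the trail builds up
  ellipse(mouseX, mouseY, 24, 24); // follow the mouse
}
```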

Are you hiring? What do you look for in a candidate?

Yes! We are always looking for curious, driven candidates. We are not looking for someone who can "do it all," but people who can think creatively about how to apply their skills – whether that be coding, design, writing, or managing – to the work we do. We also have internships throughout the year. If you want to learn more, check out our careers page.

We’d love to hear what you’re working on, what you’re intrigued by, and what messy data problems we can help you solve. Find us on the web, drop us a line at hello@fathom.info, or subscribe to our newsletter.