Last week Mark and I had the opportunity to be guest lecturers at Harvard’s CS171 Data Visualization course. It’s always great to be able to get out of the office to talk with people about the work we’re doing.
The CS171 course focuses on data visualization theory and covers a lot of ground, from learning to program to the basics of interactive visualization. We like guest lecturing for this course because students always tell us afterwards that they enjoy seeing how what they’re learning can be used outside of the classroom. After the lecture, students still had questions, many of them ones we hear often, so we thought we’d answer some of them here.
Where do you find data?
This is a question we get a lot, and unfortunately there isn’t an easy answer. With our client work, the clients are generally the ones providing the data, but for our curiosity pieces, we often source the data ourselves or even generate it by hand. For Scaled in Miles, Mark was interested in Miles Davis’ career and found a database of all of his recording sessions and collaborators, which became the main data source for that project. For Rocky Morphology, James watched all of the Rocky movies and recorded – by hand! – what type of scene was going on at every one-minute interval. (He did this without telling us; otherwise we probably would have built a simple tool!) The USGS and NASA are great resources if you’re interested in earth science data. The UN and World Bank provide lots of interesting global data (which is where most of the No Ceilings data comes from), and the U.S. Census is obviously a key resource for U.S. data. Don’t be discouraged if you’re having a hard time finding data you want to work with. Unless someone is providing you with data, it can take some digging to find an interesting data set to explore.
You don’t seem to use D3. What tools do you use?
For analysis and data-wrangling, we often turn to Python. We also do a fair amount of server-side work with Python, sometimes Node, and even a little Java.
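To give a flavor of what that kind of wrangling can look like, here is a minimal sketch in Python using only the standard library. It is not code from any of our projects: the sample log, the column names (`minute`, `scene_type`), and both function names are made up for illustration, loosely modeled on the sort of minute-by-minute scene log described for Rocky Morphology above.

```python
import csv
import io
from collections import Counter

# Hypothetical hand-recorded log: one row per minute of a film.
# The column names and values are assumptions for this sketch.
SAMPLE_LOG = """minute,scene_type
1,dialogue
2,dialogue
3,training
4,training
5,fight
6,fight
7,fight
8,dialogue
"""


def minutes_per_scene_type(csv_text):
    """Count how many one-minute intervals were tagged with each scene type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["scene_type"].strip() for row in reader)


def share_per_scene_type(counts):
    """Convert raw minute counts into fractions of the total logged time."""
    total = sum(counts.values())
    return {scene: n / total for scene, n in counts.items()}


counts = minutes_per_scene_type(SAMPLE_LOG)
shares = share_per_scene_type(counts)

# Print scene types from most to least common, with their share of the log.
for scene, n in counts.most_common():
    print(f"{scene}: {n} min ({shares[scene]:.0%})")
```

A quick summary like this is usually just a first look; it helps you decide whether the data is worth visualizing before you invest in anything more polished.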
What are the differences between using D3, p5.js, and Processing?
D3 is primarily a data visualization library. It is great for getting a quick view of your data and we’ve used it here for just that on many occasions.
Processing is part language, part library, and part environment, and it is often used for creative coding. The main version is Java-based (as a “language” it’s sort of a dialect of Java), so the work you create with it is not readily usable on the web. However, it has a lot more juice than your browser, so when you’re working with large datasets, it can make those initial iterations happen a lot faster. It can also export to PDF, which makes it great for creating the data-driven pieces of any print work you may want to do.
p5.js is a project that builds on the ideas of Processing but rethinks some of its base decisions for the web. Primarily, it can help simplify the process of writing HTML5 canvas applications. It’s a great starting place for learning to code in general (it’s what’s used by Khan Academy, for instance), but it’s also helpful when you find yourself editing a D3 example so much that you’re writing more code to change it than the example itself contains. When you get to that level of customization, p5 and the HTML5 canvas can be your friend.
Are you hiring? What do you look for in a candidate?
Yes! We are always looking for curious, driven candidates. We are not looking for someone who can “do it all,” but for people who can think creatively about how to apply their skills – whether that be coding, design, writing, or managing – to the work we do. We also have internships throughout the year. If you want to learn more, check out our careers page.