Interns, integers, and complexity | Fathom Information Design

Chris

June 01, 2012

Interns, integers, and complexity

For the week of May 14th, I had the pleasure of having my very own intern. This wasn't just any intern, but Dana Fry, an intern excited about mathematics—rare, I know. After spending some time thinking about a project that (a) would have a relatively low technical barrier to entry, (b) would be open-ended enough to allow exploration (like a freeform jazz odyssey) and (c) wouldn’t require too much cobweb dusting on my part, I decided upon the relatively simple question:

In a finite integer sequence, for example {1, 2, 3, 4, 5, ...}, what comes next?

Bad news is there isn’t a unique answer, as it depends on how complex the function is that generated the sequence. Good news is that you have a better chance of guessing blindly. To demonstrate, here are three simple generators and their sequences. Note how all three generators agree up until the sixth integer, then diverge quite quickly:

G₁(n) = n; {1, 2, 3, 4, 5, 6, 7, 8, 9, 10,...}

G₂(n) = n⁵- 15n⁴+ 85n³- 225n²+ 275n - 120; {1, 2, 3, 4, 5, 126, 727, 2528, 6729, 15130,...}

G₃(n) = n + DIGIT_SUM(n) - MOD(n, 6); {1, 2, 3, 4, 5, 12, 13, 14, 15, 7,...}

With the initial problem ill-posed and open-ended, we embarked on a journey exploring the degeneracies and intricacies of integer sequences. The first step: a problem set, because ~~I needed to buy myself a bit more time~~ what’s a math class without a problem set? So on Monday morning I gave my intern her first problem set pertaining to integer sequences, finite differences, and Newton’s expansion. This would keep her busy. Meanwhile, I'd have plenty of time to formulate a real game plan for the week. But much to my horror, Dana finished by lunch time and was now sitting in the intern pen with a polite look of "what's next" in her eyes. So I did what any mentor would do: I feverishly scrambled and chalked my lack of preparation up to the problem at hand.

Thankfully I had spent the better part of the previous week compiling different tools that we could use for the project. The first tool was a small library of Java code that could be used to fit arbitrary integer sequences using genetic programming, the second was an extensive list of curated integer sequences housed at the Online Encyclopedia of Integer Sequences (OEIS), and the final tool was visualization—that’s what we do.

Dana spent the next three days investigating integer sequences using a combination of genetic programming, the OEIS, and Processing, and started to uncover interesting connections between seemingly simple functions and complex sequences. Below is an example of a function that finds the greatest common divisor between a number and the remainder of the same number upon division by ten. The entire sequence is plotted in the upper right panel, while each row in the lower left triangle shows the sequence broken up into multiple subsequences. For example, the third row shows the sequence broken up into three interleaved sequences, where the first plot is every third integer starting with the first, the second plot is every third integer starting with the second, and the third plot is every third integer starting with the third. Notice what happens when you divide the sequence up into ten interleaved sequences. The tenth subsequence is perfectly linear (i.e. the plot in the lower right corner).

GCD(n, MOD(n, 10))

Interestingly, this highlights a way to compare and contrast sequence complexity in terms of how easy it is to build them up from simple interleaved sequences. The more interleaved sequences that are required, the greater the algorithmic complexity of the generating function. Here are a few other plots to compare and contrast different generators:

DIGIT_SUM(N²)

The decimal digits of pi.

And like most short projects, this isn't as much where the story ends but really where it begins. In the course of the project, we certainly learned a lot about integer sequences, but we left many unanswered questions in our wake. The most obvious open questions regard the use of interleaving sequences to generate more complicated sequences. Is this a particularly interesting way to define algorithmic complexity? Are there sequences that don't have closed-form equations where interleaving sequences are useful? Is there any precedence in the literature regarding interleaving sequences, and if so with pertinence to what fields? These will be left to future inquiry, but all in all it was a really successful week and great having a motivated intern like Dana around.

We’d love to hear what you’re working on, what you’re intrigued by, and what messy data problems we can help you solve. Find us on the web, drop us a line at hello@fathom.info, or subscribe to our newsletter.