In my last blogpost, I showed some visualizations generated from usage data from our tool Mirador. These visualizations rely on the calculation of a “distance” between variables in a dataset, and Information Theory allows us to define such a distance, as we will see below.

The notion of distance is essential to most visual representations of data, and we are intuitively, possibly innately, familiar with it. In two-dimensional space, we can use the Euclidean distance between two points `p_{1}=(x_{1}, y_{1})` and `p_{2}=(x_{2}, y_{2})`, defined as `d(p_{1}, p_{2}) = \sqrt{(x_{1} - x_{2})^2 + (y_{1} - y_{2})^2}`, to determine the distance between any pair of points in the space; in three dimensions the formula gains a third squared term for the `z` coordinates.

But how do we define “distance” between more abstract entities, such as random variables? Mathematically, a distance function in an arbitrary set is a function that gives a real number for any pair of objects from the set, and satisfies the following “metric” properties:

- `d(p, p) = 0` for any `p`. The distance of any element with itself is always zero.
- `d(p, q) = 0` if and only if `p = q`. The distance between two objects can only be zero when the two objects are identical, and vice versa.
- `d(p, q) \leq d(p, w) + d(w, q)`. This last property is called the *triangle inequality*, and geometrically means that going directly from `p` to `q` is never longer than going through an intermediate object `w`:

```
var x1 = 90;
var x2 = 180;
var y1 = 250;
var y2 = 20;
var z1 = 170;
var z2 = 100;
var zsel = false;
function setup() {
createCanvas(340, 250);
smooth(2);
textSize(14);
textStyle(ITALIC);
textFont("Times New Roman");
}
function draw() {
background(255);
stroke(0, 180);
line(x1, x2, y1, y2);
stroke(0, 255);
line(x1, x2, z1, z2);
line(z1, z2, y1, y2);
noStroke();
fill(0);
text("p", x1 + 10, x2 + 10);
text("q", y1 + 10, y2 + 10);
text("w", z1 + 10, z2 + 10);
ellipseMode(CENTER);
noStroke();
fill(180);
ellipse(x1, x2, 10, 10);
ellipse(y1, y2, 10, 10);
var r = 10;
if (!zsel) {
r = map(cos(frameCount / 10.0), -1, +1, 9, 12);
}
ellipse(z1, z2, r, r);
var dxy = dist(x1, x2, y1, y2);
var dxz = dist(x1, x2, z1, z2);
var dzy = dist(z1, z2, y1, y2);
var sum = dxz + dzy;
var s = "=";
if (dxy < sum) {
s = "<";
}
fill(0);
text("d(p, q) = " + nfc(dxy, 2) + " " + s + " " + nfc(dxz, 2) + " + " + nfc(dzy, 2) + " = d(p, w) + d(w, q)", 10, 235);
}
function mousePressed() {
if (!zsel) {
zsel = dist(mouseX, mouseY, z1, z2) < 10;
}
}
function mouseDragged() {
if (zsel && 90 < mouseX && mouseX < width - 90 &&
20 < mouseY && mouseY < height - 70) {
z1 = mouseX;
z2 = mouseY;
}
}
function mouseReleased() {
zsel = false;
}
```

Any function that satisfies these three properties is called a distance. The Euclidean distance discussed before is one such function, but there are other distance functions in 2-D or 3-D space that are not Euclidean, for example the Manhattan and Chebyshev distances.
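To make the comparison concrete, here is a minimal sketch of the three distances for 2-D points, together with a spot-check of the triangle inequality on a single triple of points. The function names and the `[x, y]` array representation are my own choices for illustration, and checking one triple is of course not a proof.

```javascript
// Three distance functions on 2-D points given as [x, y] arrays.
function euclidean(p, q) {
  return Math.hypot(p[0] - q[0], p[1] - q[1]);
}

function manhattan(p, q) {
  return Math.abs(p[0] - q[0]) + Math.abs(p[1] - q[1]);
}

function chebyshev(p, q) {
  return Math.max(Math.abs(p[0] - q[0]), Math.abs(p[1] - q[1]));
}

// Spot-check d(p, q) <= d(p, w) + d(w, q) for one triple of points.
// The small tolerance absorbs floating-point rounding error.
function satisfiesTriangle(d, p, q, w) {
  return d(p, q) <= d(p, w) + d(w, q) + 1e-12;
}

const p = [0, 0];
const q = [3, 4];
const w = [1, 3];

console.log(euclidean(p, q)); // 5
console.log(manhattan(p, q)); // 7
console.log(chebyshev(p, q)); // 4
console.log([euclidean, manhattan, chebyshev].every(function (d) {
  return satisfiesTriangle(d, p, q, w);
})); // true
```

The same pair of points is 5, 7, or 4 units apart depending on which function we pick, yet all three respect the triangle inequality for this choice of `w`.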

Thus, in 2-D or 3-D space there are several distance functions we can use to quantify how far apart any two elements are. However, if we are working with sets of elements that are not 2-D or 3-D vectors, it can be harder to get a sense of “distance” between “points” in the space. I found it very interesting that we can actually define a proper distance function between arbitrary random variables. In a previous post, I gave an informal introduction to the Shannon entropy `H(X)`, a mathematical measure of the amount of “surprise” received upon measuring a random variable `X`. This definition led us to the concept of mutual information `I(X, Y)`, which quantifies the level of statistical dependency between two variables `X` and `Y`.

We concluded that `I(X, Y) = H(X) + H(Y) - H(X, Y)`, which we can visualize as the area shared between the marginal entropies `H(X)` and `H(Y)`, as depicted in this diagram.
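As a rough numerical illustration of this formula, the sketch below computes the mutual information from a table of joint probabilities. The function names are hypothetical, and I am assuming that `joint[i][j]` holds `P(X = i, Y = j)` and that entropies are measured in bits.

```javascript
// Shannon entropy (in bits) of a list of probabilities.
// Zero-probability outcomes contribute nothing to the sum.
function entropy(probs) {
  return probs.reduce(function (h, p) {
    return p > 0 ? h - p * Math.log2(p) : h;
  }, 0);
}

// Mutual information I(X, Y) = H(X) + H(Y) - H(X, Y).
// Assumption: joint[i][j] = P(X = i, Y = j), entries summing to 1.
function mutualInformation(joint) {
  // Marginal of X: sum each row. Marginal of Y: sum each column.
  const px = joint.map(function (row) {
    return row.reduce(function (a, b) { return a + b; }, 0);
  });
  const py = joint[0].map(function (_, j) {
    return joint.reduce(function (a, row) { return a + row[j]; }, 0);
  });
  const hxy = entropy(joint.flat());
  return entropy(px) + entropy(py) - hxy;
}

// Two fair coins tossed independently, and a fair coin paired with a copy of itself.
const independent = [[0.25, 0.25], [0.25, 0.25]];
const identical = [[0.5, 0], [0, 0.5]];

console.log(mutualInformation(independent)); // 0 (no shared information)
console.log(mutualInformation(identical));   // 1 (one full bit is shared)
```

The two test tables are the extreme cases: independent variables share nothing, while a variable and its exact copy share everything.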

The mutual information varies between `0`, when the two variables are independent, and `H(X, Y)`, when they are statistically identical. So what about the remainder of subtracting `I(X, Y)` from the joint entropy `H(X, Y)`? It is `0` when the variables are identical, and takes the maximum value `H(X, Y)` when they are totally unrelated. Could it be then that the following quantity:

`D(X, Y) = H(X, Y) - I(X, Y)`

is our distance function? We can use a simple Venn diagram to represent this function graphically:

```
var distSketch = new p5(function(g) {
var distance;
var x1 = 80;
var x2 = 100;
var y1 = 120;
var y2 = 100;
var offset;
var hoverD = false;
var hoverH = false;
var hoverI = false;
var tend = 0;
var ptarg = false;
var yalpha;
var xoff;
var selColor;
g.setup = function() {
g.createCanvas(600, 210);
g.ellipseMode(g.RADIUS);
g.smooth(2);
g.textSize(14);
g.textStyle(g.ITALIC);
g.textFont("Times New Roman");
selColor = new SoftFloat(255);
distance = new SoftFloat(40);
offset = new SoftFloat(0);
yalpha = new SoftFloat(255);
xoff = new SoftFloat(0);
};
g.draw = function() {
g.background(255);
g.translate(200, 0);
selColor.update();
distance.update();
offset.update();
yalpha.update();
xoff.update();
var d = distance.get();
var x0 = offset.get();
var x1 = 100 - d/2;
var y1 = 100 + d/2;
var disjoint = g.abs(d - 80) < 0.01;
var coincident = g.abs(d) < 0.01;
if (disjoint) {
offset.setTarget(20);
} else if (coincident) {
offset.setTarget(15);
} else {
offset.setTarget(0);
}
if (hoverD) {
var angles = g.circleIntersection(x1, x2, 40, y1, y2, 40);
g.noStroke();
g.fill(selColor.get());
g.ellipse(x1, x2, 40, 40);
g.ellipse(y1, y2, 40, 40);
g.fill(255);
g.arc(x1, x2, 40, 40, angles[0], angles[1], g.CHORD);
g.arc(y1, y2, 40, 40, angles[2], angles[3], g.CHORD);
} else if (hoverH) {
g.noStroke();
g.fill(selColor.get());
g.ellipse(x1, x2, 40, 40);
g.ellipse(y1, y2, 40, 40);
} else if (hoverI) {
var angles = g.circleIntersection(x1, x2, 40, y1, y2, 40);
g.noStroke();
g.fill(selColor.get());
if (!disjoint && !coincident) {
g.arc(x1, x2, 40, 40, angles[0], angles[1], g.CHORD);
g.arc(y1, y2, 40, 40, angles[2], angles[3], g.CHORD);
} else if (coincident) {
g.ellipse(x1, x2, 40, 40);
}
}
g.noFill();
g.stroke(0);
g.ellipse(x1, x2, 40, 40);
g.ellipse(y1, y2, 40, 40);
g.noStroke();
var xb = x0 + 23;
if (!coincident) {
if (hoverD) {
g.fill(selColor.get());
} else {
g.fill(170, g.map(g.cos(g.frameCount / 24.0), -1, +1, 0, 50));
}
g.rect(xb, 185, 45, 20);
}
xb = coincident ? xb + 25 : xb + 61;
if (hoverH) {
g.fill(selColor.get());
} else {
g.fill(170, g.map(g.cos(g.frameCount / 20.0 + g.QUARTER_PI), -1, +1, 0, 50));
}
g.rect(xb, 185, 43, 20);
if (!disjoint) {
xb += 54;
if (hoverI) {
g.fill(selColor.get());
} else {
g.fill(170, g.map(g.cos(g.frameCount / 22.0 + g.HALF_PI), -1, +1, 0, 50));
}
g.rect(xb, 185, 40, 20);
}
g.noStroke();
g.fill(0, yalpha.get());
g.text("H(Y)", y1 + 20, 50);
g.fill(0);
g.text("H(X)", x1 - 40 + xoff.get(), 50);
var caption = "" ;
if (disjoint) {
caption = "D(X, Y) = H(X, Y)";
} else if (coincident) {
caption = "0 = H(X, Y) - I(X, Y)";
} else {
caption = "D(X, Y) = H(X, Y) - I(X, Y)";
}
g.text(caption, x0 + 25, 200);
var t = g.millis();
if (!distance.targeting) {
if (ptarg) {
// Animation has ended.
tend = t;
}
if (t - tend > 10000 && g.abs(d - 40) > 0.01) {
// Restore to default after 10 seconds
distance.setTarget(40);
yalpha.setTarget(255);
xoff.setTarget(0);
}
}
ptarg = distance.targeting;
};
g.mouseMoved = function() {
var xb = 23 + offset.get();
var mx = g.mouseX - 200;
var my = g.mouseY;
var d = distance.get();
var disjoint = g.abs(d - 80) < 0.01;
var coincident = g.abs(d) < 0.01;
hoverD = false;
if (!coincident && xb < mx && mx < xb + 45 &&
185 < my && my < 185 + 20) {
hoverD = true;
selColor.setTarget(170);
}
xb = coincident ? xb + 25 : xb + 61;
hoverH = false;
if (xb < mx && mx < xb + 43 &&
185 < my && my < 185 + 20) {
hoverH = true;
selColor.setTarget(170);
}
hoverI = false;
xb += 54;
if (!disjoint && xb < mx && mx < xb + 40 &&
185 < my && my < 185 + 20) {
hoverI = true;
selColor.setTarget(170);
}
if (!hoverD && !hoverH && !hoverI) {
selColor.set(255);
}
};
// g.keyPressed = function() {
// if (g.key == "1") {
// g.makeDisjoint();
// } else if (g.key == "2") {
// g.makeCoincident();
// } else if (g.key == "3") {
// g.makeDistant();
// }
// }
g.makeDisjoint = function() {
distance.setTarget(80);
yalpha.setTarget(255);
xoff.setTarget(0);
};
g.makeCoincident = function() {
distance.setTarget(0);
yalpha.setTarget(0);
xoff.setTarget(25);
};
g.makeDistant = function() {
distance.setTarget(70);
yalpha.setTarget(255);
xoff.setTarget(0);
};
g.circleIntersection = function(x1, y1, r1, x2, y2, r2) {
var R = r1;
var r = r2;
var d = g.dist(x1, y1, x2, y2);
var v12 = g.createVector(x2 - x1, y2 - y1);
var n12 = g.createVector(y2 - y1, x1 - x2);
v12.normalize();
n12.normalize();
// http://mathworld.wolfram.com/Circle-CircleIntersection.html
var x = (g.sq(d) - g.sq(r) + g.sq(R)) / (2 * d);
var ya = g.sqrt(g.sq(R) - g.sq(x));
var yb = -ya;
var pa1 = g.createVector(x * v12.x + ya * n12.x, x * v12.y + ya * n12.y);
var pb1 = g.createVector(x * v12.x + yb * n12.x, x * v12.y + yb * n12.y);
var ha1 = pa1.heading();
var hb1 = pb1.heading();
var pa2 = g.createVector(x1 - x2 + pa1.x, y1 - y2 + pa1.y);
var pb2 = g.createVector(x1 - x2 + pb1.x, y1 - y2 + pb1.y);
var ha2 = pa2.heading();
var hb2 = pb2.heading();
return [ha1, hb1, hb2, ha2];
};
/////////////////////////////////////////////////////////////////////////////
//
// SoftFloat class definition
function SoftFloat(v) {
this.attraction = 0.1;
this.damping = 0.5;
this.value = v;
this.velocity = 0;
this.acceleration = 0;
this.targeting = false;
this.target = v;
}
SoftFloat.prototype.update = function() {
if (this.targeting) {
this.acceleration += this.attraction * (this.target - this.value);
this.velocity = (this.velocity + this.acceleration) * this.damping;
this.value += this.velocity;
this.acceleration = 0;
if (g.abs(this.velocity) > 0.0001 && g.abs(this.target - this.value) >= 0) {
return true; // still updating
}
this.value = this.target; // arrived, set it to the target value to prevent rounding error
this.targeting = false;
}
return false;
}
SoftFloat.prototype.setTarget = function(t) {
if (g.abs(this.target - t) >= 0) {
this.targeting = true;
this.target = t;
}
}
SoftFloat.prototype.set = function(v) {
this.value = v;
}
SoftFloat.prototype.get = function() {
return this.value;
}
});
```

The smaller the intersection (that is, the less dependent the variables are), the larger the area of the disjoint pieces, and hence the larger the distance `D(X, Y)`. When the variables are entirely independent, the intersection is empty and the distance reaches its maximum value `H(X, Y)`.
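Substituting the expression for `I(X, Y)` given earlier into the definition of `D(X, Y)` yields an equivalent form in terms of entropies alone. The last step uses the chain rule `H(X, Y) = H(Y) + H(X | Y)`, a standard identity that was not introduced in this post, so take it as an assumption here:

```latex
\begin{aligned}
D(X, Y) &= H(X, Y) - I(X, Y) \\
        &= H(X, Y) - \bigl[ H(X) + H(Y) - H(X, Y) \bigr] \\
        &= 2 H(X, Y) - H(X) - H(Y) \\
        &= H(X \mid Y) + H(Y \mid X)
\end{aligned}
```

In the Venn picture, the two conditional entropies in the last line are exactly the two crescents left over once the intersection is removed.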

To find out whether `D` really is a distance, we need to prove that it satisfies the three metric properties. From the Venn diagram itself we can quickly verify the first two: when the two circles overlap completely, the difference between the area of the union and the area of the intersection is exactly `0`, which means that `D(X, X) = 0`. We already discussed that if two variables are statistically identical then the mutual information is equal to the joint entropy, and so `D(X, Y) = 0`. For the converse, we just need to note that if the area of the intersection is the same as that of the union, then the only possibility is that the two circles coincide, hence `X = Y`.

The final part is to check the triangle inequality, meaning that we need to verify that:

`D(X, Y) \leq D(X, Z) + D(Z, Y)`

This looks like the most challenging step! However, we can put together a simple graphical proof inspired by the previous pictorial representation of our candidate distance function. Since this “informational distance” is precisely the portion of the joint entropy that is not shared between the two variables, we could represent the situation with three variables also via a Venn diagram as follows:

```
var x1 = 80;
var x2 = 100;
var y1 = 120;
var y2 = 100;
var z1 = 100;
var z2 = 135;
var hoverXY = false;
var hoverXZ = false;
var hoverYZ = false;
var hoverXYZ = false;
var selColor; // declared here so setup() does not create an implicit global
function setup() {
createCanvas(600, 240);
ellipseMode(RADIUS);
smooth(2);
textSize(14);
textStyle(ITALIC);
textFont("Times New Roman");
selColor = new SoftFloat(255);
}
function draw() {
background(255);
translate(200, 0);
selColor.update();
if (hoverXY) {
var angles = circle2Intersection(x1, x2, 40, y1, y2, 40);
noStroke();
fill(selColor.get());
ellipse(x1, x2, 40, 40);
ellipse(y1, y2, 40, 40);
fill(255);
arc(x1, x2, 40, 40, angles[0], angles[1], CHORD);
arc(y1, y2, 40, 40, angles[2], angles[3], CHORD);
} else if (hoverXZ) {
var angles = circle2Intersection(x1, x2, 40, z1, z2, 40);
noStroke();
fill(selColor.get());
ellipse(x1, x2, 40, 40);
ellipse(z1, z2, 40, 40);
fill(255);
arc(x1, x2, 40, 40, angles[0], angles[1], CHORD);
arc(z1, z2, 40, 40, angles[2], angles[3], CHORD);
} else if (hoverYZ) {
var angles = circle2Intersection(y1, y2, 40, z1, z2, 40);
noStroke();
fill(selColor.get());
ellipse(y1, y2, 40, 40);
ellipse(z1, z2, 40, 40);
fill(255);
arc(y1, y2, 40, 40, angles[0], angles[1], CHORD);
arc(z1, z2, 40, 40, angles[2], angles[3], CHORD);
} else if (hoverXYZ) {
noStroke();
fill(selColor.get());
ellipse(x1, x2, 40, 40);
ellipse(y1, y2, 40, 40);
ellipse(z1, z2, 40, 40);
var angles1 = circle3Intersection(x1, x2, 40, y1, y2, 40, z1, z2, 40, 2, 1);
var angles2 = circle3Intersection(y1, y2, 40, x1, x2, 40, z1, z2, 40, 0, 3);
var angles3 = circle3Intersection(z1, z2, 40, x1, x2, 40, y1, y2, 40, 2, 1);
fill(255);
arc(x1, x2, 40, 40, angles1[0], angles1[1], OPEN);
arc(y1, y2, 40, 40, angles2[0], angles2[1], OPEN);
arc(z1, z2, 40, 40, angles3[0], angles3[1], OPEN);
var p0 = circle3IntersectionPt(x1, x2, 40, y1, y2, 40, z1, z2, 40, 0, 1);
var p1 = circle3IntersectionPt(y1, y2, 40, x1, x2, 40, z1, z2, 40, 1, 1);
var p2 = circle3IntersectionPt(z1, z2, 40, x1, x2, 40, y1, y2, 40, 0, 1);
triangle(x1 + p0.x, x2 + p0.y,
y1 + p1.x, y2 + p1.y,
z1 + p2.x, z2 + p2.y);
//
// fill(255, 0, 0);
// ellipse(x1 + p0.x, x2 + p0.y, 5, 5);
// ellipse(y1 + p1.x, y2 + p1.y, 5, 5);
// ellipse(z1 + p2.x, z2 + p2.y, 5, 5);
}
noFill();
stroke(0);
ellipse(x1, x2, 40, 40);
ellipse(y1, y2, 40, 40);
ellipse(z1, z2, 40, 40);
noStroke();
if (hoverXY) {
fill(selColor.get());
} else {
fill(170, map(cos(frameCount / 24.0), -1, +1, 0, 50));
}
rect(18, 215, 45, 20);
if (hoverXZ) {
fill(selColor.get());
} else {
fill(170, map(cos(frameCount / 20.0 + QUARTER_PI), -1, +1, 0, 50));
}
rect(78, 215, 43, 20);
if (hoverYZ) {
fill(selColor.get());
} else {
fill(170, map(cos(frameCount / 22.0 + HALF_PI), -1, +1, 0, 50));
}
rect(136, 215, 44, 20);
if (hoverXYZ) {
fill(selColor.get());
rect(78, 215, 43, 20);
rect(136, 215, 44, 20);
} else {
fill(170, map(cos(frameCount / 19.0 + PI), -1, +1, 0, 50));
}
rect(121, 215, 15, 20);
noStroke();
fill(0);
text("H(X)", 40, 50);
text("H(Y)", 140, 50);
text("H(Z)", 85, 195);
text("D(X, Y) " + String.fromCharCode(8804) + " D(X, Z) + D(Z, Y)", 20, 230);
}
function mouseMoved() {
var mx = mouseX - 200;
var my = mouseY;
hoverXY = false;
if (18 < mx && mx < 18 + 45 && 215 < my && my < 215 + 20) {
hoverXY = true;
selColor.setTarget(170);
}
hoverXZ = false;
if (78 < mx && mx < 78 + 43 && 215 < my && my < 215 + 20) {
hoverXZ = true;
selColor.setTarget(170);
}
hoverYZ = false;
if (136 < mx && mx < 136 + 44 && 215 < my && my < 215 + 20) {
hoverYZ = true;
selColor.setTarget(170);
}
hoverXYZ = false;
if (121 <= mx && mx <= 121 + 15 && 215 <= my && my <= 215 + 20) {
hoverXYZ = true;
selColor.setTarget(170);
}
if (!hoverXY && !hoverXZ && !hoverYZ && !hoverXYZ) {
selColor.set(255);
}
}
function circleIntersection(x1, y1, r1, x2, y2, r2) {
var R = r1;
var r = r2;
var d = dist(x1, y1, x2, y2);
var v12 = createVector(x2 - x1, y2 - y1);
var n12 = createVector(y2 - y1, x1 - x2);
v12.normalize();
n12.normalize();
// http://mathworld.wolfram.com/Circle-CircleIntersection.html
var x = (sq(d) - sq(r) + sq(R)) / (2 * d);
var ya = sqrt(sq(R) - sq(x));
var yb = -ya;
var pa1 = createVector(x * v12.x + ya * n12.x, x * v12.y + ya * n12.y);
var pb1 = createVector(x * v12.x + yb * n12.x, x * v12.y + yb * n12.y);
var pa2 = createVector(x1 - x2 + pa1.x, y1 - y2 + pa1.y);
var pb2 = createVector(x1 - x2 + pb1.x, y1 - y2 + pb1.y);
return [pa1, pb1, pa2, pb2];
}
function circle2Intersection(x1, y1, r1, x2, y2, r2) {
var points = circleIntersection(x1, y1, r1, x2, y2, r2);
var pa1 = points[0];
var pb1 = points[1];
var pa2 = points[2];
var pb2 = points[3];
var ha1 = pa1.heading();
var hb1 = pb1.heading();
var ha2 = pa2.heading();
var hb2 = pb2.heading();
return [ha1, hb1, hb2, ha2];
}
function circle3Intersection(x1, y1, r1, x2, y2, r2, x3, y3, r3, i0, i1) {
var a2 = circle2Intersection(x1, y1, r1, x2, y2, r2);
var a3 = circle2Intersection(x1, y1, r1, x3, y3, r3);
var angles = [a2[0], a2[1], a3[0], a3[1]];
return [angles[i0], angles[i1]];
}
function circle3IntersectionPt(x1, y1, r1, x2, y2, r2, x3, y3, r3, i0, i1) {
var p2 = circleIntersection(x1, y1, r1, x2, y2, r2);
var p3 = circleIntersection(x1, y1, r1, x3, y3, r3);
if (i0 == 0) return p2[i1];
else return p3[i1];
}
/////////////////////////////////////////////////////////////////////////////
//
// SoftFloat class definition
function SoftFloat(v) {
this.attraction = 0.1;
this.damping = 0.5;
this.value = v;
this.velocity = 0;
this.acceleration = 0;
this.targeting = false;
this.target = v;
}
SoftFloat.prototype.update = function() {
if (this.targeting) {
this.acceleration += this.attraction * (this.target - this.value);
this.velocity = (this.velocity + this.acceleration) * this.damping;
this.value += this.velocity;
this.acceleration = 0;
if (Math.abs(this.velocity) > 0.0001 && Math.abs(this.target - this.value) >= 0) {
return true; // still updating
}
this.value = this.target; // arrived, set it to the target value to prevent rounding error
this.targeting = false;
}
return false;
}
SoftFloat.prototype.setTarget = function(t) {
this.targeting = true;
this.target = t;
}
SoftFloat.prototype.set = function(v) {
this.value = v;
}
SoftFloat.prototype.get = function() {
return this.value;
}
```

By hovering over the elements of the inequality, we can see that the sum `D(X, Z) + D(Z, Y)` is greater than or equal to `D(X, Y)`, since it covers the entire area of the union of the three circles except for the region where all three intersect, while `D(X, Y)` covers only part of that area.
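The picture can be complemented with a numerical spot-check. The sketch below draws random joint distributions over three binary variables and counts how often the triangle inequality fails. Everything in it (the function names, the restriction to binary variables, the string-keyed table) is an assumption made for illustration, and a run without violations is evidence rather than a proof.

```javascript
// Shannon entropy (in bits) of a list of probabilities.
function entropy(probs) {
  return probs.reduce(function (h, p) {
    return p > 0 ? h - p * Math.log2(p) : h;
  }, 0);
}

// Random joint distribution over three binary variables, stored as a map
// from the outcome string "xyz" (for example "010") to its probability.
function randomJoint() {
  const keys = ["000", "001", "010", "011", "100", "101", "110", "111"];
  const weights = keys.map(function () { return Math.random() + 1e-9; });
  const total = weights.reduce(function (a, b) { return a + b; }, 0);
  const joint = {};
  keys.forEach(function (k, i) { joint[k] = weights[i] / total; });
  return joint;
}

// Entropy of the variables at the given positions (0 = X, 1 = Y, 2 = Z),
// obtained by summing the joint probabilities over the other variables.
function marginalEntropy(joint, positions) {
  const marginal = {};
  Object.keys(joint).forEach(function (key) {
    const sub = positions.map(function (i) { return key[i]; }).join("");
    marginal[sub] = (marginal[sub] || 0) + joint[key];
  });
  return entropy(Object.values(marginal));
}

// D(A, B) = H(A, B) - I(A, B) = 2 H(A, B) - H(A) - H(B)
function infoDistance(joint, a, b) {
  return 2 * marginalEntropy(joint, [a, b]) -
    marginalEntropy(joint, [a]) - marginalEntropy(joint, [b]);
}

// Count violations of D(X, Y) <= D(X, Z) + D(Z, Y) over random joints.
// The tolerance absorbs floating-point rounding error.
function countViolations(trials) {
  let violations = 0;
  for (let t = 0; t < trials; t++) {
    const joint = randomJoint();
    const dxy = infoDistance(joint, 0, 1);
    const dxz = infoDistance(joint, 0, 2);
    const dzy = infoDistance(joint, 2, 1);
    if (dxy > dxz + dzy + 1e-9) violations++;
  }
  return violations;
}

console.log(countViolations(10000)); // 0
```

I used the expanded form `2 H(A, B) - H(A) - H(B)` because it only needs marginal entropies, which are easy to read off the joint table.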

This visual demonstration relies on identifying the circles with the Shannon entropies of each variable, and the intersecting areas with the corresponding mutual informations. Do you think this identification is valid? Send me an email if you have some thoughts about these assumptions, or any other comments!

# Additional reading

Check out this essay by Jim Bumgardner on Information Theory and Art, published in Issue #3 of the Mungbeing online magazine.

And another good example of combining online text with interactive illustrations of statistical concepts, in this case on the topic of P-hacking.

# Implementation details

I used Processing and Miralib to generate the videos and images in the previous post, p5.js for the interactive snippets embedded in the blogpost, and MathJax for the mathematical formulas.