Parallel set

Turning Data Tables into a Tree

Parallel sets are a new way to analyze categorical data such as gender, age group or product segment. They are particularly well suited to answering one question: how many members of a category in column A also belong to a category in column B?

Let’s take the canonical example used for parallel sets: statistics about who survived the sinking of the Titanic.

Row  Class  Sex     Age    Survived  Freq
1    1st    Male    Child  No        0
2    2nd    Male    Child  No        0
3    3rd    Male    Child  No        35
4    Crew   Male    Child  No        0
5    1st    Female  Child  No        0
6    2nd    Female  Child  No        0
7    3rd    Female  Child  No        17
8    Crew   Female  Child  No        0
9    1st    Male    Adult  No        118
10   2nd    Male    Adult  No        154
11   3rd    Male    Adult  No        387
12   Crew   Male    Adult  No        670
13   1st    Female  Adult  No        4
14   2nd    Female  Adult  No        13
15   3rd    Female  Adult  No        89
16   Crew   Female  Adult  No        3
17   1st    Male    Child  Yes       5
18   2nd    Male    Child  Yes       11
19   3rd    Male    Child  Yes       13
20   Crew   Male    Child  Yes       0
21   1st    Female  Child  Yes       1
22   2nd    Female  Child  Yes       13
23   3rd    Female  Child  Yes       14
24   Crew   Female  Child  Yes       0
25   1st    Male    Adult  Yes       57
26   2nd    Male    Adult  Yes       14
27   3rd    Male    Adult  Yes       75
28   Crew   Male    Adult  Yes       192
29   1st    Female  Adult  Yes       140
30   2nd    Female  Adult  Yes       80
31   3rd    Female  Adult  Yes       76
32   Crew   Female  Adult  Yes       20
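
The question of how many members of one category also belong to another is, at its core, a cross-tabulation of two columns. Here is a minimal Python sketch of that idea, assuming the table above is typed in by hand. The names FREQ, ROWS and crosstab are made up for this illustration and do not come from any charting library.

```python
from collections import Counter
from itertools import product

# Frequencies in the same row order as the table above (rows 1 to 32).
FREQ = [
    0, 0, 35, 0, 0, 0, 17, 0,            # children who died
    118, 154, 387, 670, 4, 13, 89, 3,    # adults who died
    5, 11, 13, 0, 1, 13, 14, 0,          # children who survived
    57, 14, 75, 192, 140, 80, 76, 20,    # adults who survived
]

# In the table, Class varies fastest, then Sex, then Age, then Survived.
COMBOS = product(["No", "Yes"], ["Child", "Adult"],
                 ["Male", "Female"], ["1st", "2nd", "3rd", "Crew"])
ROWS = [
    {"Class": c, "Sex": s, "Age": a, "Survived": v, "Freq": f}
    for (v, a, s, c), f in zip(COMBOS, FREQ)
]


def crosstab(rows, col_a, col_b):
    """Count people for every (col_a value, col_b value) pair."""
    counts = Counter()
    for row in rows:
        counts[(row[col_a], row[col_b])] += row["Freq"]
    return counts


sex_by_survival = crosstab(ROWS, "Sex", "Survived")
for (sex, survived), n in sorted(sex_by_survival.items()):
    print(f"{sex:<6} survived={survived:<3} {n}")
```

Each count in the result corresponds to the width of one ribbon between the two chosen axes of the chart.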

And turn it into a parallel sets chart:

Ladies First!!! (as long as you’re rich or work here)

Start at the top of the visualization and you can see that about two-thirds of the people on the Titanic perished, and that a very large share of the lives lost were male. Moving down the chart’s far left side, you see that almost all of the females who died were adults, primarily because there were proportionately very few children on the Titanic. And of the female adults who perished, an eyeball-estimated 80-85% had cabins in 3rd class (the table gives 89 of 109, about 82%), a limited few were from 2nd class, and an almost undetectable number were in 1st class or among the crew. Hmmm. Again, ladies first as long as you’re rich or work here!
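
The top-down reading path above (Survived, then Sex, then Age, then Class) is a walk down a tree of nested counts. The sketch below shows one way to turn the table into such a tree in Python. It is an illustration under stated assumptions, not the code behind the chart: the data is typed in from the table, and the names ROWS and build_tree are hypothetical.

```python
from itertools import product

# Frequencies in the row order of the Titanic table (rows 1 to 32).
FREQ = [
    0, 0, 35, 0, 0, 0, 17, 0,
    118, 154, 387, 670, 4, 13, 89, 3,
    5, 11, 13, 0, 1, 13, 14, 0,
    57, 14, 75, 192, 140, 80, 76, 20,
]
# Class varies fastest, then Sex, then Age, then Survived.
COMBOS = product(["No", "Yes"], ["Child", "Adult"],
                 ["Male", "Female"], ["1st", "2nd", "3rd", "Crew"])
ROWS = [
    {"Class": c, "Sex": s, "Age": a, "Survived": v, "Freq": f}
    for (v, a, s, c), f in zip(COMBOS, FREQ)
]


def build_tree(rows, path):
    """Nest counts along `path`; each node keeps its total and its children."""
    root = {"count": 0, "children": {}}
    for row in rows:
        node = root
        node["count"] += row["Freq"]
        for dim in path:
            node = node["children"].setdefault(
                row[dim], {"count": 0, "children": {}})
            node["count"] += row["Freq"]
    return root


tree = build_tree(ROWS, ["Survived", "Sex", "Age", "Class"])

# Follow the path: died -> female -> adult, then split by class.
died = tree["children"]["No"]
women = died["children"]["Female"]["children"]["Adult"]

print(f"Perished: {died['count']} of {tree['count']}")
for cls, node in women["children"].items():
    share = node["count"] / women["count"]
    print(f"Adult women who died, {cls}: {node['count']} ({share:.0%})")
```

Walking this tree gives 1,490 deaths out of 2,201 people (about 68%), and 109 adult women who died, 89 of them in 3rd class. Passing the columns in a different order to build_tree gives a different reading path through the same data.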

Perhaps this wasn’t as biased as the above statistics suggest? Perhaps rescue from the 3rd class section of the boat was just logistically more difficult? That doesn’t seem to be the case: the men who perished were spread far more evenly across the three cabin classes than the women were. There is lots of good stuff in this visualization. Top-down, bottom-up, just choose a path; most lead to fascinating conclusions.

So, the parallel set is a new arrow in your quiver for categorical data. As with most visualizations, it has advantages and drawbacks. With high-cardinality columns it becomes a bit messy, and it takes time to learn to read it quickly. On the other hand, it works very well with low-cardinality columns. Extra bonus: it looks intriguing and may increase engagement with your dashboards.
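
To get a feel for why high-cardinality columns get messy, here is a rough Python sketch. It assumes the usual layout, in which every level of the chart splits each existing ribbon by the categories of the next column, so the number of possible ribbons is at most the running product of the column cardinalities. The function name and the 50-value column are invented for the example.

```python
from itertools import accumulate
from operator import mul


def max_combinations_per_level(cardinalities):
    """Upper bound on distinct category combinations after each column.

    Combinations with a count of zero are not drawn, so a real chart
    may show fewer ribbons than this bound.
    """
    return list(accumulate(cardinalities, mul))


# Titanic columns in reading order: Survived, Sex, Age, Class.
titanic = max_combinations_per_level([2, 2, 2, 4])
print(titanic)

# Hypothetical dashboard with a 50-value column in the middle.
hypothetical = max_combinations_per_level([2, 50, 4])
print(hypothetical)
```

The Titanic data tops out at 32 combinations, which is still readable. Under the same assumption, a single 50-value column pushes the bound into the hundreds.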