Zachary’s Karate Club#
[1]:
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
from flexitext import flexitext
from hiveplotlib import HivePlot
from hiveplotlib.converters import networkx_to_nodes_edges
from matplotlib.lines import Line2D
Background#
From 1970 to 1972, Wayne W. Zachary observed a karate club split into two factions: those who supported the officers, referred to as “Officer,” and those who supported one of the instructors, referred to as “Mr. Hi.” Eventually, the two factions formally split into two clubs.
This frequently used dataset contains 34 club members (nodes) and a record of who socialized with whom outside of class (edges) right before the formal split of the club.
The Data#
This dataset is conveniently available through networkx:
[2]:
G = nx.karate_club_graph()
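To see what the loaded graph contains, a quick inspection along these lines may help. This is a minimal sketch; the variable names `n_nodes`, `n_edges`, and `factions` are illustrative and not part of the original notebook.

```python
import networkx as nx

G = nx.karate_club_graph()

# basic size of the network
n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()

# each node carries a "club" attribute naming its faction
factions = sorted({data["club"] for _, data in G.nodes(data=True)})

print(n_nodes, n_edges, factions)
```

The `"club"` node attribute is what the rest of this example uses to separate the two factions.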
Visualization#
This network is commonly visualized, both in the original paper and through networkx, as a circular graph:
[3]:
# color the nodes by faction
color = []
for node in G.nodes():
    if G.nodes[node]["club"] == "Mr. Hi":
        color.append("royalblue")
    else:
        color.append("darkorange")
fig, ax = plt.subplots(figsize=(10, 10))
plt.axis("equal")
nx.draw_circular(
    G,
    with_labels=True,
    node_color=color,
    ax=ax,
    font_color="white",
    node_size=1000,
)
ax.set_title("Zachary's Karate Club\nCircular Network Plot", fontsize=20)
# legend
officer_legend = Line2D(
    [],
    [],
    markerfacecolor="darkorange",
    markeredgecolor="darkorange",
    marker="o",
    linestyle="None",
    markersize=10,
)
mr_hi_legend = Line2D(
    [],
    [],
    markerfacecolor="royalblue",
    markeredgecolor="royalblue",
    marker="o",
    linestyle="None",
    markersize=10,
)
ax.legend(
    [mr_hi_legend, officer_legend],
    ["Mr. Hi", "Officer"],
    loc="upper left",
    bbox_to_anchor=(1, 1),
    title="Faction",
)
plt.show()
Conclusions from the Circular Graph#
One clear and unsurprising conclusion from this graph is that Mr. Hi (node 0) and Officer (node 33) were popular, but it’s hard to conclude much else.
Look at this graph for roughly 10 seconds, then ask yourself the following questions:
How socially separated are the two factions? How long does it take to confirm there exists a connection between blue and orange?
Are the connections between the two groups from people who are generally more social?
To answer the first question, we could certainly be more careful in ordering these nodes in our plot, partitioning orange from blue, but the second question would still be difficult.
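The reordering mentioned above could be sketched as follows. This is only an illustration, not part of the original notebook: it relies on `nx.circular_layout` accepting a list of nodes and placing them around the circle in the order given, and the name `ordered_nodes` is our own.

```python
import networkx as nx

G = nx.karate_club_graph()

# order nodes so that each faction occupies one contiguous arc of the circle
ordered_nodes = sorted(G.nodes(), key=lambda n: (G.nodes[n]["club"], n))

# circular_layout accepts a list of nodes and positions them in the given order
pos = nx.circular_layout(ordered_nodes)
```

Passing `pos=pos` to `nx.draw(G, pos=pos, ...)` would then draw the partitioned circle. Even so, the layout says nothing about how social each node is, which is why the second question remains hard to answer.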
An Alternative: Hive Plots#
Hive Plots allow us to carefully choose both the axes on which to place nodes and how to align the nodes on those axes.
This requires a fair amount of up-front declaration, but the payoff is a far more interpretable network visualization.
To answer our above questions, we will structure our Hive Plot in the following way:
We will construct a total of 4 axes: 2 axes for the Officer faction and 2 axes for the Mr. Hi faction. This use of repeat axes allows us to see the intra-faction behavior in a well-defined way in our resulting visualization.
We will look at 3 sets of edges: edges within the Officer faction, edges within the Mr. Hi faction, and edges between the two factions. This will give us a clear answer to our first question above.
We will sort our axes for each faction by node degree. This allows us to nicely answer our second question above, but more on this later.
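Independently of any plotting, the three edge sets described above can be tallied directly from the graph. The following is a rough sketch using only networkx and the standard library; the name `edge_sets` is illustrative.

```python
from collections import Counter

import networkx as nx

G = nx.karate_club_graph()

# label each edge by the (sorted) pair of factions it connects
edge_sets = Counter(
    tuple(sorted((G.nodes[u]["club"], G.nodes[v]["club"]))) for u, v in G.edges()
)

for faction_pair, count in sorted(edge_sets.items()):
    print(faction_pair, count)
```

Sorting each faction pair means an edge between the two factions is counted under one key regardless of which endpoint is listed first.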
Let’s first calculate degree for all of our nodes, and simultaneously build the necessary data structures for hiveplotlib:
Convert the networkx structure into the needed hiveplotlib structure#
[4]:
# convert `networkx` graph into `hiveplotlib`-ready nodes and edges
nodes, edges = networkx_to_nodes_edges(G)
# pull out degree information from nodes
degrees = pd.DataFrame(G.degree, columns=[nodes.unique_id_column, "degree"])
# add degree information to node data
nodes.data = nodes.data.merge(degrees, on=nodes.unique_id_column)
[5]:
nodes.data.head()
[5]:
| | unique_id | club | degree |
|---|---|---|---|
| 0 | 0 | Mr. Hi | 16 |
| 1 | 1 | Mr. Hi | 9 |
| 2 | 2 | Mr. Hi | 10 |
| 3 | 3 | Mr. Hi | 6 |
| 4 | 4 | Mr. Hi | 3 |
[6]:
edges.data.head()
[6]:
| | from | to | weight |
|---|---|---|---|
| 0 | 0 | 1 | 4 |
| 1 | 0 | 2 | 5 |
| 2 | 0 | 3 | 3 |
| 3 | 0 | 4 | 3 |
| 4 | 0 | 5 | 3 |
Constructing Our Hive Plot#
Next, let’s build out a HivePlot() instance.
[7]:
hp = HivePlot(
    nodes=nodes,
    edges=edges,
    partition_variable="club",
    sorting_variables="degree",
    repeat_axes=True,
    non_repeat_edge_kwargs={"color": "darkgray"},
)
# color intra-faction edges to match color of nodes in circular viz above
hp.update_edges("Mr. Hi", "Mr. Hi", color="royalblue")
hp.update_edges("Officer", "Officer", color="darkorange")
# only show the repeat edges on one end of the hive plot
hp.reset_edges(axis_id_1="Mr. Hi_repeat", axis_id_2="Officer")
As an extension of the hiveplotlib visualization, we will also pull out the Officer and Mr. Hi node placements to plot them in different colors in the final figure.
[8]:
# pull out the location of the Officer and Mr. Hi nodes for visual emphasis
officer_degree_locations = hp.axes["Officer_repeat"].node_placements
officer_node = (
    officer_degree_locations.loc[
        officer_degree_locations.loc[:, "unique_id"] == 33, ["x", "y"]
    ]
    .to_numpy()
    .flatten()
)
mr_hi_degree_locations = hp.axes["Mr. Hi"].node_placements
mr_hi_node = (
    mr_hi_degree_locations.loc[
        mr_hi_degree_locations.loc[:, "unique_id"] == 0, ["x", "y"]
    ]
    .to_numpy()
    .flatten()
)
Visualizing the Hive Plot#
With the decisions above made, all that remains is to visualize the Hive Plot.
[9]:
fig, ax = hp.plot()
# highlight Mr. Hi and Officer on the degree axes
ax.scatter(
    officer_node[0],
    officer_node[1],
    facecolor="red",
    edgecolor="black",
    s=150,
    lw=2,
    zorder=5,
)
ax.scatter(
    mr_hi_node[0],
    mr_hi_node[1],
    facecolor="gold",
    edgecolor="black",
    s=150,
    lw=2,
    zorder=5,
)
# embed legend in the title
flexitext(
    x=0.4,
    y=0.73,
    s="<size:20>Zachary's Karate Club Hive Plot\n</>"
    "<size:14><color:gold, weight: bold>Mr. Hi</> and "
    "<color:red, weight:bold>Officer</> most connected (highest node degree)</>\n"
    "<size:14>Some <color:darkgray, weight: bold>inter-faction communication</>\n"
    "But more connections <style:italic>within</> the "
    "<color:royalblue, weight: bold>Mr. Hi faction</> / "
    "<color:darkorange, weight:bold>Officer faction</></>",
    xycoords="figure fraction",
    ha="center",
)
plt.show()
Let’s revisit our questions from earlier:
How socially separated are the two factions? How long does it take to confirm there exists a connection between blue and orange?
From this figure, there appear to be far more intra-faction connections than inter-faction connections, but we can clearly see inter-faction connections in gray.
Are the connections between the two groups from people who are generally more social?
There does not appear to be a particularly strong correlation between inter-faction connections and general sociability. If there were, gray connections would appear only between nodes high on each axis.
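One rough way to put numbers behind this visual impression is to compare the mean degree of members who have at least one inter-faction connection with that of members who have none. The sketch below is not part of the original analysis; the names `bridging` and `non_bridging` are our own, and a comparison of means on a network this small is at best a sanity check.

```python
from statistics import mean

import networkx as nx

G = nx.karate_club_graph()

# nodes with at least one edge to the opposite faction
bridging = {
    node
    for u, v in G.edges()
    if G.nodes[u]["club"] != G.nodes[v]["club"]
    for node in (u, v)
}
non_bridging = set(G.nodes()) - bridging

degree = dict(G.degree())
print("mean degree, bridging nodes:    ", mean(degree[n] for n in bridging))
print("mean degree, non-bridging nodes:", mean(degree[n] for n in non_bridging))
```

Because each inter-faction edge adds to the degree of both of its endpoints, some difference between the two means is expected regardless of how social the bridging members are.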
The setup cost of this Hive Plot is of course higher than that of the circular layout, since we had to decide on the axes and the sorting. As a reward, however, we get an unambiguous visualization that can serve as a first step in answering genuine research questions.
References#
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452-473.