Creating Hive Plots from Networkx
This notebook discusses how to create hive plots from a networkx.Graph instance.
Note: this notebook requires that Hiveplotlib be installed with extra packages, which can be done by running:
pip install hiveplotlib[networkx]
[1]:
import networkx as nx
import pandas as pd
from hiveplotlib import HivePlot
from hiveplotlib.converters import networkx_to_nodes_edges
We will base this discussion on Zachary’s Karate Club graph, which is available in networkx. For more information about this example network, see our Zachary’s Karate Club tutorial.
[2]:
G = nx.karate_club_graph()
In order to generate a HivePlot instance, we must create:
A NodeCollection instance.
An Edges instance.
A node partition variable.
A node sorting variable.
We cover each of these tasks in the sections below.
Convert Networkx Graph to NodeCollection and Edges
To generate a hiveplotlib.HivePlot instance, we first need a hiveplotlib.NodeCollection instance of node data and a hiveplotlib.Edges instance of edge data.
This conversion is easy using the hiveplotlib.converters.networkx_to_nodes_edges() function, which takes our networkx.Graph as an input and returns a NodeCollection instance and an Edges instance.
[3]:
nodes, edges = networkx_to_nodes_edges(graph=G)
[4]:
nodes.data.head()
[4]:
| | unique_id | club |
|---|---|---|
| 0 | 0 | Mr. Hi |
| 1 | 1 | Mr. Hi |
| 2 | 2 | Mr. Hi |
| 3 | 3 | Mr. Hi |
| 4 | 4 | Mr. Hi |
[5]:
edges.data.head()
[5]:
| | from | to | weight |
|---|---|---|---|
| 0 | 0 | 1 | 4 |
| 1 | 0 | 2 | 5 |
| 2 | 0 | 3 | 3 |
| 3 | 0 | 4 | 3 |
| 4 | 0 | 5 | 3 |
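For intuition about what these two tables hold, the same information can be read straight off the graph with networkx and pandas. The sketch below is a hand-rolled approximation for illustration only, not hiveplotlib's implementation. The names node_df and edge_df are ours, and the column names simply mirror the output shown above.

```python
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()

# one row per node: the node ID plus that node's attribute dictionary
node_df = pd.DataFrame(
    [{"unique_id": node, **attrs} for node, attrs in G.nodes(data=True)]
)

# one row per edge: both endpoints plus that edge's attribute dictionary
edge_df = pd.DataFrame(
    [{"from": u, "to": v, **attrs} for u, v, attrs in G.edges(data=True)]
)

print(node_df.head())
print(edge_df.head())
```

Comparing these frames against nodes.data and edges.data is a quick way to confirm that the converter carried over every node and edge attribute you expect.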
Storing Graph Properties Using Networkx
We want to use graph properties when generating our hive plot below. Specifically, we want to grab node degree.
networkx has nice support for pulling many graph properties from a network.
Let’s first pull out the node degree information using the networkx.Graph.degree property:
[6]:
# pull out degree information from nodes
degrees = pd.DataFrame(G.degree, columns=[nodes.unique_id_column, "degree"])
degrees.head()
[6]:
| | unique_id | degree |
|---|---|---|
| 0 | 0 | 16 |
| 1 | 1 | 9 |
| 2 | 2 | 10 |
| 3 | 3 | 6 |
| 4 | 4 | 3 |
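As a quick sanity check on these values, every edge of an undirected graph contributes to the degree of both of its endpoints, so the degrees must sum to twice the edge count. The sketch below verifies this, with variable names of our own choosing. It also shows that G.degree counts edges and ignores edge weights unless a weight attribute is requested explicitly.

```python
import networkx as nx

G = nx.karate_club_graph()

# the DegreeView converts to a plain {node: degree} dictionary
degree_by_node = dict(G.degree)

# each undirected edge is counted once at each of its two endpoints
total_degree = sum(degree_by_node.values())
assert total_degree == 2 * G.number_of_edges()

# weighted degree sums the "weight" edge attribute instead of counting edges
# (edges without that attribute count as 1)
weighted_degree_by_node = dict(G.degree(weight="weight"))

print(degree_by_node[0], weighted_degree_by_node[0])
```

Either dictionary could be turned into a DataFrame column in the same way as the unweighted degree above; which one makes the better sorting variable depends on whether tie strength matters for your question.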
To use it later for our hive plot generation, we need only merge the data with our NodeCollection.data DataFrame:
[7]:
# add degree information to NodeCollection data
nodes.data = nodes.data.merge(degrees, on=nodes.unique_id_column)
nodes.data.head()
[7]:
| | unique_id | club | degree |
|---|---|---|---|
| 0 | 0 | Mr. Hi | 16 |
| 1 | 1 | Mr. Hi | 9 |
| 2 | 2 | Mr. Hi | 10 |
| 3 | 3 | Mr. Hi | 6 |
| 4 | 4 | Mr. Hi | 3 |
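A plain merge is an inner join, so any node missing from the right-hand table would be dropped from the result without warning. A more defensive variant is sketched below on toy frames that copy the first five rows shown above. The how="left" and validate="one_to_one" arguments are standard pandas options that the cell above does not use, and the variable names are ours.

```python
import pandas as pd

# toy copies of the first five rows of the node data and the degree table
node_data = pd.DataFrame({"unique_id": [0, 1, 2, 3, 4], "club": ["Mr. Hi"] * 5})
degrees = pd.DataFrame({"unique_id": [0, 1, 2, 3, 4], "degree": [16, 9, 10, 6, 3]})

# a left join keeps every node; validate raises if an ID is duplicated on either side
merged = node_data.merge(
    degrees, on="unique_id", how="left", validate="one_to_one"
)

# confirm no node was lost and every node received a degree
assert len(merged) == len(node_data)
assert merged["degree"].notna().all()

print(merged)
```

With these checks in place, a mismatch between the graph and the node table surfaces as an error at merge time rather than as nodes missing from the final figure.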
Create Partition Variable
In order to make a hive plot, we must choose a partition of the nodes, which lets us split up the nodes into separate axes.
Normally, we would need to generate a node partition variable with the NodeCollection.create_partition_variable() method, but in this case, we already have the club variable to use:
[8]:
partition_variable = "club"
For more on how and why we partition node data for hive plots, see the Setting a Partition Variable page.
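When the node data has no ready-made categorical column, a partition has to be derived from the data. The sketch below illustrates the idea with plain pandas by binning degree into three groups. This is not the NodeCollection.create_partition_variable() method, whose signature is not shown on this page; the bin edges, the labels, and the degree_group column name are all hypothetical choices made for illustration.

```python
import pandas as pd

# toy node data copying the first five rows from the merged table above
toy_nodes = pd.DataFrame(
    {"unique_id": [0, 1, 2, 3, 4], "degree": [16, 9, 10, 6, 3]}
)

# bin degree into right-closed intervals: (0, 5], (5, 10], (10, inf)
toy_nodes["degree_group"] = pd.cut(
    toy_nodes["degree"],
    bins=[0, 5, 10, float("inf")],
    labels=["low", "mid", "high"],
)

print(toy_nodes)
```

Each distinct value of such a column would correspond to one axis, so the number of bins directly controls how many axes the resulting hive plot has.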
Choose Sorting Variables
In order to make a hive plot, we must choose the sorting variables, one for each axis. This lets us order and place our nodes on each axis.
We can set all axes’ sorting variables to the same value by passing a node data column name as the sorting_variables parameter when we instantiate our hive plot. Here we will use the node degree variable we created above:
[9]:
sorting_variables = "degree"
For more on how and why we set the sorting variables in hive plots, see the Setting Axis Sorting Variables page.
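To make the combined effect of the partition and sorting variables concrete, the sketch below groups a toy node table by club and orders each group by degree, which is the ordering we would expect along each axis. The Officer rows are made-up values for illustration, and this is a conceptual pandas approximation rather than hiveplotlib's placement logic.

```python
import pandas as pd

# toy node data; the Officer rows are hypothetical values, not taken from the graph
toy_nodes = pd.DataFrame(
    {
        "unique_id": [0, 1, 2, 3, 4, 5],
        "club": ["Mr. Hi", "Mr. Hi", "Mr. Hi", "Officer", "Officer", "Officer"],
        "degree": [16, 9, 10, 4, 12, 7],
    }
)

# partition by club, then order the nodes within each group by ascending degree
axis_order = (
    toy_nodes.sort_values(["club", "degree"])
    .groupby("club")["unique_id"]
    .apply(list)
    .to_dict()
)

print(axis_order)
```

Reading each list from left to right gives the node IDs from lowest to highest degree within that club.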
Create HivePlot From NodeCollection and Edges
With our nodes and edges (and the partition variable and sorting variables) set, we have everything we need to generate a hiveplotlib.HivePlot instance.
We will make two specific changes from the defaults for this hive plot:
Since we only have two partition groups, the two sets of inter-axis edges are identical, so we will manually remove one of them.
Since intra-group behavior is of interest in Zachary’s Karate Club network, we will set repeat_axes=True.
[10]:
hp = HivePlot(
    nodes=nodes,  # our NodeCollection from above
    edges=edges,  # our Edges from above
    partition_variable=partition_variable,  # node column name assigned above
    sorting_variables=sorting_variables,  # node column name assigned above
    repeat_axes=True,  # repeat axes interesting for this network
)
# only 2 unique axes, so we'll kill one set of inter-axis edges
# since they're redundant
hp.reset_edges(
    axis_id_1="Mr. Hi_repeat",
    axis_id_2="Officer",
)
hp.plot();
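The redundancy that motivates removing one set of inter-axis edges can be checked on the graph itself. The sketch below uses only networkx, with variable names of our own, to split the edges into inter-club and intra-club sets and to confirm that the inter-club edges are the same set whichever club we start from. It makes no claim about how hiveplotlib stores edges internally.

```python
import networkx as nx

G = nx.karate_club_graph()
club = nx.get_node_attributes(G, "club")

# edges whose endpoints belong to different clubs
inter_club = [(u, v) for u, v in G.edges if club[u] != club[v]]
# edges whose endpoints share a club
intra_club = [(u, v) for u, v in G.edges if club[u] == club[v]]

# collect the inter-club edges once from each club's side of the graph
mr_hi_side = {
    frozenset((u, v))
    for u in G
    for v in G[u]
    if club[u] == "Mr. Hi" and club[v] == "Officer"
}
officer_side = {
    frozenset((u, v))
    for u in G
    for v in G[u]
    if club[u] == "Officer" and club[v] == "Mr. Hi"
}

# the graph is undirected, so both sides describe exactly the same edges
assert mr_hi_side == officer_side

print(len(inter_club), len(intra_club))
```

Because the two collections are identical, drawing the inter-club edges twice would add ink without adding information, which is why one copy is removed above.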