Hive Plot Matrix: from_partition#

This notebook covers the features and options of HivePlotMatrix.from_partition, which allows us to look at hive plots exploring a partition with more than three groups by arranging pairwise group comparisons into an upper-triangular Hive Plot Matrix (HPM).

For a longer-form discussion motivating this particular kind of HPM, see the Hive Plots for More Than Three Groups page.

For additional discussion motivating Hive Plot Matrices (HPMs) and the different HPM options, see the Hive Plot Matrices tutorial.

[1]:
import matplotlib.pyplot as plt
from hiveplotlib import HivePlotMatrix
from hiveplotlib.datasets import example_hpm_nodes_and_edges

We will base this discussion on the following toy dataset:

[2]:
nodes, edges = example_hpm_nodes_and_edges(
    num_groups=4,
    edge_tag_counts={"official": 90},
)
[3]:
nodes
[3]:
hiveplotlib.NodeCollection of 40 nodes and unique ID column 'unique_id'.
[4]:
nodes.data.head()
[4]:
   unique_id group    value1    value2    value3
0          0     A  1.934890  4.371519  9.160784
1          1     A  1.097196  8.326782  8.515967
2          2     A  2.146495  7.002651  9.535051
3          3     A  1.743420  3.123666  7.917432
4          4     A  0.235443  8.322598  7.556780
[5]:
edges
[5]:
hiveplotlib.Edges of 90 edges.
[6]:
edges.data.head()
[6]:
   from  to
0     3  30
1    26  17
2    17  34
3     3  27
4     8   3

The Upper-Triangular Layout#

from_partition auto-detects the group values in the provided partition_variable column and builds one cell in the upper triangle for every unordered pair of group values, including each group paired with itself. With four groups, that gives us a 4×4 grid with 10 populated cells (4 diagonal + 6 off-diagonal).
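The cell count follows from counting unordered pairs of groups with repetition allowed, which for n groups is n(n+1)/2. As a minimal sketch using only the standard library (the helper name `upper_triangle_cells` is hypothetical and not part of hiveplotlib):

```python
from itertools import combinations_with_replacement


def upper_triangle_cells(groups):
    """Return (row, col) index pairs for the upper triangle, diagonal included."""
    indices = range(len(groups))
    # combinations_with_replacement only yields pairs (i, j) with i <= j
    return list(combinations_with_replacement(indices, 2))


groups = ["A", "B", "C", "D"]
cells = upper_triangle_cells(groups)
diagonal = [(i, j) for i, j in cells if i == j]
off_diagonal = [(i, j) for i, j in cells if i != j]
print(len(cells), len(diagonal), len(off_diagonal))  # 10 4 6
```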

Let’s build our first matrix. Note that with the default progress=True, we’ll see a tqdm bar tracking cell-by-cell construction:

[7]:
hpm = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
)
hpm
[7]:
hiveplotlib.HivePlotMatrix (4 x 4), 10 populated cells, type='from_partition', backend='matplotlib'
[8]:
fig, axes = hpm.plot()
plt.show()
../_images/notebooks_hpm_from_partition_10_0.png

Diagonal Cells#

Diagonal cells show intragroup structure via repeat axes:

[9]:
# the (0,0) cell corresponds to group A vs itself
hpm[0, 0]
[9]:
hiveplotlib.HivePlot: 40 nodes, axes=['A'], 90 edges, partition='group', sort='value1', repeat_axes=[np.str_('A')], backend='matplotlib'
[10]:
hpm[0, 0].plot();
../_images/notebooks_hpm_from_partition_13_0.png

Off-Diagonal Cells#

An off-diagonal cell contains a hive plot with two specific groups and a collapsed “Other” axis:

[11]:
# the (0,1) cell compares group A (row 0) against group B (col 1)
hpm[0, 1]
[11]:
hiveplotlib.HivePlot: 40 nodes, axes=['A', 'B', 'Other'], 90 edges, partition='group', sort='value1', backend='matplotlib'
[12]:
hpm[0, 1].plot();
../_images/notebooks_hpm_from_partition_16_0.png

The “Other” axis holds all nodes not belonging to the two named groups. In the above plot, that means “Other” is nodes from groups C and D. This hive plot still shows the full network of nodes and edges! For more on collapsed axes, see the Collapsing Axes page.
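To make the collapsing idea concrete, here is a rough sketch in plain Python of how group labels could be mapped to axis names for a single off-diagonal cell. The function `collapse_groups` and the toy labels are hypothetical illustrations, not hiveplotlib's implementation:

```python
def collapse_groups(node_groups, row_group, col_group, collapsed_name="Other"):
    """Map each node's group label to its axis name in one off-diagonal cell."""
    kept = {row_group, col_group}
    # any group that is not one of the two named groups lands on the collapsed axis
    return [g if g in kept else collapsed_name for g in node_groups]


node_groups = ["A", "A", "B", "C", "D", "D"]  # toy labels, one per node
axis_assignment = collapse_groups(node_groups, "A", "B")
print(axis_assignment)  # ['A', 'A', 'B', 'Other', 'Other', 'Other']
```

Every node still receives an axis, which is why the cell can show the full network rather than only the A-to-B edges.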

Drilling Down on a Single Hive Plot in the HPM#

We can take a copy of a hive plot cell and explore further changes without disrupting the existing HPM. For example, we can switch to an interactive Hiveplotlib-supported back end like bokeh. Note, however, the below code will only run if you install Hiveplotlib with the bokeh dependencies:

pip install hiveplotlib[bokeh]

[13]:
from bokeh.io import output_notebook
from bokeh.plotting import show
from bokeh.resources import INLINE

output_notebook(resources=INLINE)

off_diagonal_hp = hpm[0, 1].copy()
off_diagonal_hp.set_viz_backend("bokeh")
show(off_diagonal_hp.plot())

If we had found anomalous nodes or edges, for example, we could use the hover tool support with the bokeh back end to find the relevant node or edge IDs.

Unified Axis Scaling with unify_axes#

By default (unify_axes=False), each hive plot axis auto-scales to the data range of the nodes assigned to it. Since from_partition assigns nodes to axes differently in each cell, axis ranges will almost certainly differ across cells.

Although this default behavior can be useful, especially if we want to use the full range of each axis to place nodes, it requires careful interpretation across hive plots. In the unify_axes=False case, if we see an edge from high A values to high D values, “high” only means high relative to the other nodes in that same group.

If we want the relative position of nodes across axes to matter, we can pass unify_axes=True to auto-compute a single global vmin / vmax from all node data and apply it to each axis.
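The difference between per-axis and global scaling can be sketched with a few made-up numbers. This assumes a node's position along an axis is a linear rescaling of its value between that axis's vmin and vmax; the `rescale` helper and the values are illustrative only:

```python
# made-up sorting values for two groups (not the tutorial's dataset)
values_by_group = {
    "A": [0.2, 1.1, 2.1],
    "D": [7.5, 8.3, 9.6],
}


def rescale(value, vmin, vmax):
    """Fractional position of value along an axis spanning [vmin, vmax]."""
    return (value - vmin) / (vmax - vmin)


top_a = max(values_by_group["A"])

# per-axis scaling: the largest A value sits at the very end of the A axis
per_axis = rescale(top_a, min(values_by_group["A"]), max(values_by_group["A"]))

# unified scaling: one global range computed from all node values
all_values = [v for vals in values_by_group.values() for v in vals]
global_vmin, global_vmax = min(all_values), max(all_values)
unified = rescale(top_a, global_vmin, global_vmax)

print(per_axis)  # 1.0
print(round(unified, 2))  # roughly a fifth of the way along the axis
```

Under these assumptions, the same "high A" node moves from the tip of its axis to near the bottom once all axes share one range.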

Let’s compare the two approaches below, with an eye towards the aforementioned “high A to high D” edges (top right hive plot):

[14]:
# default: each cell auto-scales to its own data range
hpm_unscaled = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    progress=False,
)
fig, axes = hpm_unscaled.plot()
fig.suptitle("Default: unify_axes=False", y=1.02, size=16)
plt.show()
../_images/notebooks_hpm_from_partition_23_0.png
[15]:
# unified: all axes share the same global range
hpm_unified = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    unify_axes=True,
    progress=False,
)
fig, axes = hpm_unified.plot()
fig.suptitle("unify_axes=True", y=1.02, size=16)
plt.show()
../_images/notebooks_hpm_from_partition_24_0.png

In absolute terms, the D values are much higher on the unified axes than any of the A values:

[16]:
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

hpm_unscaled[0, 3].plot(fig=fig, ax=axes[0])
axes[0].set_title("Relative", size=18, y=1.05)

hpm_unified[0, 3].plot(fig=fig, ax=axes[1])
axes[1].set_title("Absolute", size=18, y=1.05)

fig.suptitle("High A to High D", size=24, y=1.1)

plt.show()
../_images/notebooks_hpm_from_partition_26_0.png

Both choices have their use cases. For example, if we chose Gross Domestic Product (GDP) as our sorting variable, perhaps we would want to compare all nodes on the same GDP scale with unify_axes=True.

Or, if we chose node degree, and some groups were drastically smaller than others, maybe we only want to see how the most connected members of one group talk with the most connected members of another group, in which case we might keep the default unify_axes=False to look for edges between the top of each pair of axes.

Set a Specific Range for Unified Axes#

To force a specific range instead of auto-computing, we can pass a dictionary with vmin and / or vmax. Missing keys are auto-computed to the global min / max of the data.

This can be helpful if there are outliers or if there are important threshold values for a given sorting variable.
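The "missing keys are auto-computed" rule can be sketched as follows. The helper `resolve_axis_range` is a hypothetical stand-in written for illustration and is not a hiveplotlib function:

```python
def resolve_axis_range(values, unify_axes):
    """Return (vmin, vmax), filling any key not supplied from the data itself."""
    # unify_axes=True carries no overrides; a dict may pin one or both ends
    overrides = unify_axes if isinstance(unify_axes, dict) else {}
    vmin = overrides.get("vmin", min(values))
    vmax = overrides.get("vmax", max(values))
    return vmin, vmax


values = [0.2, 1.1, 2.1, 7.5, 9.6]  # made-up sorting values
print(resolve_axis_range(values, True))  # (0.2, 9.6)
print(resolve_axis_range(values, {"vmin": -10}))  # (-10, 9.6)
print(resolve_axis_range(values, {"vmin": 0, "vmax": 10}))  # (0, 10)
```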

[17]:
# pin vmin to -10, auto-compute vmax from the data
hpm_pinned = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    unify_axes={"vmin": -10},
    progress=False,
)
fig, axes = hpm_pinned.plot()
plt.show()
../_images/notebooks_hpm_from_partition_30_0.png

Renaming the Collapsed Axis#

The default label “Other” for the collapsed axis can be changed with collapsed_group_axis_name. This is useful when you want the label to be more descriptive of what it represents:

[18]:
hpm_named = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    collapsed_group_axis_name="Other\nGroups",
    progress=False,
)
fig, axes = hpm_named.plot()
plt.show()
../_images/notebooks_hpm_from_partition_32_0.png

Filtering and Reordering with partition_values#

By default, from_partition sorts group labels alphabetically (A, B, C, D) and includes all of them. What if we only care about a subset of groups, or want a different display order? The partition_values parameter lets us override both.

[19]:
# subset: restrict the matrix to groups C and A, in that display order
hpm_sub = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    partition_values=["C", "A"],
    progress=False,
)
fig, axes = hpm_sub.plot()
plt.show()
../_images/notebooks_hpm_from_partition_34_0.png
[20]:
# reorder: keep all four groups but change the row / column display order
hpm_reordered = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    partition_values=["D", "B", "A", "C"],
    progress=False,
)
fig, axes = hpm_reordered.plot()
plt.show()
../_images/notebooks_hpm_from_partition_35_0.png
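One way to reason about which cell holds which comparison is to map grid indices to group pairs. The sketch below assumes that row i and column j follow the order of partition_values, as the earlier (0, 1) cell for A vs. B suggests; `cell_labels` is a hypothetical helper, not part of hiveplotlib:

```python
def cell_labels(partition_values):
    """Map each upper-triangle (row, col) index to its (row group, column group)."""
    n = len(partition_values)
    return {
        (i, j): (partition_values[i], partition_values[j])
        for i in range(n)
        for j in range(i, n)  # j >= i keeps only the upper triangle
    }


reordered = cell_labels(["D", "B", "A", "C"])
print(reordered[(0, 1)])  # ('D', 'B')

subset = cell_labels(["C", "A"])
print(sorted(subset))  # [(0, 0), (0, 1), (1, 1)]
print(subset[(0, 1)])  # ('C', 'A')
```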

Excluding Diagonal Cells#

What if we’re only interested in intergroup connectivity and don’t need to see intragroup structure? Setting include_diagonal=False omits diagonal cells entirely, leaving only off-diagonal cells with two named group axes plus “Other”:

[21]:
hpm_no_diag = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    include_diagonal=False,
    progress=False,
)
fig, axes = hpm_no_diag.plot()
plt.show()
../_images/notebooks_hpm_from_partition_37_0.png

Styling Directed Edges#

If we are working with a directed network (i.e. an edge from node \(i\) to node \(j\) is not the same as an edge from \(j\) to \(i\)), then clockwise_edge_kwargs / counterclockwise_edge_kwargs allow us to style edges by direction.

In from_partition, these kwargs apply to off-diagonal cells only. repeat_edge_kwargs targets intragroup edges in diagonal cells, which have no meaningful directionality since both endpoints belong to the same group:

[22]:
hpm_styled = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    all_edge_kwargs={"alpha": 0.4},
    repeat_edge_kwargs={
        "color": "royalblue",
        "linewidth": 1.5,
    },  # diagonal cells: intragroup edges
    clockwise_edge_kwargs={
        "color": "darkorange",
        "linewidth": 0.8,
    },  # off-diagonal cells: clockwise directed edges
    counterclockwise_edge_kwargs={
        "color": "darkgreen",
        "linewidth": 0.8,
    },  # off-diagonal cells: counterclockwise directed edges
    progress=False,
)
fig, axes = hpm_styled.plot()
plt.show()
../_images/notebooks_hpm_from_partition_39_0.png

For a full explanation of edge kwarg options and prioritization, see the Changing Edge Keyword Arguments page.

Uniform Node and Edge Rendering#

node_kwargs and all_edge_kwargs apply rendering options uniformly across every cell at construction time. Node and edge kwargs can also be passed to .plot() to override them at render time.

[23]:
hpm_uniform = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    node_kwargs={"s": 40, "color": "steelblue"},
    all_edge_kwargs={"color": "salmon", "alpha": 0.5},
    progress=False,
)
fig, axes = hpm_uniform.plot()
plt.show()
../_images/notebooks_hpm_from_partition_42_0.png

For a full explanation of edge kwarg options and prioritization, see the Changing Edge Keyword Arguments page.
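As a loose mental model of render-time overrides, one can picture construction-time kwargs as defaults that later kwargs replace key by key. This is an assumption made for illustration using plain dictionary merging; the actual prioritization rules are the ones described on the page referenced above:

```python
# kwargs fixed when the matrix is constructed
construction_kwargs = {"color": "salmon", "alpha": 0.5}

# kwargs supplied later, at render time
render_kwargs = {"alpha": 0.9}

# in a dict merge, the right-most dict wins on shared keys
effective_kwargs = {**construction_kwargs, **render_kwargs}
print(effective_kwargs)  # {'color': 'salmon', 'alpha': 0.9}
```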

Plot Options#

The plot() method accepts several keyword arguments to control figure appearance. For example, we can change the figsize:

[24]:
# figsize: override the default auto-computed size
# the default is 4 per cell, so this 4 x 4 example defaults to figsize=(16, 16)
# a smaller figsize makes lines, nodes, and text look larger relative to the figure
fig, axes = hpm.plot(figsize=(8, 8))
plt.show()
../_images/notebooks_hpm_from_partition_45_0.png

Or we could only change the row / column text label size:

[25]:
# label_fontsize: control the row/column header font size
fig, axes = hpm.plot(label_fontsize=12)
plt.show()
../_images/notebooks_hpm_from_partition_47_0.png

Turn Off Axes Labels#

Each off-diagonal cell’s three axes orient to match the grid layout:

  • Each axis pointing right (due east) corresponds to the row group name.

  • Each axis pointing up (due north) corresponds to the column group name.

  • The “Other” axis containing all remaining nodes points toward the lower-left, away from both headers.

The per-cell axis labels make this explicit, but once the pattern is familiar, show_axes_labels=False gives a cleaner view:

[26]:
# show_axes_labels=False: hide per-cell axis labels for a cleaner large-matrix view
fig, axes = hpm.plot(show_axes_labels=False)
plt.show()
../_images/notebooks_hpm_from_partition_49_0.png

Visualization Back Ends#

Two visualization back ends are supported with HPMs: matplotlib (default) and datashader. The back end is set at construction time via the backend parameter.

[27]:
print("Current back end:", hpm._backend)
Current back end: matplotlib

Datashader Back End#

Datashader renders rasterized density images with shared colorbars across all cells. This requires that hiveplotlib be installed with the datashader dependencies via:

pip install hiveplotlib[datashader]

For more on constructing hive plots with datashader, see the Hive Plots for Large Networks and Datashader pages.

Note that while the matplotlib back end only returns the figure and axes, here the plot() call also returns the node / edge rasterizations.

[28]:
hpm_ds = HivePlotMatrix.from_partition(
    nodes=nodes,
    edges=edges,
    partition_variable="group",
    sorting_variables="value1",
    backend="datashader",
    progress=False,
)
# datashader plot also returns node / edge rasterizations
fig, axes, im_nodes, im_edges = hpm_ds.plot()
plt.show()
../_images/notebooks_hpm_from_partition_53_0.png

Changing Parameters for Diagonal with Datashader#

For HPMs built with from_partition, plot() exposes separate pixel-spread parameters for diagonal vs. off-diagonal cells. Since diagonal cells zoom in on a 2-axis layout, they need a smaller pixel spread to have comparably sized nodes and edges:

[29]:
fig, axes, im_nodes, im_edges = hpm_ds.plot(
    diagonal_pixel_spread_nodes=3,
    off_diagonal_pixel_spread_nodes=6,
)
plt.show()
../_images/notebooks_hpm_from_partition_55_0.png

Setting Explicit Density Cutoffs with Datashader#

The node and edge density colormaps and color range will be the same for all hive plots in the HPM.

By default, the max color range for each will top out at the maximum density value over all of the hive plots.

If preferred, users can set vmax_nodes and vmax_edges to fix the shared density max across all cells to a specific level. This can be useful when one cell is much denser than the others or if users have preferred, more-interpretable cutoffs.

[30]:
fig, axes, im_nodes, im_edges = hpm_ds.plot(vmax_nodes=20, vmax_edges=50)
plt.show()
../_images/notebooks_hpm_from_partition_57_0.png
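The shared color range described in this section can be sketched with toy density grids. The helper `shared_vmax` and the numbers are hypothetical and only illustrate the idea of one maximum applied to every cell:

```python
def shared_vmax(cell_densities, vmax=None):
    """Return one color-scale maximum shared by every cell in the matrix."""
    if vmax is not None:
        # an explicit cutoff takes the place of the data-driven maximum
        return vmax
    return max(max(row) for grid in cell_densities.values() for row in grid)


# toy per-cell density rasters keyed by (row, col)
cell_densities = {
    (0, 0): [[0, 3], [1, 2]],
    (0, 1): [[5, 40], [0, 7]],  # one much denser cell
}

print(shared_vmax(cell_densities))  # 40
print(shared_vmax(cell_densities, vmax=20))  # 20
```

With the data-driven maximum of 40, the sparser cell would use only a small part of the color range, which is the situation where an explicit cutoff can help.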

Turn Off Density Colorbars with Datashader#

Users can turn off one or both node / edge colorbars that show up by default by setting show_node_colorbar / show_edge_colorbar to False (both default to True).

[31]:
fig, axes, im_nodes, im_edges = hpm_ds.plot(
    show_node_colorbar=False,
    show_edge_colorbar=False,
)
plt.show()
../_images/notebooks_hpm_from_partition_59_0.png

For a deeper dive into other Hive Plot Matrix convenience methods, see the HivePlotMatrix Gallery Examples.