Visualizing Edge Metadata#

This notebook demonstrates how to include additional columns of edge metadata in a single hive plot visualization.

[1]:
import matplotlib.pyplot as plt
from hiveplotlib.datasets import example_hive_plot
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

The edges attribute of a hiveplotlib.HivePlot instance is an Edges object whose data attribute holds a DataFrame of edge data. This DataFrame is the basis for generating a hive plot: edges connect the nodes specified by the DataFrame column Edges.from_column_name (from in the example below) to the nodes specified by the DataFrame column Edges.to_column_name (to in the example below).

Once we’ve settled on which edges to draw, though, what if our edge DataFrame holds additional columns of data? How can we see how these additional variables relate to the interactions in our current hive plot?

Hive plots only dictate the edges to be drawn, not how we draw them. This gives us the flexibility to represent additional edge data in the same hive plot by plotting each edge while varying dimensions like color and line width.

We will use the following example to demonstrate flexibility in plotting edges:

[2]:
hp = example_hive_plot(repeat_axes=True)
[3]:
hp.edges.data
[3]:
     from   to       low        med       high
0      85   63  6.567506  15.711859  24.629851
1      51   26  6.176714  16.259557  24.106476
2      30    4  7.501076  14.257005  26.436396
3       7    1  4.991420  15.811134  21.297566
4      17   81  5.189225  11.564580  26.122606
...   ...  ...       ...        ...        ...
95     82   95  7.425179  13.559648  27.493189
96     36   14  6.071377  17.872480  21.423589
97     51   97  6.053931  15.682788  26.290276
98     36   88  3.000836  16.890711  23.835759
99     38   82  7.309462  15.496119  27.006405

100 rows × 5 columns

Note that here we have three edge variables to work with: low, med, and high.

These edge variables were computed as an average of the correspondingly-named low, med, and high node metadata values for each pair of from and to nodes.

(For more on the node metadata values in this example, see the Visualizing Node Metadata notebook.)
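
The averaging described above can be sketched in plain Python. This is a minimal illustration, not how example_hive_plot() builds its data: the node values, the three-node graph, and the edge_metadata helper are all made up for the example.

```python
# Hypothetical node metadata: node id -> values for "low", "med", and "high".
node_data = {
    0: {"low": 2.0, "med": 14.0, "high": 22.0},
    1: {"low": 6.0, "med": 18.0, "high": 26.0},
    2: {"low": 9.0, "med": 11.0, "high": 30.0},
}

# Hypothetical edges as (from, to) pairs of node ids.
edges = [(0, 1), (1, 2), (0, 2)]


def edge_metadata(from_id, to_id, variables=("low", "med", "high")):
    """Average each node variable over the two endpoints of one edge."""
    return {
        var: (node_data[from_id][var] + node_data[to_id][var]) / 2
        for var in variables
    }


# One record per edge, shaped like a row of the edge DataFrame above.
edge_rows = [{"from": f, "to": t, **edge_metadata(f, t)} for f, t in edges]

for row in edge_rows:
    print(row)
```

Because each edge value is the mean of its two endpoints, an edge between two nodes with small low values must itself have a small low value, which is what the color examples later in this notebook rely on.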

By default, example_hive_plot() partitions and sorts nodes based on the low node metadata values, but does not plot any of these edge metadata variables.

How to Plot Edge Metadata#

To plot a column of edge metadata in a hive plot, we need only assign that edge column name to the relevant plotting keyword argument.

When plotting edges, the Hiveplotlib edge visualization function (for every visualization back end) first checks the assigned kwarg values against the edge column names. If a value matches a column name, that column’s data is used for that kwarg.
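
As a rough mental model of that check, the sketch below swaps any kwarg value that names an edge column for the column's data and passes everything else through unchanged. The resolve_edge_kwargs helper is hypothetical and written only for this illustration; it is not the library's implementation, which may differ in its details.

```python
def resolve_edge_kwargs(edge_columns, **kwargs):
    """Replace string kwarg values that name an edge column with that column's data.

    Hypothetical helper for illustration only; not part of hiveplotlib.
    """
    resolved = {}
    for name, value in kwargs.items():
        if isinstance(value, str) and value in edge_columns:
            # the value matches a column name, so use the column's data
            resolved[name] = edge_columns[value]
        else:
            # anything else (colormap names, tuples, numbers) is left as-is
            resolved[name] = value
    return resolved


# First three rows of the edge data shown above, stored column-wise.
edge_columns = {
    "low": [6.567506, 6.176714, 7.501076],
    "med": [15.711859, 16.259557, 14.257005],
}

resolved = resolve_edge_kwargs(
    edge_columns,
    array="low",  # matches a column name, so it is replaced by the data
    cmap="cividis",  # not a column name, so it stays a string
    clim=(0, 10),  # not a string, so it is never checked
)
print(resolved)
```

In this toy version, a column that shared its name with a legitimate string value (a column called cividis, say) would shadow that value, so distinctive column names are the safer choice.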

Edge Color#

If we color the edges by their low values, this should reveal two relationships:

  1. We should see lower-level colors for A edges relative to B edges, and for B edges relative to C edges. For example, the low value of an A-to-A edge will almost always be lower than that of a C-to-C edge. This follows from partitioning the nodes by their low node metadata values.

  2. Since we’re still sorting nodes by the node variable low, edges with lower-level colors should be closer to the center of the hive plot relative to edges with higher-level colors.

[4]:
# point edge color of edges to edge data column name
edge_kwargs = {
    "array": "low",  # 'array' is name of color param for underlying matplotlib `LineCollection`
    "cmap": "cividis",
    "clim": (0, 10),  # keep the min and max color the same for all edges
}

fig, ax = hp.plot(**edge_kwargs)

# add custom colorbar to plot
fig.colorbar(
    ScalarMappable(
        norm=Normalize(*edge_kwargs["clim"]),
        cmap=edge_kwargs["cmap"],
    ),
    ax=ax,
    shrink=0.7,
    label="Edge Variable 'low'",
)

plt.show()
../_images/notebooks_visualizing_edge_metadata_8_0.png

But if we instead choose an uncorrelated variable for the color (e.g. med), then these patterns will disappear.

[5]:
fig, ax = plt.subplots()
ax.scatter(hp.edges.data.med, hp.edges.data.low)
ax.set_xlabel("Edge 'med' Value")
ax.set_ylabel("Edge 'low' Value")
ax.set_title(
    "No relationship between edge 'med' value and edge 'low' value",
)
plt.show()
../_images/notebooks_visualizing_edge_metadata_10_0.png
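
The scatter plot is a visual check; a numeric check is also possible. The sketch below computes a Pearson correlation coefficient by hand from only the ten rows visible in the truncated table earlier in this notebook (rows 0-4 and 95-99), so it illustrates the calculation rather than characterizing all 100 edges. The pearson helper is a hypothetical name, not a hiveplotlib function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Only the ten rows visible in the truncated edge table (rows 0-4 and 95-99).
low = [
    6.567506, 6.176714, 7.501076, 4.991420, 5.189225,
    7.425179, 6.071377, 6.053931, 3.000836, 7.309462,
]
med = [
    15.711859, 16.259557, 14.257005, 15.811134, 11.564580,
    13.559648, 17.872480, 15.682788, 16.890711, 15.496119,
]

r = pearson(med, low)
print(f"Pearson r on the ten visible rows: {r:.3f}")
```

With the full DataFrame, pandas offers the same calculation directly as hp.edges.data["med"].corr(hp.edges.data["low"]).
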
[6]:
# switch to using "med"
# shift the color range up because med values are higher
edge_kwargs = {
    "array": "med",
    "cmap": "cividis",
    "clim": (10, 20),  # keep the min and max color the same for all edges
}

fig, ax = hp.plot(**edge_kwargs)

# add custom colorbar to plot
fig.colorbar(
    ScalarMappable(
        norm=Normalize(*edge_kwargs["clim"]),
        cmap=edge_kwargs["cmap"],
    ),
    ax=ax,
    shrink=0.7,
    label="Edge Variable 'med'",
)

ax.set_title(
    "Uncorrelated 'med' edge color has no patterns",
    size=16,
    y=1.05,
)

plt.show()
../_images/notebooks_visualizing_edge_metadata_11_0.png
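
Both color examples above pass a clim pair, and the second one shifts it from (0, 10) to (10, 20). To see why the shift matters, the sketch below reproduces the linear rescaling that a Normalize-style mapping performs, followed by a clamp to the ends of the colormap. The function names are made up for this illustration and the code is a simplification, not matplotlib's implementation.

```python
def normalize(value, vmin, vmax):
    """Linearly rescale value so that vmin maps to 0.0 and vmax maps to 1.0."""
    return (value - vmin) / (vmax - vmin)


def colormap_position(value, clim):
    """Position of value along the colormap, clamped to the range [0, 1]."""
    vmin, vmax = clim
    return min(1.0, max(0.0, normalize(value, vmin, vmax)))


med_value = 15.711859  # 'med' value of the first edge in the table above

# Under the range used for 'low', this 'med' value saturates at the top color.
print(colormap_position(med_value, (0, 10)))

# Under the shifted range, it lands a little above the middle of the colormap.
print(colormap_position(med_value, (10, 20)))
```

Had the med plot kept clim at (0, 10), every edge visible in the table would have been drawn in the same top-of-range color, hiding any variation in the data.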

Edge Line Width#

We can also use edge line width to represent edge metadata.

[7]:
edge_kwargs = {
    "linewidth": "low",  # assign linewidth kwarg to edge column name
}

fig, ax = hp.plot(**edge_kwargs)

plt.show()
../_images/notebooks_visualizing_edge_metadata_13_0.png

These line widths get a bit too large, which raises the question: how can we manipulate our metadata values to improve our metadata visualization?

Scaling Edge Metadata#

Users can create scaled data as needed for plotting through standard DataFrame manipulation (i.e. adding a new column to HivePlot.edges.data).

Users can scale the new data column however they see fit, linearly or nonlinearly (e.g. log), to create new column(s) of scaled metadata to reference in their plotting.

[8]:
# edges got too big, let's scale them down to 1/3
hp.edges.data["linewidth_third"] = hp.edges.data["low"] / 3

hp.edges.data
[8]:
     from   to       low        med       high  linewidth_third
0      85   63  6.567506  15.711859  24.629851         2.189169
1      51   26  6.176714  16.259557  24.106476         2.058905
2      30    4  7.501076  14.257005  26.436396         2.500359
3       7    1  4.991420  15.811134  21.297566         1.663807
4      17   81  5.189225  11.564580  26.122606         1.729742
...   ...  ...       ...        ...        ...              ...
95     82   95  7.425179  13.559648  27.493189         2.475060
96     36   14  6.071377  17.872480  21.423589         2.023792
97     51   97  6.053931  15.682788  26.290276         2.017977
98     36   88  3.000836  16.890711  23.835759         1.000279
99     38   82  7.309462  15.496119  27.006405         2.436487

100 rows × 6 columns

[9]:
edge_kwargs = {
    "linewidth": "linewidth_third",  # use 1/3 edge metadata for size
}

fig, ax = hp.plot(**edge_kwargs)

plt.show()
../_images/notebooks_visualizing_edge_metadata_17_0.png
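
Dividing by a constant is only one option. As a sketch of two alternatives, the code below maps values onto a chosen line width range with min-max scaling and, separately, compresses them with a logarithm. Both helpers and the target range of 0.5 to 3.0 are assumptions made for this illustration; it uses plain lists so that it runs without pandas.

```python
import math


def min_max_scale(values, new_min, new_max):
    """Linearly map values onto the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # all values identical: fall back to the middle of the target range
        return [(new_min + new_max) / 2 for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def log_scale(values):
    """Nonlinear alternative: log(1 + v) compresses large values (needs v > -1)."""
    return [math.log1p(v) for v in values]


# 'low' values of the first five edges shown above
low = [6.567506, 6.176714, 7.501076, 4.991420, 5.189225]

linewidth_minmax = min_max_scale(low, 0.5, 3.0)  # thinnest edge 0.5, thickest 3.0
linewidth_log = log_scale(low)

print(linewidth_minmax)
print(linewidth_log)
```

Either result could then be stored as a new column of HivePlot.edges.data, in the same way as linewidth_third above, provided the values stay in the same row order as the edges.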

Multiple Edge Features#

Of course, nothing is stopping us from assigning edge metadata to multiple keyword arguments. Below, we combine the color and line width settings from the examples above in a single hive plot.

[10]:
edge_kwargs = {
    "array": "low",  # edge metadata for color
    "linewidth": "linewidth_third",  # AND edge metadata for size
    "cmap": "cividis",
    "clim": (0, 10),
}

fig, ax = hp.plot(**edge_kwargs)

plt.show()
../_images/notebooks_visualizing_edge_metadata_19_0.png

Persisting Edge Kwargs for Future Plots#

If you settle on a nice use of edge metadata that you want to persist without including in every plot() call, you can instead call HivePlot.update_edge_plotting_keyword_arguments(), passing along the same edge plotting kwargs.

[11]:
hp_persist_kwargs = hp.copy()

edge_kwargs = {
    "array": "low",
    "linewidth": "linewidth_third",
    "cmap": "cividis",
    "clim": (0, 10),
}

# this will persist for future plots
hp_persist_kwargs.update_edge_plotting_keyword_arguments(**edge_kwargs)

fig, ax = hp_persist_kwargs.plot()  # no edge kwargs needed here

plt.show()
../_images/notebooks_visualizing_edge_metadata_21_0.png

Now we can instead focus on other plotting flexibility like node kwargs.

[12]:
fig, ax = hp_persist_kwargs.plot(node_kwargs={"color": "royalblue"})

ax.set_title("Edge Kwargs Persisted", y=1.05, size=16)
plt.show()
../_images/notebooks_visualizing_edge_metadata_23_0.png

There is similar functionality available when plotting node metadata. For more information, see the Visualizing Node Metadata page.

Persisting Edge Metadata to a Different Edge Hierarchy Variable#

Hiveplotlib maintains a hierarchy of options for changing edge kwargs. For more on this broader flexibility when modifying edges, see the Changing Edge Keyword Arguments page.

By default, HivePlot.update_edge_plotting_keyword_arguments(), as called above, changes the all_edge_kwargs parameter, but we can also change a different parameter, for example, the repeat_edge_kwargs parameter:

[13]:
hp_persist_kwargs = hp.copy()

edge_kwargs = {
    "array": "low",
    "linewidth": "linewidth_third",
    "cmap": "Reds",
    "clim": (0, 10),
}

# change just repeat edge kwargs
hp_persist_kwargs.update_edge_plotting_keyword_arguments(
    edge_kwarg_setting="repeat_edge_kwargs",
    **edge_kwargs,
)

fig, ax = hp_persist_kwargs.plot()  # no edge kwargs needed here

plt.show()
../_images/notebooks_visualizing_edge_metadata_26_0.png