Low-Level Hive Plot API

class hiveplotlib.Node(unique_id: Hashable, data: Dict | None = None)

Node instances hold the data for an individual network node.

Each instance is initialized with a unique_id for identification. These IDs must be Hashable. One can also initialize with a dictionary of data, but data can also be added later with the add_data() method.

Example:

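from hiveplotlib import Node

my_dataset = {"degree": 3}  # illustrative data; any dict works
my_second_dataset = {"degree": 5}

my_node = Node(unique_id="my_unique_node_id", data=my_dataset)

my_second_node = Node(unique_id="my_second_unique_node_id")
my_second_node.add_data(data=my_second_dataset)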

add_data(data: Dict, overwrite_old_data: bool = False) → None

Add dictionary of data to Node.data.

Parameters:

data – dict of data to associate with Node instance.
overwrite_old_data – whether to delete existing data dict and overwrite with data. Default False.

Returns:

None.
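The effect of overwrite_old_data can be sketched with plain dictionaries. The helper add_data_sketch below is a hypothetical stand-in, not the library's code, and it assumes that the default (overwrite_old_data=False) merges the new keys into the existing data.

```python
from typing import Dict, Hashable


def add_data_sketch(
    existing: Dict[Hashable, object],
    new: Dict[Hashable, object],
    overwrite_old_data: bool = False,
) -> Dict[Hashable, object]:
    """Return the data a node would hold after adding ``new`` (assumed semantics)."""
    if overwrite_old_data:
        # Discard everything previously stored and keep only the new dict.
        return dict(new)
    # Assumption: otherwise the new keys are merged into the existing data.
    merged = dict(existing)
    merged.update(new)
    return merged


node_data = {"degree": 3, "group": "a"}
merged = add_data_sketch(node_data, {"group": "b", "age": 41})
replaced = add_data_sketch(node_data, {"age": 41}, overwrite_old_data=True)
```

Under these assumptions, merged keeps the "degree" key while replaced holds only the newly supplied data.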

class hiveplotlib.Axis(axis_id: Hashable, start: float = 1, end: float = 5, angle: float = 0, long_name: Hashable | None = None, metadata: dict | None = None)

Axis instance.

Axis instances are initialized based on their intended final position when plotted. Each Axis is also initialized with a unique, hashable axis_id for clarity when building hive plots with multiple axes.

The eventual size and positioning of the Axis instance is dictated in the context of polar coordinates by three parameters:

start dictates the distance from the origin to the beginning of the axis when eventually plotted.

end dictates the distance from the origin to the end of the axis when eventually plotted.

angle sets the angle the Axis is rotated counterclockwise. For example, angle=0 points East, angle=90 points North, and angle=180 points West.
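The polar description above maps to Cartesian endpoints with ordinary trigonometry. The function axis_endpoints below is a hypothetical helper written for illustration, not part of hiveplotlib.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def axis_endpoints(start: float, end: float, angle: float) -> Tuple[Point, Point]:
    """Convert an axis given in polar terms to its two Cartesian endpoints."""
    theta = math.radians(angle)  # angle is in degrees, counterclockwise from East
    dx, dy = math.cos(theta), math.sin(theta)
    return (start * dx, start * dy), (end * dx, end * dy)


east = axis_endpoints(start=1, end=5, angle=0)    # runs along the positive x axis
north = axis_endpoints(start=1, end=5, angle=90)  # runs along the positive y axis
```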

Node instances placed on each Axis instance will be scaled to fit onto the span of the Axis, but this is discussed further in the HivePlot class, which handles this placement.

Since axis_id values may be shorthand for easy referencing when typing code, one can provide a separate long_name to serve as a formal name, which will show up as the axis label when running hiveplotlib.viz code. (For example, one may choose axis_id="a1" and long_name="Axis 1".)

Note

long_name defaults to axis_id if not specified.

Example:

from hiveplotlib import Axis

# 3 axes, spaced out 120 degrees apart, all length 4, starting 1 unit off of origin
axis0 = Axis(axis_id="a0", start=1, end=5, angle=0, long_name="Axis 0")
axis1 = Axis(axis_id="a1", start=1, end=5, angle=120, long_name="Axis 1")
axis2 = Axis(axis_id="a2", start=1, end=5, angle=240, long_name="Axis 2")

add_metadata(metadata: dict) → None

Add metadata to the axis.

This method will overwrite existing metadata with the same keys.

Parameters:

metadata – dictionary of metadata to add to the axis.

Returns:

None.

set_node_placements(placements_df: DataFrame, unique_id: Hashable) → None

Set Axis.node_placements to a pandas.DataFrame of node placement information with node metadata.

The dataframe consists of x Cartesian coordinates, y Cartesian coordinates, unique node IDs, and polar rho values (i.e. distance from the origin).

Note

This is an internal setter method to be called downstream by the HivePlot.place_nodes_on_axis() method.

Parameters:

placements_df – dataframe of placement information and other node metadata.
unique_id – column corresponding to node unique IDs.

Returns:

None.

set_node_vmin_and_vmax(vmin: float, vmax: float, inferred_vmin: bool, inferred_vmax: bool) → None

Set the vmin and vmax values used to place nodes on the axis.

Note

This is an internal setter method to be called downstream by the HivePlot.place_nodes_on_axis() method.

Parameters:

vmin – all node scalar values less than vmin would have been set to vmin.
vmax – all node scalar values greater than vmax would have been set to vmax.
inferred_vmin – whether vmin value was inferred in HivePlot.place_nodes_on_axis().
inferred_vmax – whether vmax value was inferred in HivePlot.place_nodes_on_axis().

Returns:

None.

set_sorting_variable(label: Hashable) → None

Set which scalar variable in each Node instance will be used to place each node on the axis when plotting.

Note

This is an internal setter method to be called downstream by the HivePlot.place_nodes_on_axis() method.

Parameters:

label – which scalar variable in the node data to reference.

Returns:

None.

class hiveplotlib.BaseHivePlot(use_numba: bool = True, n_parallel: int | None = None)

Hive Plots built from combination of Axis and Node instances.

This class is essentially a set of methods for creating and maintaining the nested dictionary attribute hive_plot_edges, which holds constructed Bézier curves, edge ids, and matplotlib keyword arguments for various sets of edges to be plotted. The nested dictionary structure can be abstracted to the below example.

BaseHivePlot.hive_plot_edges["starting axis"]["ending axis"]["tag"]

The resulting dictionary value holds the edge information relating to an addition of edges that are tagged as “tag,” specifically the edges going FROM the axis named “starting axis” TO the axis named “ending axis.” This value is in fact another dictionary, meant to hold the discretized Bézier curves (curves), the matplotlib keyword arguments for plotting (edge_kwargs), and the abstracted edge ids (an (m, 2) np.ndarray) between which we are drawing Bézier curves (ids).
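As a rough picture of this layout, the snippet below builds a stand-in dictionary by hand. The axis names, node IDs, and kwargs are made up for illustration, and plain lists stand in for the numpy arrays the library stores.

```python
# Hand-built stand-in for the nested structure; all values are illustrative.
hive_plot_edges = {
    "a0": {  # starting axis
        "a1": {  # ending axis
            0: {  # tag
                "ids": [["node_1", "node_7"], ["node_2", "node_9"]],  # (m, 2) from/to IDs
                "curves": [],  # discretized Bezier curves would be stored here
                "edge_kwargs": {"color": "C0", "alpha": 0.5},  # matplotlib kwargs
            }
        }
    }
}

# Edge information tagged 0, going FROM axis "a0" TO axis "a1".
edge_info = hive_plot_edges["a0"]["a1"][0]
from_ids = [pair[0] for pair in edge_info["ids"]]
to_ids = [pair[1] for pair in edge_info["ids"]]
```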

add_axes(axes: Axis | List[Axis]) → None

Add one or more Axis instances to the axes attribute.

Note

All resulting Axis IDs must be unique.

Parameters:

axes – Axis object(s) to add to axes attribute.

Returns:

None.

add_edge_curves_between_axes(axis_id_1: Hashable, axis_id_2: Hashable, tag: Hashable | None = None, a1_to_a2: bool = True, a2_to_a1: bool = True, num_steps: int = 100, short_arc: bool = True, control_rho_scale: float = 1, control_angle_shift: float = 0, use_numba_curves: bool | None = None) → None

Construct discretized edge curves between two axes of a Hive Plot.

Note

One must run the add_edge_ids() method first for the two axes of interest.

Resulting discretized Bézier curves will be stored as an (n, 2) numpy.ndarray of multiple sampled curves where the first column is x position and the second column is y position in Cartesian coordinates.

Note

Although each curve is represented by a (num_steps, 2) array, all the curves are stored in a single collective numpy.ndarray, with a row of [np.nan, np.nan] separating each discretized curve from the next. This allows matplotlib to accept a single array when plotting lines via plt.plot(), which speeds up plotting later.

This output will be stored in hive_plot_edges[axis_id_1][axis_id_2][tag]["curves"].
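The NaN-separated layout can be sketched in pure Python, with lists of [x, y] rows standing in for the numpy array. The helpers stack_curves and split_curves are hypothetical names used only for this illustration.

```python
import math
from typing import List

Curve = List[List[float]]  # rows of [x, y]


def stack_curves(curves: List[Curve]) -> Curve:
    """Concatenate curves into one list of rows, with a [nan, nan] row between curves."""
    stacked: Curve = []
    for i, curve in enumerate(curves):
        if i > 0:
            stacked.append([math.nan, math.nan])  # separator row
        stacked.extend(curve)
    return stacked


def split_curves(stacked: Curve) -> List[Curve]:
    """Recover the individual curves by splitting on the NaN rows."""
    curves: List[Curve] = [[]]
    for x, y in stacked:
        if math.isnan(x) and math.isnan(y):
            curves.append([])  # a separator starts a new curve
        else:
            curves[-1].append([x, y])
    return [curve for curve in curves if curve]


curve_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
curve_b = [[5.0, 5.0], [6.0, 6.0], [7.0, 5.0]]
stacked = stack_curves([curve_a, curve_b])
```

Because matplotlib breaks a line wherever it meets NaN, an array built this way can be drawn with a single plotting call while still showing separate curves.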

Parameters:

axis_id_1 – pointer to first of two Axis instances in the axes attribute between which we want to find connections.
axis_id_2 – pointer to second of two Axis instances in the axes attribute between which we want to find connections.
tag – unique ID specifying which subset of edges specified by their IDs to construct (e.g. hive_plot_edges[axis_id_1][axis_id_2][tag]["ids"]). Note, if no tag is specified (e.g. tag=None), it is presumed there is only one tag for the specified set of axes to look over, which can be inferred. If no tag is specified and there are multiple tags to choose from, a ValueError will be raised.
a1_to_a2 – whether to build out the edges going FROM axis_id_1 TO axis_id_2.
a2_to_a1 – whether to build out the edges going FROM axis_id_2 TO axis_id_1.
num_steps – number of points sampled along a given Bézier curve. Larger numbers will result in smoother curves when plotting later, but slower rendering.
short_arc – whether to take the shorter angle arc (True) or longer angle arc (False). There are always two ways to traverse between axes: with one angle being x, the other option being 360 - x. For most visualizations, the user should expect to traverse the “short arc,” hence the default True. For full user flexibility, however, we offer the ability to force the arc the other direction, the “long arc” (short_arc=False). Note: in the case of 2 axes 180 degrees apart, there is no “wrong” angle, so in this case an initial decision will be made, but switching this boolean will switch the arc to the other hemisphere.
control_rho_scale – how much to multiply the distance of the control point for each edge to / from the origin. Default 1 sets the control rho for each edge as the mean rho value for each pair of nodes being connected by that edge. A value greater than 1 will pull the resulting edges further away from the origin, making edges more convex, while a value between 0 and 1 will pull the resulting edges closer to the origin, making edges more concave. Note, this affects edges further from the origin by larger magnitudes than edges closer to the origin.
control_angle_shift – how far to rotate the control point for each edge around the origin. Default 0 sets the control angle for each edge as the mean angle for each pair of nodes being connected by that edge. A positive value will pull the resulting edges further counterclockwise, while a negative value will pull the resulting edges further clockwise.
use_numba_curves – whether to use a numba-accelerated sampler to construct curves. If None, resolves to the class-level default set in __init__. When enabled and numba is available, a parallel implementation is used. A small-case heuristic may bypass numba when the total number of sampled points is small; the selection between serial and parallel numba is automatic.

Returns:

None.
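The roles of num_steps, control_rho_scale, and control_angle_shift can be illustrated with a small sampler. This is a sketch only: it assumes a quadratic Bézier curve with a single control point, takes a plain average of the two angles (so it does not handle wraparound or short_arc=False), and uses hypothetical function names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polar_to_cartesian(rho: float, angle_deg: float) -> Point:
    theta = math.radians(angle_deg)
    return (rho * math.cos(theta), rho * math.sin(theta))


def sample_edge(
    start_rho: float,
    start_angle: float,
    end_rho: float,
    end_angle: float,
    num_steps: int = 100,
    control_rho_scale: float = 1.0,
    control_angle_shift: float = 0.0,
) -> List[Point]:
    """Sample a quadratic Bezier curve between two nodes given in polar coordinates."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    # Control point: mean rho (scaled) at the mean angle (shifted).
    control_rho = control_rho_scale * (start_rho + end_rho) / 2
    control_angle = (start_angle + end_angle) / 2 + control_angle_shift
    p0 = polar_to_cartesian(start_rho, start_angle)
    p1 = polar_to_cartesian(control_rho, control_angle)
    p2 = polar_to_cartesian(end_rho, end_angle)
    points = []
    for i in range(num_steps):
        t = i / (num_steps - 1)
        w0, w1, w2 = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2  # Bernstein weights
        points.append((w0 * p0[0] + w1 * p1[0] + w2 * p2[0],
                       w0 * p0[1] + w1 * p1[1] + w2 * p2[1]))
    return points


default_curve = sample_edge(2, 0, 4, 120, num_steps=101)
pulled_curve = sample_edge(2, 0, 4, 120, num_steps=101, control_rho_scale=2)
```

In this sketch, the midpoint of pulled_curve sits further from the origin than the midpoint of default_curve, matching the description of values greater than 1.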

add_edge_ids(edges: Edges | ndarray, axis_id_1: Hashable, axis_id_2: Hashable, tag: Hashable | None = None, a1_to_a2: bool = True, a2_to_a1: bool = True) → Hashable

Find and store the edge IDs relevant to the specified pair of axes.

Find the subset of network connections that involve nodes on axis_id_1 and axis_id_2, looking over the specified edges compared to the IDs of the Node instances currently placed on each Axis. Edges discovered between the specified two axes (depending on the values specified by a1_to_a2 and a2_to_a1, more below) will have the relevant edge IDs stored, with other edges disregarded.

Generates (j, 2) and (k, 2) numpy arrays of axis_id_1 to axis_id_2 connections and axis_id_2 to axis_id_1 connections (or only 1 of those arrays depending on parameter choices for a1_to_a2 and a2_to_a1).

The resulting arrays of relevant edge IDs (e.g. each row is a [<FROM ID>, <TO ID>] edge) will be stored automatically in the hive_plot_edges attribute, a dictionary of dictionaries of dictionaries of edge information, which can later be converted into discretized edges to be plotted in Cartesian space. They are stored as hive_plot_edges[<source_axis_id>][<sink_axis_id>][<tag>]["ids"].

Note

If no tag is provided (e.g. default None), one will be automatically generated and returned by this method call.

Parameters:

edges – Edges instance or (n, 2) array of Hashable values representing unique IDs of specific Node instances. The first column is the IDs for the “from” nodes and the second column is the IDs for the “to” nodes for each connection.
axis_id_1 – pointer to first of two Axis instances in the axes attribute between which we want to find connections.
axis_id_2 – pointer to second of two Axis instances in the axes attribute between which we want to find connections.
tag – tag corresponding to subset of specified edges. If None is provided, the tag will be set as the lowest unused integer starting at 0 amongst the available tags under hive_plot_edges[axis_id_1][axis_id_2] and / or hive_plot_edges[axis_id_2][axis_id_1].
a1_to_a2 – whether to find the connections going FROM axis_id_1 TO axis_id_2.
a2_to_a1 – whether to find the connections going FROM axis_id_2 TO axis_id_1.

Returns:

the resulting unique tag. Note, if both a1_to_a2 and a2_to_a1 are True the resulting unique tag returned will be the same for both directions of edges.
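The two ideas described here, filtering edges by which axis each endpoint sits on and picking the lowest unused integer tag, can be sketched with sets and tuples. Both helper names are hypothetical and the data is made up; this is not the library's implementation.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (from ID, to ID)


def edges_between(
    edges: Iterable[Edge], from_nodes: Set[Hashable], to_nodes: Set[Hashable]
) -> List[Edge]:
    """Keep only the edges that start in ``from_nodes`` and end in ``to_nodes``."""
    return [(src, dst) for src, dst in edges if src in from_nodes and dst in to_nodes]


def lowest_unused_tag(existing_tags: Iterable[Hashable]) -> int:
    """Return the lowest integer, starting at 0, not already used as a tag."""
    used = set(existing_tags)
    tag = 0
    while tag in used:
        tag += 1
    return tag


edges = [(0, 10), (10, 1), (0, 1), (11, 0), (2, 3)]
nodes_on_a1 = {0, 1, 2}   # node IDs placed on the first axis
nodes_on_a2 = {10, 11}    # node IDs placed on the second axis

a1_to_a2_ids = edges_between(edges, nodes_on_a1, nodes_on_a2)
a2_to_a1_ids = edges_between(edges, nodes_on_a2, nodes_on_a1)
new_tag = lowest_unused_tag([0, 1, "highlight"])
```

Edges such as (0, 1) and (2, 3), which do not connect the two axes, are disregarded in both directions.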

add_edge_kwargs(axis_id_1: Hashable, axis_id_2: Hashable, tag: Hashable | None = None, a1_to_a2: bool = True, a2_to_a1: bool = True, reset_existing_kwargs: bool = False, overwrite_existing_kwargs: bool = True, warn_on_no_edges: bool = True, **edge_kwargs) → None

Add edge kwargs to the constructed hive_plot_edges attribute between two axes of a Hive Plot.

For a given set of edges for which edge kwargs were already set, any redundant edge kwargs specified by this method call will, by default, overwrite the previously set kwargs (see overwrite_existing_kwargs).

Expected to have found edge IDs between the two axes before calling this method, which can be done either by calling the connect_axes() method or the lower-level add_edge_ids() method for the two axes of interest. A warning will be raised if no edges exist between the two axes and warn_on_no_edges=True.

Resulting kwargs will be stored as a dict. This output will be stored in hive_plot_edges[axis_id_1][axis_id_2][tag]["edge_kwargs"].

Note

There is special handling in here for when the two provided axes have names "<axis_name>" and "<axis_name>_repeat". This is for use with HivePlot constructed with repeat_axes=True, which always names the repeated axis "<axis_name>_repeat". By definition, the edges between an axis and its repeat are the same, and therefore edges between these two axes should only be plotted in one direction, so a warning of a lack of edges in both directions for repeat edges is not productive and we formally catch this case.

Parameters:

axis_id_1 – Hashable pointer to the first Axis instance in the axes attribute to which we want to add plotting kwargs.
axis_id_2 – Hashable pointer to the second Axis instance in the axes attribute to which we want to add plotting kwargs.
tag – which subset of curves to modify kwargs for. Note, if no tag is specified (e.g. tag=None), it is presumed there is only one tag for the specified set of axes to look over and that will be inferred. If no tag is specified and there are multiple tags to choose from, a ValueError will be raised.
a1_to_a2 – whether to add kwargs for connections going FROM axis_id_1 TO axis_id_2.
a2_to_a1 – whether to add kwargs for connections going FROM axis_id_2 TO axis_id_1.
reset_existing_kwargs – whether to remove all existing edge kwargs before adding provided edge_kwargs for the edges specified by other parameters, default False leaves existing edge kwargs unchanged.
overwrite_existing_kwargs – whether to overwrite existing edge kwargs if provided again, default True overwrites already-provided edge kwargs with the new value(s) in edge_kwargs.
warn_on_no_edges – whether to warn if adding kwargs for edges that don’t exist. Default True.
edge_kwargs – additional matplotlib keyword arguments that will be applied to the specified edges.

Returns:

None.
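One way to picture how reset_existing_kwargs and overwrite_existing_kwargs interact is the dictionary merge below. The function update_edge_kwargs is a hypothetical stand-in, and it assumes that overwrite_existing_kwargs=False keeps old values for repeated keys while still adding keys that are new.

```python
from typing import Any, Dict


def update_edge_kwargs(
    existing: Dict[str, Any],
    new: Dict[str, Any],
    reset_existing_kwargs: bool = False,
    overwrite_existing_kwargs: bool = True,
) -> Dict[str, Any]:
    """Return the kwargs that would be stored after an update (assumed semantics)."""
    if reset_existing_kwargs:
        # Drop everything set before and keep only the new kwargs.
        return dict(new)
    merged = dict(existing)
    for key, value in new.items():
        # Assumption: a repeated key is replaced only when overwriting is enabled.
        if overwrite_existing_kwargs or key not in merged:
            merged[key] = value
    return merged


existing = {"color": "black", "alpha": 0.3}
new = {"color": "red", "linewidth": 2}

default_result = update_edge_kwargs(existing, new)
keep_old_result = update_edge_kwargs(existing, new, overwrite_existing_kwargs=False)
reset_result = update_edge_kwargs(existing, new, reset_existing_kwargs=True)
```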

add_edges(edges: Edges | ndarray) → None

Add edges to edges attribute.

Parameters:

edges – Edges instance or 2d array of [from, to] edges, where values correspond to unique node IDs.

Returns:

None.

add_nodes(nodes: NodeCollection | List[Node], check_uniqueness: bool = True) → None

Add NodeCollection or Node instances to nodes attribute.

Parameters:

nodes – NodeCollection instance or list of Node instances, will be added to nodes attribute.
check_uniqueness – whether to formally check for uniqueness. WARNING: the only reason to turn this off is if the dataset becomes big enough that this operation becomes expensive, and you have already established uniqueness another way (for example, you are pulling data from a database and the key in your table is the unique ID). If you add non-unique IDs with check_uniqueness=False, we make no promises about output.

Returns:

None.
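If check_uniqueness is turned off, uniqueness can be confirmed separately before adding nodes. The helper below is a hypothetical pre-check written for illustration; it is not part of hiveplotlib.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def find_duplicate_ids(unique_ids: Iterable[Hashable]) -> List[Hashable]:
    """Return every ID that appears more than once, in first-seen order."""
    counts = Counter(unique_ids)
    return [uid for uid, count in counts.items() if count > 1]


duplicates = find_duplicate_ids(["a", "b", "a", "c", "b"])
no_duplicates = find_duplicate_ids([1, 2, 3])
```

An empty result means the IDs are safe to add with check_uniqueness=False.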

connect_axes(edges: Edges | ndarray, axis_id_1: Hashable, axis_id_2: Hashable, tag: Hashable | None = None, a1_to_a2: bool = True, a2_to_a1: bool = True, num_steps: int = 100, short_arc: bool = True, control_rho_scale: float = 1, control_angle_shift: float = 0, reset_existing_kwargs: bool = False, overwrite_existing_kwargs: bool = True, warn_on_no_edges: bool = True, **edge_kwargs) → Hashable

Construct all the curves and set all the curve kwargs between axis_id_1 and axis_id_2.

Based on the specified edges parameter, build out the resulting Bézier curves, and set any kwargs for those edges for later visualization.

The curves will be tracked by a unique tag, and the resulting constructions will be stored in hive_plot_edges[axis_id_1][axis_id_2][tag] if a1_to_a2 is True and hive_plot_edges[axis_id_2][axis_id_1][tag] if a2_to_a1 is True.

Note

If trying to draw different subsets of edges with different kwargs, one can run this method multiple times with different subsets of the entire edges array, providing unique tag values with each subset of edges, and specifying different edge_kwargs each time. The resulting Hive Plot would be plotted showing each set of edges styled with each set of unique kwargs.

Note

You can choose to construct edges in only one direction by specifying a1_to_a2 or a2_to_a1 as False (both are True by default).

Parameters:

edges – hiveplotlib.Edges instance or (n, 2) array of Hashable values representing pointers to specific Node instances. If providing an array input, the first column is the “from” and the second column is the “to” for each connection.
axis_id_1 – Hashable pointer to the first Axis instance in the axes attribute we want to find connections between.
axis_id_2 – Hashable pointer to the second Axis instance in the axes attribute we want to find connections between.
tag – tag corresponding to specified edges. If None is provided, the tag will be set as the lowest unused integer starting at 0 amongst the available tags under hive_plot_edges[axis_id_1][axis_id_2] and / or hive_plot_edges[axis_id_2][axis_id_1].
a1_to_a2 – whether to find and build the connections going FROM axis_id_1 TO axis_id_2.
a2_to_a1 – whether to find and build the connections going FROM axis_id_2 TO axis_id_1.
num_steps – number of points sampled along a given Bézier curve. Larger numbers will result in smoother curves when plotting later, but slower rendering.
short_arc – whether to take the shorter angle arc (True) or longer angle arc (False). There are always two ways to traverse between axes: with one angle being x, the other option being 360 - x. For most visualizations, the user should expect to traverse the “short arc,” hence the default True. For full user flexibility, however, we offer the ability to force the arc the other direction, the “long arc” (short_arc=False). Note: in the case of 2 axes 180 degrees apart, there is no “wrong” angle, so in this case an initial decision will be made, but switching this boolean will switch the arc to the other hemisphere.
control_rho_scale – how much to multiply the distance of the control point for each edge to / from the origin. Default 1 sets the control rho for each edge as the mean rho value for each pair of nodes being connected by that edge. A value greater than 1 will pull the resulting edges further away from the origin, making edges more convex, while a value between 0 and 1 will pull the resulting edges closer to the origin, making edges more concave. Note, this affects edges further from the origin by larger magnitudes than edges closer to the origin.
control_angle_shift – how far to rotate the control point for each edge around the origin. Default 0 sets the control angle for each edge as the mean angle for each pair of nodes being connected by that edge. A positive value will pull the resulting edges further counterclockwise, while a negative value will pull the resulting edges further clockwise.
edge_kwargs – additional matplotlib params that will be applied to the related edges.
reset_existing_kwargs – whether to remove all existing edge kwargs before adding provided edge_kwargs for the edges specified by other parameters, default False leaves existing edge kwargs unchanged.
overwrite_existing_kwargs – whether to overwrite existing edge kwargs if provided again, default True overwrites already-provided edge kwargs with the new value(s) in edge_kwargs.
warn_on_no_edges – whether to warn if adding kwargs for edges that don’t exist. Default True.

Returns:

Hashable tag that identifies the generated curves and kwargs.

construct_curves(num_steps: int = 100, short_arc: bool = True, control_rho_scale: float = 1, control_angle_shift: float = 0, use_numba_curves: bool | None = None) → None

Construct Bézier curves for any connections for which we’ve specified the edges to draw.

That is, curves are built wherever hive_plot_edges[axis_0][axis_1][<tag>]["ids"] is non-empty but hive_plot_edges[axis_0][axis_1][<tag>]["curves"] does not yet exist.

Note

Checks all <tag> values between axes.

Parameters:

num_steps – number of points sampled along a given Bézier curve. Larger numbers will result in smoother curves when plotting later, but slower rendering.
short_arc – whether to take the shorter angle arc (True) or longer angle arc (False). There are always two ways to traverse between axes: with one angle being x, the other option being 360 - x. For most visualizations, the user should expect to traverse the “short arc,” hence the default True. For full user flexibility, however, we offer the ability to force the arc the other direction, the “long arc” (short_arc=False). Note: in the case of 2 axes 180 degrees apart, there is no “wrong” angle, so in this case an initial decision will be made, but switching this boolean will switch the arc to the other hemisphere.
control_rho_scale – how much to multiply the distance of the control point for each edge to / from the origin. Default 1 sets the control rho for each edge as the mean rho value for each pair of nodes being connected by that edge. A value greater than 1 will pull the resulting edges further away from the origin, making edges more convex, while a value between 0 and 1 will pull the resulting edges closer to the origin, making edges more concave. Note, this affects edges further from the origin by larger magnitudes than edges closer to the origin.
control_angle_shift – how far to rotate the control point for each edge around the origin. Default 0 sets the control angle for each edge as the mean angle for each pair of nodes being connected by that edge. A positive value will pull the resulting edges further counterclockwise, while a negative value will pull the resulting edges further clockwise.
use_numba_curves – whether to use a numba-accelerated sampler to construct curves. If None, resolves to the class-level default set in __init__. When enabled and numba is available, a parallel implementation is used. A small-case heuristic may bypass numba when the total number of sampled points is small; the selection between serial and parallel numba is automatic.

Returns:

None.

copy()

Return a copy of the instance.

Returns:

copy of the instance.

place_nodes_on_axis(axis_id: Hashable, node_df: DataFrame | None = None, sorting_feature_to_use: Hashable | None = None, vmin: float | None = None, vmax: float | None = None, unique_ids: None = None) → None

Set node positions on specific Axis.

Values of the sorting feature are first clipped to the specified vmin and vmax. The resulting [vmin, vmax] range is then scaled to span the length of the axis when plotted, which determines the Cartesian coordinates of each node.

Note

unique_ids was removed as a parameter in version 0.26.0. Node data must now be provided as a pandas.DataFrame via the node_df parameter.

Parameters:

axis_id – the axis (as specified by the keys from the axes attribute) on which to place nodes.
node_df – dataframe of node information to assign to this axis. If previously set with BaseHivePlot._allocate_nodes_to_axis(), this will overwrite those node assignments. If None, method will check and confirm there are existing node ID assignments.
sorting_feature_to_use – which feature in the node data to use to align nodes on an axis. Default None uses the feature previously assigned via BaseHivePlot.axes[axis_id].set_sorting_variable().
vmin – all values less than vmin will be set to vmin. Default None sets as global minimum of feature values for all Node instances on specified Axis.
vmax – all values greater than vmax will be set to vmax. Default None sets as global maximum of feature values for all Node instances on specified Axis.
unique_ids – REMOVED IN VERSION 0.26.0. See note above.

Raises:

TypeError – if the no-longer-supported unique_ids parameter is used.

Returns:

None.
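The clipping and scaling described for vmin and vmax can be sketched as follows. The function place_values_on_axis is a hypothetical illustration that works on a plain list of values rather than a dataframe, and it assumes vmax is strictly greater than vmin.

```python
import math
from typing import List, Optional, Sequence, Tuple


def place_values_on_axis(
    values: Sequence[float],
    start: float,
    end: float,
    angle: float,
    vmin: Optional[float] = None,
    vmax: Optional[float] = None,
) -> List[Tuple[float, float, float]]:
    """Map scalar values to (x, y, rho) positions along an axis (illustrative only)."""
    lo = min(values) if vmin is None else vmin  # inferred when not given
    hi = max(values) if vmax is None else vmax
    if hi <= lo:
        raise ValueError("this sketch requires vmax > vmin")
    theta = math.radians(angle)
    placements = []
    for value in values:
        clipped = min(max(value, lo), hi)          # clip to [vmin, vmax]
        fraction = (clipped - lo) / (hi - lo)      # normalize to [0, 1]
        rho = start + fraction * (end - start)     # scale to the axis span
        placements.append((rho * math.cos(theta), rho * math.sin(theta), rho))
    return placements


placements = place_values_on_axis([0, 5, 10, 20], start=1, end=5, angle=0, vmax=10)
rhos = [rho for _, _, rho in placements]
```

With vmax=10, the value 20 is clipped and lands at the end of the axis alongside the value 10.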

reset_edges(axis_id_1: Hashable | None = None, axis_id_2: Hashable | None = None, tag: Hashable | None = None, a1_to_a2: bool = True, a2_to_a1: bool = True) → None

Reset hive_plot_edges attribute and corresponding edges.relevant_edges (if edges exists).

Setting all the parameters to None deletes any stored connections between axes previously computed. If any subset of the parameters is not None, the resulting edges will be deleted:

If axis_id_1, axis_id_2, and tag are all specified as not None, the implied single subset of edges will be deleted. (Note, tags are required to be unique within a specified (axis_id_1, axis_id_2) pair.) In this case, the default is to delete all the edges bidirectionally (e.g. going axis_id_1 -> axis_id_2 and axis_id_2 -> axis_id_1) with the specified tag. To only delete edges in one of these directions, see the description of the bool parameters a1_to_a2 and a2_to_a1 below.

If only axis_id_1 and axis_id_2 are provided as not None, then the default is to delete all edge subsets bidirectionally between axis_id_1 and axis_id_2 (e.g. going axis_id_1 -> axis_id_2 and axis_id_2 -> axis_id_1), regardless of tag. To only delete edges in one of these directions, see the description of the bool parameters a1_to_a2 and a2_to_a1 below.

If only axis_id_1 is provided as not None, then all edges going TO and FROM axis_id_1 will be deleted. To only delete edges in one of these directions, see the description of the bool parameters a1_to_a2 and a2_to_a1 below.

Parameters:

axis_id_1 – specifies edges all coming FROM the axis identified by this unique ID.
axis_id_2 – specifies edges all going TO the axis identified by this unique ID.
tag – tag corresponding to explicit subset of added edges.
a1_to_a2 – whether to remove the connections going FROM axis_id_1 TO axis_id_2. Note, if axis_id_1 is specified but axis_id_2 is None, then this dictates whether to remove all edges going from axis_id_1.
a2_to_a1 – whether to remove the connections going FROM axis_id_2 TO axis_id_1. Note, if axis_id_1 is specified but axis_id_2 is None, then this dictates whether to remove all edges going to axis_id_1.

Returns:

None.
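The deletion cases above can be walked through on a plain nested dictionary shaped like hive_plot_edges. This is a simplified sketch with hypothetical helper names; it ignores edges.relevant_edges and is not the library's implementation.

```python
from typing import Dict, Hashable, Optional

EdgeStore = Dict[Hashable, Dict[Hashable, Dict[Hashable, dict]]]


def _drop(store: EdgeStore, source: Hashable, sink: Hashable,
          tag: Optional[Hashable]) -> None:
    """Delete store[source][sink] entirely, or only one tag within it."""
    tags = store.get(source, {}).get(sink)
    if tags is None:
        return
    if tag is None:
        del store[source][sink]
    else:
        tags.pop(tag, None)


def reset_edges_sketch(store: EdgeStore, axis_id_1: Optional[Hashable] = None,
                       axis_id_2: Optional[Hashable] = None,
                       tag: Optional[Hashable] = None,
                       a1_to_a2: bool = True, a2_to_a1: bool = True) -> None:
    if axis_id_1 is None:
        store.clear()  # nothing specified: forget every stored connection
        return
    if axis_id_2 is None:
        if a1_to_a2:
            store.pop(axis_id_1, None)  # all edges going FROM axis_id_1
        if a2_to_a1:
            for sinks in store.values():
                sinks.pop(axis_id_1, None)  # all edges going TO axis_id_1
        return
    if a1_to_a2:
        _drop(store, axis_id_1, axis_id_2, tag)
    if a2_to_a1:
        _drop(store, axis_id_2, axis_id_1, tag)


store = {
    "a0": {"a1": {0: {}, 1: {}}, "a2": {0: {}}},
    "a1": {"a0": {0: {}}},
    "a2": {"a0": {0: {}}},
}
# Remove only the edges tagged 0 between "a0" and "a1", in both directions.
reset_edges_sketch(store, "a0", "a1", tag=0)
```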

to_json() → str

Return the information from the axes, nodes, and edges in Cartesian space as a serialized JSON string.

This allows users to visualize hive plots with arbitrary libraries, even outside of Python.

The dictionary structure of the resulting JSON will consist of two top-level keys:

“axes” - contains the information for plotting each axis (including angle and long_name), plus the nodes on each axis in Cartesian space.

“edges” - contains the information for plotting the discretized edges in Cartesian space, plus the corresponding to and from IDs that go with each edge, as well as any kwargs that were set for plotting each set of edges.

Returns:

JSON output of axis, node, and edge information.

to_networkx(*, directed: bool = True, multigraph: bool = True, source_attribute_name: str | None = None) → nx.Graph

Convert this HivePlot’s nodes and edges to a networkx graph.

Thin wrapper around hiveplotlib.converters.nodes_edges_to_networkx().

Parameters:

directed – whether to return a networkx.DiGraph-family graph. Default True.
multigraph – whether to return a Multi*-family graph. Default True.
source_attribute_name – optional edge-attribute name under which to store a (tag, row_index) source-row identifier for every edge. Default None keeps the export clean. Pass a string (typically "_hiveplotlib_source") to enable this annotation. This is required if you intend to feed the resulting graph back into hiveplotlib.graph_features.compute_graph_metrics() to compute edge metrics on a multigraph, since the annotation is what lets each per-edge metric value map back to its specific source row in the original hiveplotlib.Edges.

Returns:

networkx graph (one of networkx.Graph, networkx.DiGraph, networkx.MultiGraph, or networkx.MultiDiGraph).
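The way the directed and multigraph flags select among the four return types can be summarized with a small lookup. The function graph_class_name is a hypothetical helper that only returns the class name as a string, so it runs without networkx installed.

```python
def graph_class_name(directed: bool = True, multigraph: bool = True) -> str:
    """Name of the networkx graph class implied by the two flags."""
    prefix = "Multi" if multigraph else ""
    base = "DiGraph" if directed else "Graph"
    return prefix + base


default_name = graph_class_name()                               # both flags True
simple_name = graph_class_name(directed=False, multigraph=False)
```

The defaults therefore correspond to networkx.MultiDiGraph, the most general of the four options.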