High-Level Hive Plot API#

class hiveplotlib.NodeCollection(data: DataFrame, unique_id_column: Hashable | None = None, node_viz_kwargs: dict | None = None, check_uniqueness: bool = True)#

Multi-node data aggregator and partitioner for downstream hive plots.

Ingests an input pandas.DataFrame (data) with specification for which data column corresponds to the nodes’ unique IDs (unique_id_column).

Users can provide node-plotting keyword arguments via the node_viz_kwargs parameter in two ways.

1. By providing a string value corresponding to a column name, in which case that column data would be used for that plotting keyword argument in a node_viz() call.

2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in a node_viz() call.

node_viz_kwargs can also be updated (or overwritten) after instantiation via the update_node_viz_kwargs() method.

Note

Provided keyword argument values will be checked first against column names in NodeCollection.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.

The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
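
The lookup order described in the note above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's implementation; the helper name resolve_node_viz_kwargs and the example column names are hypothetical.

```python
def resolve_node_viz_kwargs(column_names, node_viz_kwargs):
    """Split kwargs into column-backed and literal groups (illustrative only)."""
    columns = set(column_names)
    from_columns, literal = {}, {}
    for name, value in node_viz_kwargs.items():
        # Rule (1): a string matching a column name refers to that column's data.
        if isinstance(value, str) and value in columns:
            from_columns[name] = value
        # Rule (2): anything else is passed through unchanged.
        else:
            literal[name] = value
    return from_columns, literal


from_columns, literal = resolve_node_viz_kwargs(
    column_names=["node_id", "degree", "group"],  # hypothetical columns
    node_viz_kwargs={"s": "degree", "cmap": "viridis", "alpha": 0.8},
)
print(from_columns)  # {'s': 'degree'}
print(literal)  # {'cmap': 'viridis', 'alpha': 0.8}
```

One consequence of checking column names first: a literal string value that happens to match a column name would be treated as a column reference.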

Parameters:
  • data – dataframe of node data.

  • unique_id_column – which column of data to use for each node’s unique ID. Default None creates one using the dataframe’s index values.

  • node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.

  • check_uniqueness – whether to check the unique_id_column of data for uniqueness. This is always good to check, but users may wish to skip the check when working with large datasets whose ID column is already known to be unique, for example, when the data comes from a SQL database and the ID column is the primary key.

Raises:

RepeatUniqueNodeIDsError – if data contains non-unique node IDs in the unique_id_column (and check_uniqueness=True).

check_unique_ids() bool#

Check that the unique_id_column of the data attribute contains unique values.

Returns:

True if all values in the unique_id_column are unique, False otherwise.

copy() NodeCollection#

Create a deep copy of the NodeCollection instance.

Returns:

deep copy of the NodeCollection instance.

create_partition_variable(data_column: Hashable, cutoffs: list[float] | ndarray | float | int = 3, labels: List[Hashable] | None = None, partition_variable_name: Hashable | None = None) Hashable#

Create a column in the data attribute partitioning the data with respect to a single column variable.

By default, splits will partition nodes by unique values of data_column.

If data_column corresponds to numerical data, and a list of cutoffs is provided, node IDs will be separated into bins according to the following binning scheme:

(-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf)

If data_column corresponds to numerical data, and cutoffs is provided as an int, node IDs will be separated into cutoffs equal-sized quantiles.
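
As a rough sketch of the right-closed binning scheme for a list of cutoffs, the standard-library bisect module reproduces the same interval boundaries. The function assign_bins and its integer default labels are hypothetical; the library labels bins by their value ranges and stores the result in a dataframe column instead.

```python
from bisect import bisect_left


def assign_bins(values, cutoffs, labels=None):
    """Assign each value to a right-closed bin defined by sorted cutoffs."""
    cutoffs = sorted(cutoffs)
    if labels is None:
        # Hypothetical default: label bins by position.
        labels = list(range(len(cutoffs) + 1))
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("len(labels) must be 1 greater than len(cutoffs)")
    # bisect_left counts the cutoffs strictly below the value, so a value
    # equal to a cutoff lands in the lower bin: (cutoffs[i-1], cutoffs[i]].
    return [labels[bisect_left(cutoffs, value)] for value in values]


bins = assign_bins(
    [1, 5, 5.1, 10, 42], cutoffs=[5, 10], labels=["low", "mid", "high"]
)
print(bins)  # ['low', 'low', 'mid', 'mid', 'high']
```

Two cutoffs produce three bins, which is why a list of labels needs one more entry than the list of cutoffs.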

Note

This method currently only supports splits where data_column corresponds to numerical data.

Parameters:
  • data_column – which column of data in the underlying data attribute to use to partition the data.

  • cutoffs – cutoffs to use in binning nodes according to data under data_column. Default 3 will bin nodes into 3 equally-sized bins based on the unique values of data_column. When provided as an int, the exact numerical break points will be determined to create cutoffs equally-sized quantiles. When provided as a list / array of values, the specified cutoffs will bin according to (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf).

  • labels – labels assigned to each bin. Only referenced when cutoffs is not None. Default None labels each bin as a string based on its range of values. Note, when cutoffs is a list, len(labels) must be 1 greater than len(cutoffs). When cutoffs is an int, len(labels) must be equal to cutoffs.

  • partition_variable_name – name of the resulting partition variable to add to the data attribute. Default None creates names starting at "partition_0", incrementing the integer to keep names unique if the user creates multiple partitions.

Returns:

column name of partition information added to the data attribute.

Raises:
update_node_viz_kwargs(reset_kwargs: bool = False, **node_viz_kwargs) None#

Update keyword arguments for plotting nodes in a node_viz() call.

Users can provide values in two ways.

1. By providing a string value corresponding to a column name, in which case that column data would be used for that plotting keyword argument in a node_viz() call.

2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in a node_viz() call.

Note

Provided keyword argument values will be checked first against column names in NodeCollection.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.

The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).

Parameters:
  • reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the node_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).

  • node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.

Returns:

None.
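
The effect of reset_kwargs can be modeled with an ordinary dictionary merge. This is a minimal sketch under the assumption that updates behave like dict.update (new values overwrite existing keys); update_viz_kwargs is a hypothetical stand-in, not the method itself.

```python
def update_viz_kwargs(existing, reset_kwargs=False, **new_kwargs):
    """Return the kwargs that would be stored after an update (illustrative)."""
    # reset_kwargs=True starts from an empty dict; otherwise keep existing values.
    updated = {} if reset_kwargs else dict(existing)
    updated.update(new_kwargs)
    return updated


existing = {"cmap": "viridis", "s": "degree"}

merged = update_viz_kwargs(existing, alpha=0.5, s=20)
print(merged)  # {'cmap': 'viridis', 's': 20, 'alpha': 0.5}

replaced = update_viz_kwargs(existing, reset_kwargs=True, alpha=0.5)
print(replaced)  # {'alpha': 0.5}
```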

class hiveplotlib.Edges(data: DataFrame | ndarray | dict[Hashable, ndarray | DataFrame], from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', edge_viz_kwargs: dict | None = None)#

Multi-edge aggregator with helper methods useful for downstream hive plots.

An edge is specified with respect to its starting node unique ID and ending node unique ID.

The Edges class ingests an input pandas.DataFrame or (n, 2) numpy.ndarray (data) with specification for which data columns correspond to the starting node IDs (from_column_name) and ending node IDs (to_column_name).

Users can also provide a dictionary of dataframes or arrays, where each key corresponds to a unique identifier for that set of edges. This allows users to store multiple sets of edges in a single Edges instance.

By providing a pandas.DataFrame input, additional edge metadata can be provided for later use (e.g. subsetting edges by metadata, keyword arguments for plotting edges with different thickness / color, etc.).

Users can thus visualize groups of edges in different ways in a single hive plot by providing a dictionary of dataframes with different edge metadata. Alternatively, users can provide a single pandas.DataFrame with all edges and vary plotting keyword arguments within metadata columns.

Users can provide edge-plotting keyword arguments via the edge_viz_kwargs parameter in two ways.

1. By providing a string value corresponding to a column name if a DataFrame is provided for edges, in which case that column data would be used for that plotting keyword argument in an edge_viz() call.

2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in an edge_viz() call.

edge_viz_kwargs can also be updated (or overwritten) after instantiation via the update_edge_viz_kwargs() method.

Parameters:
  • data – data to store as edges. Can provide either a single pandas.DataFrame / 2d numpy.ndarray, or a dictionary of dataframes / arrays, where each key corresponds to a unique identifier for that set of edges. If providing a numpy.ndarray, then it should be of shape (n, 2) where the first column corresponds to the starting node IDs and the second column corresponds to the ending node IDs.

  • from_column_name – name of the edge origin column, whose values correspond to node IDs where a given edge starts.

  • to_column_name – name of the edge destination column, whose values correspond to node IDs where a given edge ends.

  • edge_viz_kwargs – keyword arguments to provide to an edge_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.

Note

If providing an array input for the data parameter, then it is required that the first column be the starting node IDs and the second column be the ending node IDs.

Array inputs will be stored in the data attribute as a pandas.DataFrame with column names "from" and "to".

Dictionary inputs for the data parameter can have any key, but the values must be either pandas.DataFrame or numpy.ndarray. If a numpy.ndarray is provided, it must be of shape (n, 2) where the first column corresponds to the starting node IDs and the second column corresponds to the ending node IDs. If a pandas.DataFrame is provided, then it must have columns named according to the from_column_name and to_column_name parameters.

Provided keyword argument values will be checked first against column names in Edges.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.

The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
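
The array-input rules in the note above can be pictured with a small standard-library model in which lists of pairs stand in for (n, 2) arrays and lists of row dictionaries stand in for dataframes. The function names and the default_tag placeholder are hypothetical; the key the library uses for a single untagged input is not specified here.

```python
def pairs_to_records(pairs):
    """Convert (n, 2) pair data into rows keyed by "from" and "to"."""
    records = []
    for pair in pairs:
        # Stand-in for the (n, 2) shape requirement on array inputs.
        assert len(pair) == 2, "each edge must be a (from, to) pair"
        start, end = pair
        records.append({"from": start, "to": end})
    return records


def normalize_edge_input(data, default_tag=0):
    """Always return a {tag: records} mapping, whatever the input shape."""
    if isinstance(data, dict):
        return {tag: pairs_to_records(pairs) for tag, pairs in data.items()}
    # default_tag is a hypothetical placeholder key for a single edge set.
    return {default_tag: pairs_to_records(data)}


single = normalize_edge_input([(0, 1), (1, 2)])
tagged = normalize_edge_input({"friends": [(0, 1)], "coworkers": [(2, 3)]})
print(single)
print(tagged)
```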

add_edges(data: dict[Hashable, ndarray | DataFrame] | dict[Hashable, DataFrame]) None#

Add edges to the Edges instance.

Note

If adding edge data with a tag matching an existing tag, then edge data to add must have the same from and to columns as the existing data with the same tag.

2d arrays of data will always be accepted, but their edge data will be converted to pandas.DataFrame.

Parameters:

data – dictionary of data to add as edges. The key is a unique identifier to correspond to the added data value.

Raises:

AssertionError – if the provided data includes an invalid shaped numpy.ndarray or if the provided data for a tag has different columns than the existing data for that tag.

Returns:

None.

copy() Edges#

Return a copy of the Edges instance.

Returns:

copy of the Edges instance.

property data: DataFrame | dict[Hashable, DataFrame]#

Getter for the Edges.data attribute.

Returns:

when there is only a single tag of edges, returns the pandas.DataFrame of edges. When there are multiple tags of edges, returns a dictionary of pandas.DataFrame objects, where each key corresponds to the tag assigned for each set of edges.

export_edge_array(tag: Hashable | Literal['all'] = 'all') ndarray#

Return an (n, 2) array of [from, to] edges for the edge data corresponding to tag.

Parameters:

tag – tag of data to export. If "all", then all tags of edge data are exported as a single array.

Raises:

AssertionError – if the provided tag is not a valid key in the Edges.data attribute.

Returns:

array of [from, to] edges.
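
To make the single-tag versus multi-tag behavior of data and export_edge_array() concrete, here is a toy model that stores lists of (from, to) tuples per tag. TaggedEdgeStore is a hypothetical class written for this illustration only; the real Edges class holds pandas.DataFrame objects and returns a numpy.ndarray.

```python
class TaggedEdgeStore:
    """Toy stand-in for tagged edge storage; not the hiveplotlib implementation."""

    def __init__(self, edges_by_tag):
        self._edges = {tag: list(pairs) for tag, pairs in edges_by_tag.items()}

    @property
    def tags(self):
        return list(self._edges)

    @property
    def data(self):
        # One tag: return its edges directly. Several tags: return the mapping.
        if len(self._edges) == 1:
            return next(iter(self._edges.values()))
        return dict(self._edges)

    def export_edge_array(self, tag="all"):
        if tag == "all":
            # Concatenate every tag's edges into one list of pairs.
            return [pair for pairs in self._edges.values() for pair in pairs]
        assert tag in self._edges, f"invalid tag: {tag!r}"
        return list(self._edges[tag])


single = TaggedEdgeStore({"a": [(0, 1), (1, 2)]})
multi = TaggedEdgeStore({"a": [(0, 1)], "b": [(2, 3), (3, 4)]})

print(single.data)  # [(0, 1), (1, 2)]
print(multi.export_edge_array())  # [(0, 1), (2, 3), (3, 4)]
print(multi.export_edge_array("b"))  # [(2, 3), (3, 4)]
```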

property tags: list[Hashable]#

Return the list of all edge tags.

Returns:

list of tag keys for this Edges instance.

update_edge_viz_kwargs(tag: Hashable | None = None, reset_kwargs: bool = False, **edge_viz_kwargs) None#

Update keyword arguments for plotting edges in an edge_viz() call.

Users can provide values in two ways.

1. By providing a string value corresponding to a column name, in which case that column data would be used for that plotting keyword argument in an edge_viz() call.

2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in an edge_viz() call.

Note

Provided keyword argument values will be checked first against column names in Edges.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.

The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).

These edge keyword arguments have lower priority than any edge keyword arguments stored in the HivePlot.edge_plotting_keyword_arguments attribute.

Parameters:
  • tag – tag of edge data to update keyword arguments for. If None, then the keyword arguments are updated for all tags of edge data.

  • reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the edge_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).

  • edge_viz_kwargs – keyword arguments to provide to an edge_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.

Raises:

AssertionError – if the provided tag is not a valid key in the Edges.data attribute.

Returns:

None.
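
The role of the tag parameter can be sketched with a dictionary of per-tag keyword arguments: tag=None touches every tag, while a specific tag touches only that entry. The function update_edge_kwargs_by_tag is hypothetical and assumes dict.update-style merging.

```python
def update_edge_kwargs_by_tag(kwargs_by_tag, tag=None, reset_kwargs=False, **new_kwargs):
    """Update stored kwargs for one tag, or for all tags when tag is None."""
    if tag is not None:
        assert tag in kwargs_by_tag, f"invalid tag: {tag!r}"
    targets = list(kwargs_by_tag) if tag is None else [tag]
    for target in targets:
        if reset_kwargs:
            # Drop existing kwargs before adding the new ones.
            kwargs_by_tag[target] = {}
        kwargs_by_tag[target].update(new_kwargs)
    return kwargs_by_tag


stored = {"friends": {"color": "C0"}, "coworkers": {"color": "C1"}}

update_edge_kwargs_by_tag(stored, alpha=0.4)  # applies to every tag
update_edge_kwargs_by_tag(stored, tag="friends", reset_kwargs=True, color="black")

print(stored["friends"])  # {'color': 'black'}
print(stored["coworkers"])  # {'color': 'C1', 'alpha': 0.4}
```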

class hiveplotlib.HivePlot(nodes: NodeCollection, edges: Edges | ndarray, partition_variable: Hashable, sorting_variables: Hashable | Dict[Hashable, Hashable], backend: Literal['bokeh', 'datashader', 'holoviews-bokeh', 'holoviews-matplotlib', 'matplotlib', 'plotly'] = 'matplotlib', repeat_axes: bool | Hashable | List[Hashable] = False, axes_order: List[Hashable] | None = None, rotation: float = 0, angle_between_repeat_axes: float = 40, axis_kwargs: Dict[Hashable, Dict] | None = None, all_edge_kwargs: dict | None = None, clockwise_edge_kwargs: dict | None = None, counterclockwise_edge_kwargs: dict | None = None, repeat_edge_kwargs: dict | None = None, non_repeat_edge_kwargs: dict | None = None, warn_on_overlapping_kwargs: bool = True, num_steps_per_edge: int = 100, collapsed_group_axis_name: str = 'Other', use_numba: bool = True, n_parallel: int | None = None)#

Hive plot instantiation from nodes, edges, a provided partition variable, and sorting variable(s).

Axes will be created with names corresponding to the unique names in the data specified by partition_variable.

Nodes must be provided as a hiveplotlib.NodeCollection instance, and edges must be provided as a hiveplotlib.Edges instance.

Note

Any provided axis_kwargs will be applied after first initializing the hive plot axes according to the partition_variable, sorting_variables, repeat_axes, axes_order, rotation, and angle_between_repeat_axes parameter values.

By default, a repeat axis <axis_name>_repeat that has the same sorting variable will match the size, labeling, and node positioning of the original <axis_name> in the resulting hive plot unless the user explicitly changes this in initialization. To change this, users can provide <axis_name>_repeat keyword arguments to the axis_kwargs parameter on initialization or modify the repeat axis later with the hiveplotlib.HivePlot.update_axis() method.

If the repeat axis has a different sorting variable, then by default, it will infer the vmin and vmax values to place the nodes spanning the full extent of the resulting axis.

If a list of axes_order names are provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter.
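
The collapsing behavior of a None entry in axes_order can be illustrated with plain lists. This sketch is an assumption-laden model: resolve_axes_order is a hypothetical helper, and ValueError stands in for the library's InvalidAxesOrderError.

```python
def resolve_axes_order(partition_values, axes_order, collapsed_group_axis_name="Other"):
    """Map each axis name to the partition values placed on that axis."""
    unknown = [
        name for name in axes_order
        if name is not None and name not in partition_values
    ]
    if unknown:
        # Stand-in for InvalidAxesOrderError.
        raise ValueError(f"names not in partition: {unknown}")
    remaining = [value for value in partition_values if value not in axes_order]
    if None in axes_order and not remaining:
        # Stand-in for InvalidAxesOrderError.
        raise ValueError("no unspecified partition values left to collapse")
    axes = {}
    for name in axes_order:
        if name is None:
            # All unspecified partition values share one collapsed axis.
            axes[collapsed_group_axis_name] = remaining
        else:
            axes[name] = [name]
    return axes


axes = resolve_axes_order(["A", "B", "C", "D", "E"], ["A", None, "B"])
print(axes)  # {'A': ['A'], 'Other': ['C', 'D', 'E'], 'B': ['B']}
```

Five partition values end up on three axes, which is the situation where collapsing is most useful.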

Parameters:
  • nodes – node data to turn into a hive plot.

  • edges – edge data corresponding to provided nodes to turn into a hive plot. If providing a numpy.ndarray of edge data, must be provided as (from, to) pairs. Note, providing an array input does not support the inclusion of edge metadata, whereas the Edges instance input does.

  • partition_variable – which node variable to use to partition the nodes into separate axes. Partitioning will be done by unique values.

  • sorting_variables – which node variable to use to sort / place the nodes on each axis. Providing a single value uses the same variable for each axis. Alternatively, provide a dictionary whose keys are the unique values from the partition_variable column data and whose values are the corresponding sorting variables to use for each axis. Note that when providing a dictionary input, all keys created by the provided partition_variable must be specified (otherwise a MissingSortingVariableError will be raised).

  • backend – which visualization backend to use when plotting with the plot() method.

  • repeat_axes – unique values from partition_variable column data for which to create adjacent repeat axes. Repeat axes can be turned on for all unique values by setting this parameter to True. Default False sets no repeat axes.

  • axes_order – order in which to place axes on the hive plot. Names must correspond to the unique values in node data specified by partition_variable. If a list of axes_order names are provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.

  • rotation – angle (measured in degrees) to rotate every axis counterclockwise off of the default value. (By default, axes are evenly spaced in polar coordinates, with the first axis drawn at an angle of 0 degrees.)

  • angle_between_repeat_axes – angle (measured in degrees) to use between repeat axes.

  • axis_kwargs – nested dictionaries of specific kwargs to update axes. Keys should be unique values from partition_variable column data. Values should be dictionaries corresponding to the parameters in hiveplotlib.HivePlot.update_axis().

  • all_edge_kwargs – additional keyword arguments for plotting all edges. Default None makes no additional modifications when plotting edges.

  • clockwise_edge_kwargs – additional keyword arguments for plotting edges going clockwise. Default None makes no additional modifications when plotting edges.

  • counterclockwise_edge_kwargs – additional keyword arguments for plotting edges going counterclockwise. Default None makes no additional modifications when plotting edges.

  • repeat_edge_kwargs – additional keyword arguments for plotting edges between repeat axes. Default None makes no additional modifications when plotting edges.

  • non_repeat_edge_kwargs – additional keyword arguments for plotting edges between non-repeat axes. Default None makes no additional modifications when plotting edges.

  • warn_on_overlapping_kwargs – whether to warn if overlapping keyword arguments are detected among the "all_edge_kwargs", "repeat_edge_kwargs", "non_repeat_edge_kwargs", "clockwise_edge_kwargs", and "counterclockwise_edge_kwargs" parameters.

  • num_steps_per_edge – how many steps to use in drawing each edge curve. Higher numbers will show smoother edges but take longer to compute and use more memory.

  • collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes_order includes a None axis.

  • use_numba – whether to enable numba-accelerated Bézier curve sampling when constructing edges. Default True. When enabled and numba is available, selection is automatic: serial numba for tiny or single-curve workloads (based on an internal floor), parallel numba otherwise.

  • n_parallel – explicit maximum thread count to use during numba-parallel edge construction, limited by available CPU cores and the number of curves. If None, uses all available CPU cores capped by the number of curves.

Raises:
  • InvalidPartitionVariableError – if invalid partition_variable provided. This value must correspond to a column of the node data.

  • MissingSortingVariableError – if any of the axes resulting from the choice of partition_variable does not have a set sorting variable according to the sorting_variables parameter.

  • InvalidSortingVariableError – if the sorting variables chosen for one or more of the axes does not correspond to a column of the node data.

  • RepeatInPartitionAxisNameError – if one or more proposed axes set via the partition_variable ends in "_repeat", which is reserved for repeat axes.

  • InvalidAxisNameError – if provided axis_kwargs points to an axis not in the resulting HivePlot instance.

  • InvalidAxesOrderError – if a non-None axes_order includes any names that do not correspond to the partition set via the provided partition_variable.

  • InvalidAxesOrderError – if user provides None as one of the axes in axes_order but there are no remaining unspecified names from the current partition to collapse onto this axis.
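
The two accepted forms of sorting_variables (a single value or a per-axis dictionary) can be sketched as a small expansion step. The helper resolve_sorting_variables is hypothetical, KeyError stands in for MissingSortingVariableError, and the column names are made up for the example.

```python
def resolve_sorting_variables(axis_names, sorting_variables):
    """Expand sorting_variables into an explicit {axis: variable} mapping."""
    if not isinstance(sorting_variables, dict):
        # A single value is reused for every axis.
        return {name: sorting_variables for name in axis_names}
    missing = [name for name in axis_names if name not in sorting_variables]
    if missing:
        # Stand-in for MissingSortingVariableError.
        raise KeyError(f"no sorting variable for axes: {missing}")
    return {name: sorting_variables[name] for name in axis_names}


shared = resolve_sorting_variables(["A", "B", "C"], "degree")
per_axis = resolve_sorting_variables(
    ["A", "B"], {"A": "degree", "B": "age"}  # hypothetical column names
)
print(shared)  # {'A': 'degree', 'B': 'degree', 'C': 'degree'}
print(per_axis)  # {'A': 'degree', 'B': 'age'}
```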

build_axes(build_axes_from_scratch: bool = False, preserve_original_edge_kwargs: bool = False) None#

Build axes and place nodes corresponding to current partition.

Parameters:
  • build_axes_from_scratch – if True, then all old axes and edges will be deleted and new axes will be generated. This is useful for example when the partition variable is changed. Note, however, that this will drop any existing keyword arguments modifying the axes (e.g. manually changing angles, starting and ending axes positions, etc.).

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Returns:

None.

build_hive_plot(build_axes_from_scratch: bool = False, preserve_original_edge_kwargs: bool = False) None#

Run all necessary computations to rebuild the underlying hive plot.

Note

Calling this method will discard any changes made to edges via the hiveplotlib.HivePlot.update_edges() method (except for any plotting keyword arguments if preserve_original_edge_kwargs=True).

Parameters:
  • build_axes_from_scratch – if True, old axes will be deleted and new axes will be generated. This is useful for example when the partition variable is changed. Note, however, that this will drop any existing keyword arguments modifying the axes (e.g. manually changing angles, starting and ending axes positions, etc.).

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Returns:

None.

connect_adjacent_axes(rebuild_edges: bool = True, warn_on_overlapping_kwargs: bool | None = None, preserve_original_edge_kwargs: bool = False) None#

Connect all adjacent axes.

Note

This function call will reset all the existing edges, redrawing all the edges from scratch.

Calling this method will discard any changes made to edges via the hiveplotlib.HivePlot.update_edges() method (except for any plotting keyword arguments if preserve_original_edge_kwargs=True).

Parameters:
  • rebuild_edges – whether to only update edge kwargs or to also redraw the edges. Default True also rebuilds edges.

  • warn_on_overlapping_kwargs – whether to warn if overlapping keyword arguments are detected among the "all_edge_kwargs", "repeat_edge_kwargs", "clockwise_edge_kwargs", and "counterclockwise_edge_kwargs" parameters. Default None falls back to the value set by the warn_on_overlapping_kwargs attribute.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

property edge_kwarg_hierarchy: list[Literal['all_edge_kwargs', 'repeat_edge_kwargs', 'non_repeat_edge_kwargs', 'clockwise_edge_kwargs', 'counterclockwise_edge_kwargs']]#

Return the current edge_kwarg_hierarchy list, specified from least prioritized to most prioritized.
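
A least-to-most-prioritized list suggests a simple merge in which later entries overwrite earlier ones. The sketch below assumes that reading; merge_edge_kwargs is a hypothetical helper, and the hierarchy order shown is only an example (the actual order is whatever the property returns).

```python
def merge_edge_kwargs(hierarchy, kwargs_by_level, applicable_levels):
    """Merge kwargs from least to most prioritized; later levels win."""
    merged = {}
    for level in hierarchy:
        if level in applicable_levels:
            # A level left at its None default contributes nothing.
            merged.update(kwargs_by_level.get(level) or {})
    return merged


# Example order only; read the real order from edge_kwarg_hierarchy.
hierarchy = [
    "all_edge_kwargs",
    "repeat_edge_kwargs",
    "non_repeat_edge_kwargs",
    "clockwise_edge_kwargs",
    "counterclockwise_edge_kwargs",
]
kwargs_by_level = {
    "all_edge_kwargs": {"color": "gray", "alpha": 0.3},
    "clockwise_edge_kwargs": {"color": "C0"},
    "counterclockwise_edge_kwargs": {"color": "C1"},
}

# Levels that apply to a clockwise edge between two non-repeat axes.
merged = merge_edge_kwargs(
    hierarchy,
    kwargs_by_level,
    {"all_edge_kwargs", "non_repeat_edge_kwargs", "clockwise_edge_kwargs"},
)
print(merged)  # {'color': 'C0', 'alpha': 0.3}
```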

plot(**kwargs)#

Plot underlying hive plot.

Note

When the backend is set to datashader, any provided node plotting keyword arguments in nodes.node_viz_kwargs will be disregarded, as attributes like color and size are reserved for datashading the nodes. Inclusion of any node_kwargs here will also raise a warning.

When the backend is set to datashader, any provided edge plotting keyword arguments in edges.edge_viz_kwargs will be disregarded, as attributes like color and size are reserved for datashading the edges. Inclusion of any edge kwargs here as part of the additional im_kwargs (discussed further in the docstring for datashade_hive_plot_mpl()) will likely trigger an error.

Parameters:

kwargs – keyword arguments for the appropriate hive_plot_viz() call, depending on which viz backend is currently set. See the Visualization module documentation for more information on possible arguments. Other than different backends having different names for equivalent keyword arguments, these should for the most part be interchangeable, with the exception of the datashader backend (see note above).

Returns:

viz data structures, see the appropriate hive_plot_viz() call corresponding to the current viz backend for more information here.

rename_edge_kwargs(**rename_kwargs) None#

Rename specific edge kwarg names.

This will operate on all of the possible edge kwarg settings stored in the edge_plotting_keyword_arguments attribute and edge kwargs stored in the hive_plot_edges attribute. This allows users to quickly accommodate different visualization back ends that may require different keyword argument names.

Note

Not all edge keyword arguments are supported by all back ends. For example, some back ends may not support the zorder concept in matplotlib to reorder the plotting of edges independently of the order in which they were plotted. In this case, users can remove these keyword arguments entirely by providing {old_name: None} in the rename_kwargs parameter.

Parameters:

rename_kwargs – dictionary that will map old keyword argument names to new keyword argument names. This will operate on all of the possible edge kwarg settings stored in the edge_plotting_keyword_arguments attribute and edge kwargs stored in the hive_plot_edges attribute. This allows users to quickly accommodate different visualization back ends that may require different keyword argument names. To remove an incompatible edge kwarg, provide {old_name: None} in the dictionary.
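
The rename-or-remove rule can be modeled on a single dictionary of edge keyword arguments. This is a sketch of the described mapping only; rename_kwargs_in is a hypothetical helper, whereas the actual method applies the mapping across all stored edge kwargs.

```python
def rename_kwargs_in(edge_kwargs, **rename_kwargs):
    """Rename keys per the mapping; a None target drops the key entirely."""
    renamed = {}
    for name, value in edge_kwargs.items():
        if name not in rename_kwargs:
            renamed[name] = value  # untouched
            continue
        new_name = rename_kwargs[name]
        if new_name is not None:
            renamed[new_name] = value  # renamed
        # new_name is None: the kwarg is removed
    return renamed


matplotlib_style = {"linewidth": 2, "zorder": 3, "alpha": 0.5}
converted = rename_kwargs_in(matplotlib_style, linewidth="line_width", zorder=None)
print(converted)  # {'line_width': 2, 'alpha': 0.5}
```

Here linewidth becomes line_width, the spelling bokeh uses for line glyphs, and zorder is dropped because the target back end is assumed not to support it.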

set_angle_between_repeat_axes(angle: float = 40, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#

Set the angle (in degrees) between repeat axes.

Parameters:
  • angle – angle (in degrees) to use between repeat axes.

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Returns:

None.

set_axes_order(axes: List[Hashable | None] | ndarray | None = None, collapsed_group_axis_name: str | None = None, collapsed_group_sorting_variable: Hashable | None = None, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True, check_collapsed_group_sorting_variable: bool = True, require_using_all_partition_names: bool = True) None#

Set order of axes to be plotted in counterclockwise order.

Names must correspond to the unique values in node data specified by the partition_variable attribute, or users can provide None as one of the axes to collapse any unspecified groups from the partition onto a single axis.

Default None uses the order in the pandas groupby object stored in the partition attribute.

Note

If a user is trying to set a subset of the partition names under the axes parameter (without a None collapsing axis), then the user should instead call set_partition() with the desired axes_order subset of names.

Parameters:
  • axes – unique names available in the column of data corresponding to the partition_variable attribute. Names must correspond to the unique values in node data specified by the current partition_variable. If a list of axes names are provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.

  • collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes includes a None axis. Default None uses the name stored under the collapsed_group_axis_name attribute.

  • collapsed_group_sorting_variable – sorting variable to use for the collapsed group axis. If not provided, and a value is not available in the sorting_variables attribute, then a MissingSortingVariableError will be raised.

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

  • check_collapsed_group_sorting_variable – whether to check if a collapsed group sorting variable exists.

  • require_using_all_partition_names – whether to require that the user provides all partition names in the axes parameter. If True, then the user must provide all partition names, or at least provide None to collapse any unspecified groups onto a single axis. If trying to set a subset of the partition names, then the user should instead call set_partition() with the desired axes_order.

Returns:

None.

Raises:
  • InvalidAxesOrderError – if non-None axes parameter provides names outside of the current partition.

  • InvalidAxesOrderError – if user provides a strict subset of partition axes values and require_using_all_partition_names=True.

  • InvalidAxesOrderError – if user provides None as one of the axes but there are no remaining unspecified names from the current partition to collapse onto this axis.

  • MissingSortingVariableError – if the sorting variable for the collapsed group axis is not provided and a value is not available for the collapsed group axis under the sorting_variables attribute. Note, check only runs if check_collapsed_group_sorting_variable is True.
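
The interaction between a strict subset of names, a None collapsing axis, and require_using_all_partition_names can be sketched as a single check. The function check_axes_selection is hypothetical, and ValueError stands in for InvalidAxesOrderError.

```python
def check_axes_selection(partition_values, axes, require_using_all_partition_names=True):
    """Return True if the axes selection is acceptable, otherwise raise."""
    named = {name for name in axes if name is not None}
    has_collapse_axis = None in axes
    # A strict subset leaves at least one partition value without an axis.
    is_strict_subset = named < set(partition_values)
    if is_strict_subset and not has_collapse_axis and require_using_all_partition_names:
        # Stand-in for InvalidAxesOrderError.
        raise ValueError("axes must cover the partition or include None")
    return True


partition = ["A", "B", "C"]
print(check_axes_selection(partition, ["C", "A", "B"]))  # True: reordering only
print(check_axes_selection(partition, ["A", None]))  # True: B and C collapse
print(check_axes_selection(partition, ["A", "B"], require_using_all_partition_names=False))
```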

set_partition(partition_variable: Hashable, sorting_variables: Hashable | Dict[Hashable, Hashable], repeat_axes: bool | Hashable | List[Hashable] = False, axes_order: List[Hashable] | ndarray | None = None, collapsed_group_axis_name: str = 'Other', build_hive_plot: bool = True) None#

Set the node partition variable, create the necessary axes, and place nodes on the axes accordingly.

Note

This call will remove any existing axes.

Parameters:
  • partition_variable – node partition variable to use.

  • sorting_variables – sorting variable(s) to use for axes. Can specify a single value to use for all axes or a dictionary with axis name keys and sorting variable values to assign specific sorting variables to individual axes. Repeat axes can be specified by specifying the resulting axis name, which will be "<partition_value>_repeat" for whatever <partition_value> to which an axis corresponds.

  • repeat_axes – axes names for which to create repeat axes. Providing True here creates a repeat axis for every axis specified via the partition attribute. False or [] turns off all repeat axes.

  • axes_order – unique names available in the column of data corresponding to the partition_variable attribute. Names must correspond to the unique values in node data specified by the current partition_variable. If a list of axes_order names are provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.

  • collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes_order includes a None axis.

  • build_hive_plot – whether to rebuild the hive plot (i.e. recompute axes and redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

Returns:

None.

Raises:
  • InvalidPartitionVariableError – if invalid partition_variable provided.

  • RepeatInPartitionAxisNameError – if one or more implied axes names from the given partition would end in "_repeat". This naming convention is reserved for repeat axes.

  • InvalidAxesOrderError – if non-None axes_order parameter provides names outside of the current partition.

  • InvalidAxesOrderError – if user provides None as one of the axes but there are no remaining unspecified names from the current partition to collapse onto this axis.
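
The collapsing behavior of axes_order can be sketched in plain Python. This is only an illustration of the rule described above, not hiveplotlib's implementation: resolve_axes_order is a hypothetical helper, ValueError stands in for InvalidAxesOrderError, and partition values that are neither named nor collapsed are simply left out of the sketch.

```python
from typing import Dict, Hashable, List, Optional


def resolve_axes_order(
    partition_values: List[Hashable],
    axes_order: Optional[List[Optional[Hashable]]] = None,
    collapsed_group_axis_name: str = "Other",
) -> Dict[Hashable, List[Hashable]]:
    """Hypothetical helper: map each axis name to the partition values on it."""
    if axes_order is None:
        # Default: one axis per partition value, in the order given.
        return {value: [value] for value in partition_values}

    named = [name for name in axes_order if name is not None]
    unknown = [name for name in named if name not in partition_values]
    if unknown:
        # Stand-in for the InvalidAxesOrderError described above.
        raise ValueError(f"names outside of the current partition: {unknown}")

    remaining = [value for value in partition_values if value not in named]
    if None in axes_order and not remaining:
        # Stand-in for the second InvalidAxesOrderError case.
        raise ValueError("no unspecified partition values left to collapse")

    axes: Dict[Hashable, List[Hashable]] = {}
    for name in axes_order:
        if name is None:
            # All unnamed partition values share one collapsed axis.
            axes[collapsed_group_axis_name] = remaining
        else:
            axes[name] = [name]
    return axes


axes = resolve_axes_order(["A", "B", "C", "D", "E"], axes_order=["A", "B", None])
print(axes)  # {'A': ['A'], 'B': ['B'], 'Other': ['C', 'D', 'E']}
```

With five partition values and axes_order=["A", "B", None], the sketch yields three axes, the last of which gathers "C", "D", and "E" under the collapsed_group_axis_name.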

set_repeat_axes(axes_names: bool | Hashable | List[Hashable], sorting_variables: Hashable | Dict[Hashable, Hashable] | None = None, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#

Set repeat axes for specified axes names.

Note

This method will overwrite existing repeat axes specifications. Thus, rerunning this method will remove any repeat axes not specified in the call. See the example code below.

If a necessary repeat axis sorting variable is not provided by the user, then this method will use the sorting variable from the corresponding non-repeat axis.

Any existing repeat axes can be removed by setting axes_names to False or [].

from hiveplotlib.datasets import example_hive_plot

hp = example_hive_plot()
list(hp.axes.keys())
# >>> ['A', 'B', 'C']

# adds 'A_repeat'
hp.set_repeat_axes("A")
list(hp.axes.keys())
# >>> ['A', 'B', 'C', 'A_repeat']

# removes 'A_repeat', adds 'B_repeat'
hp.set_repeat_axes("B")
list(hp.axes.keys())
# >>> ['B', 'C', 'A', 'B_repeat']

Parameters:
  • axes_names – axes names for which to create repeat axes. Providing True here turns on repeat axes for all axes specified via the partition attribute. False or [] turns off all repeat axes.

  • sorting_variables – sorting variables to choose for the axis / axes.

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Returns:

None.
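
The axes_names handling and the sorting variable fallback described in the notes above can be sketched as follows. This is a simplified illustration, not the library's code: resolve_repeat_axes is a hypothetical helper, and keying the optional repeat-axis sorting variables by "<partition_value>_repeat" is an assumption made for the sketch.

```python
from typing import Dict, Hashable, List, Optional, Union


def resolve_repeat_axes(
    axes_names: Union[bool, Hashable, List[Hashable]],
    partition_axes: List[Hashable],
    sorting_variables: Dict[Hashable, Hashable],
    repeat_sorting_variables: Optional[Dict[str, Hashable]] = None,
) -> Dict[str, Hashable]:
    """Hypothetical helper: repeat axis names mapped to their sorting variables."""
    if axes_names is True:
        names = list(partition_axes)  # a repeat axis for every axis
    elif axes_names is False:
        names = []  # no repeat axes
    elif isinstance(axes_names, list):
        names = list(axes_names)
    else:
        names = [axes_names]  # a single axis name

    provided = repeat_sorting_variables or {}
    # Fall back to the non-repeat axis' sorting variable when none is provided.
    return {
        f"{name}_repeat": provided.get(f"{name}_repeat", sorting_variables[name])
        for name in names
    }


sorting = {"A": "low", "B": "low", "C": "low"}
print(resolve_repeat_axes("A", ["A", "B", "C"], sorting))
# {'A_repeat': 'low'}
print(resolve_repeat_axes(["A", "B"], ["A", "B", "C"], sorting, {"B_repeat": "high"}))
# {'A_repeat': 'low', 'B_repeat': 'high'}
```

Because each call returns a fresh mapping rather than extending an earlier one, the sketch also mirrors the overwrite behavior noted above: repeat axes not named in the latest call are gone.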

set_rotation(rotation: float, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#

Rotate all axes counterclockwise relative to the default placement, then reconstruct axes and edges accordingly.

By default, axes are equally spaced in polar coordinates, with the first axis placed at an angle of 0 degrees.

Changing the rotation angle will rotate every axis counterclockwise by the provided rotation value (measured in degrees).

Parameters:
  • rotation – angle (measured in degrees) to rotate every axis counterclockwise off of the default value. (By default, axes are evenly spaced in polar coordinates, with the first axis drawn at an angle of 0 degrees.)

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Returns:

None.
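
As a rough illustration of the angle arithmetic described above, the following sketch computes evenly spaced axis angles and applies a counterclockwise rotation. axis_angles is a hypothetical helper, not part of hiveplotlib; it ignores repeat axes, and wrapping angles back into [0, 360) is an assumption of the sketch.

```python
from typing import List


def axis_angles(num_axes: int, rotation: float = 0.0) -> List[float]:
    """Hypothetical helper: evenly spaced axis angles in degrees, rotated counterclockwise."""
    spacing = 360.0 / num_axes
    # The first axis sits at 0 degrees before the rotation is applied.
    return [(i * spacing + rotation) % 360.0 for i in range(num_axes)]


print(axis_angles(3))               # [0.0, 120.0, 240.0]
print(axis_angles(3, rotation=30))  # [30.0, 150.0, 270.0]
```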

set_viz_backend(backend: Literal['bokeh', 'datashader', 'holoviews-bokeh', 'holoviews-matplotlib', 'matplotlib', 'plotly']) None#

Set viz backend for plotting.

Parameters:

backend – which viz backend to use for plotting.

Raises:

AssertionError – if user tries to set an unsupported viz backend.

Returns:

None.

to_json() str#

Return the plotting information from the axes, nodes, and edges in Cartesian space as a serialized JSON string.

This allows users to visualize hive plots with arbitrary libraries, even outside of python.

The dictionary structure of the resulting JSON will consist of three top-level keys:

“axes” - contains the information for plotting each axis (including angle and long_name), plus the nodes on each axis in Cartesian space.

“edges” - contains the information for plotting the discretized edges in Cartesian space, plus the corresponding to and from IDs that go with each edge, as well as any kwargs that were set for plotting each set of edges.

“node_viz_kwargs” - contains the resolved node visualization keyword arguments for each axis. Column references in hiveplotlib.NodeCollection.node_viz_kwargs are resolved to per-node arrays.

Note

The resulting JSON will not contain the additional data for the nodes or edges stored under the nodes and edges attributes, respectively. It will only contain the Cartesian coordinates of the nodes and the discretized curves of the edges.

Returns:

JSON output of axis, node, and edge information.
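
A consumer of this output might start by parsing the string and walking the top-level keys. The sketch below uses a hand-written stand-in rather than a real to_json() result: only the three top-level keys and the angle and long_name fields come from the description above, and the way they are nested here is an assumption.

```python
import json

# Hypothetical, heavily abbreviated stand-in for the string returned by to_json().
# The nesting of fields below each top-level key is assumed for illustration.
example_json = json.dumps(
    {
        "axes": {"A": {"angle": 0, "long_name": "A"}},
        "edges": {},
        "node_viz_kwargs": {"A": {}},
    }
)

payload = json.loads(example_json)
print(sorted(payload))  # ['axes', 'edges', 'node_viz_kwargs']

for axis_name, axis_info in payload["axes"].items():
    # Each axis entry is expected to carry its angle and long name.
    print(axis_name, axis_info["angle"], axis_info["long_name"])
```

Since the payload is plain JSON, the same parsing step works in any language with a JSON library.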

update_axis(axis_id: Hashable, sorting_variable: Hashable | None = None, vmin: float | None | Literal['unchanged'] = 'unchanged', vmax: float | None | Literal['unchanged'] = 'unchanged', start: float | None = None, end: float | None = None, angle: float | None = None, long_name: Hashable | None = None, preserve_original_edge_kwargs: bool = True, build_hive_plot: bool = True) None#

Update existing axis parameters.

Allows updating axis size, axis placement in Cartesian space, the long name for axis labeling during plotting, node sorting, and positioning nodes on the axis.

When running on a given axis, any unspecified parameters will remain unchanged from the axis’ original values.

Note

If a sorting_variable parameter is provided, and the axis was previously inferring the vmin / vmax, then the default behavior of the vmin / vmax parameter, if not provided, will be to re-determine the global minimum / maximum for the new feature values (i.e. as if the parameter were set to None).

Parameters:
  • axis_id – unique name for Axis instance.

  • sorting_variable – node sorting variable to use. Default None maintains existing sorting variable. If the vmin and / or vmax value was previously inferred, then it will be re-inferred according to the global min and / or max values of this new sorting variable.

  • vmin – all values less than vmin will be set to vmin. None infers and sets as global minimum of feature values for all Node instances on specified Axis. If the vmin value was explicitly set beforehand by the user or the sorting_variable was left unchanged, then the default value "unchanged" will use the same vmin value as before. However, if the sorting_variable parameter was changed and the vmin value was previously inferred, then by default, the global minimum will be re-determined for the revised sorting_variable values, as done when set to None.

  • vmax – all values greater than vmax will be set to vmax. None sets as global maximum of feature values for all Node instances on specified Axis. If the vmax value was explicitly set beforehand by the user or the sorting_variable was left unchanged, then the default value "unchanged" will use the same vmax value as before. However, if the sorting_variable parameter was changed and the vmax value was previously inferred, then by default, the global maximum will be re-determined for the revised sorting_variable values, as done when set to None.

  • start – point closest to the center of the plot (using the same positive number for multiple axes in a hive plot is a nice way to space out the figure). Default None maintains existing start position.

  • end – point farthest from the center of the plot. Default None maintains existing ending position.

  • angle – angle to set the axis, in degrees (moving counterclockwise, e.g. 0 degrees points East, 90 degrees points North). Default None maintains existing angle.

  • long_name – longer name for use when labeling on graph (but not for referencing the axis). Default None maintains existing long name.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

Raises:

AssertionError – if provided axis_id not an existing axis under the axes attribute.

Returns:

None.
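
The interaction between the "unchanged" default, None, and a changed sorting variable can be sketched for vmin as below (vmax would follow the same pattern with max). This is a reading of the rules described above, not the library's code: resolve_vmin and clip are hypothetical helpers.

```python
from typing import List, Union

UNCHANGED = "unchanged"


def resolve_vmin(
    requested: Union[float, None, str],
    previous_vmin: float,
    previously_inferred: bool,
    sorting_variable_changed: bool,
    values: List[float],
) -> float:
    """Hypothetical helper: decide which vmin applies after an axis update."""
    if requested == UNCHANGED:
        if previously_inferred and sorting_variable_changed:
            # Re-infer from the new sorting variable's values.
            return min(values)
        return previous_vmin
    if requested is None:
        # None always infers the global minimum.
        return min(values)
    return float(requested)


def clip(values: List[float], vmin: float, vmax: float) -> List[float]:
    """Values below vmin are set to vmin; values above vmax are set to vmax."""
    return [min(max(value, vmin), vmax) for value in values]


new_values = [5.0, 12.0, 40.0]
print(resolve_vmin("unchanged", 0.0, True, True, new_values))   # 5.0
print(resolve_vmin("unchanged", 0.0, False, True, new_values))  # 0.0
print(clip(new_values, 10.0, 30.0))                             # [10.0, 12.0, 30.0]
```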

update_edge_plotting_keyword_arguments(edge_kwarg_setting: Literal['all_edge_kwargs', 'repeat_edge_kwargs', 'non_repeat_edge_kwargs', 'clockwise_edge_kwargs', 'counterclockwise_edge_kwargs'] = 'all_edge_kwargs', reset_edge_kwarg_setting: bool = False, rebuild_edges: bool = False, **kwargs) dict#

Update the edge keyword arguments for a specific edge_kwarg_setting.

Parameters:
  • edge_kwarg_setting – which edge kwarg setting to modify.

  • reset_edge_kwarg_setting – whether to overwrite existing keyword arguments for the chosen edge_kwarg_setting.

  • rebuild_edges – whether to only update edge kwargs or to also redraw the edges. Default False only updates edge kwargs.

  • kwargs – additional keyword arguments to provide to the specified edge kwarg setting.

Returns:

dictionary of the resulting keyword arguments for that edge kwarg setting.
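
One way to picture the difference between updating and resetting a setting is a dictionary merge. The sketch below is an assumption-laden illustration, not hiveplotlib's implementation: update_edge_kwarg_setting and the store dictionary are hypothetical, and it assumes that resetting drops the existing keyword arguments before the new ones are applied.

```python
from typing import Any, Dict

EDGE_KWARG_SETTINGS = (
    "all_edge_kwargs",
    "repeat_edge_kwargs",
    "non_repeat_edge_kwargs",
    "clockwise_edge_kwargs",
    "counterclockwise_edge_kwargs",
)


def update_edge_kwarg_setting(
    store: Dict[str, Dict[str, Any]],
    edge_kwarg_setting: str = "all_edge_kwargs",
    reset_edge_kwarg_setting: bool = False,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: merge kwargs into one setting and return the result."""
    if edge_kwarg_setting not in EDGE_KWARG_SETTINGS:
        raise ValueError(f"unknown edge kwarg setting: {edge_kwarg_setting}")
    # Start from scratch on reset, otherwise from a copy of the existing kwargs.
    existing = {} if reset_edge_kwarg_setting else dict(store.get(edge_kwarg_setting, {}))
    existing.update(kwargs)
    store[edge_kwarg_setting] = existing
    return existing


store = {"all_edge_kwargs": {"alpha": 0.5}}
merged = update_edge_kwarg_setting(store, color="black")
print(merged)    # {'alpha': 0.5, 'color': 'black'}
replaced = update_edge_kwarg_setting(store, reset_edge_kwarg_setting=True, linewidth=2)
print(replaced)  # {'linewidth': 2}
```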

update_edges(partition_id_1: Hashable, partition_id_2: Hashable, tag: Hashable | None = None, p1_to_p2: bool = True, p2_to_p1: bool = True, short_arc: bool | None = None, control_rho_scale: float | None = None, control_angle_shift: float | None = None, reset_existing_kwargs: bool = False, overwrite_existing_kwargs: bool = True, **edge_kwargs) None#

Modify all existing edges between a pair of partition groups.

This method allows changing edge construction parameters and / or edge visualization keyword arguments.

Note

This method also allows for modification of edges in just one direction between the two provided partition groups by specifying p1_to_p2 or p2_to_p1 as False (both are True by default).

Any updates done via this method will be lost if one calls the hiveplotlib.HivePlot.build_hive_plot() method.

Parameters:
  • partition_id_1 – Hashable pointer to the first group in the current partition between which we want to modify connections.

  • partition_id_2 – Hashable pointer to the second group in the current partition between which we want to modify connections.

  • tag – unique ID specifying which tag of edges to modify. Note, if no tag is specified (e.g. tag=None), it is presumed there is only one tag for the specified set of partition IDs to look over, which can be inferred. If no tag is specified and there are multiple tags to choose from, an UnspecifiedTagError will be raised.

  • p1_to_p2 – whether to modify connections going FROM partition_id_1 TO partition_id_2.

  • p2_to_p1 – whether to modify connections going FROM partition_id_2 TO partition_id_1.

  • short_arc – whether to take the shorter angle arc (True) or longer angle arc (False). When not set, uses a default value True. There are always two ways to traverse between axes: with one angle being x, the other option being 360 - x. For most visualizations, the user should expect to traverse the “short arc,” hence the default True. For full user flexibility, however, we offer the ability to force the arc the other direction, the “long arc” (short_arc=False). Note: in the case of 2 axes 180 degrees apart, there is no “wrong” angle, so in this case an initial decision will be made, but switching this boolean will switch the arc to the other hemisphere.

  • control_rho_scale – how much to multiply the distance of the control point for each edge to / from the origin. When not set, uses a default value 1, which sets the control rho for each edge as the mean rho value for each pair of nodes being connected by that edge. A value greater than 1 will pull the resulting edges further away from the origin, making edges more convex, while a value between 0 and 1 will pull the resulting edges closer to the origin, making edges more concave. Note, this affects edges further from the origin by larger magnitudes than edges closer to the origin.

  • control_angle_shift – how far to rotate the control point for each edge around the origin. When not set, uses a default value 0, which sets the control angle for each edge as the mean polar angle for each pair of nodes being connected by that edge. A positive value will pull the resulting edges further counterclockwise, while a negative value will pull the resulting edges further clockwise.

  • reset_existing_kwargs – whether to delete existing edge kwargs stored in the hive_plot_edges attribute for the specified edges. Default False keeps the existing edge kwargs, with any provided kwargs handled according to overwrite_existing_kwargs.

  • overwrite_existing_kwargs – whether to overwrite existing edge kwargs stored in the hive_plot_edges attribute for the specified edges when also provided in edge_kwargs, default True.

  • edge_kwargs – additional params that will be applied to the related edges.

Returns:

None.
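
The geometry behind short_arc, control_rho_scale, and control_angle_shift can be sketched with a little arithmetic. These helpers are hypothetical and deliberately simplified: control_point works in polar coordinates and takes a plain mean of the two angles, so it assumes the short arc between the nodes does not cross 0 degrees.

```python
from typing import Tuple


def arc_extent(angle_1: float, angle_2: float, short_arc: bool = True) -> float:
    """Hypothetical helper: degrees swept between two axes along the chosen arc."""
    x = abs(angle_1 - angle_2) % 360.0
    short = min(x, 360.0 - x)
    # The two options are always x and 360 - x.
    return short if short_arc else 360.0 - short


def control_point(
    rho_1: float,
    angle_1: float,
    rho_2: float,
    angle_2: float,
    control_rho_scale: float = 1.0,
    control_angle_shift: float = 0.0,
) -> Tuple[float, float]:
    """Hypothetical helper: polar (rho, angle) of an edge's control point."""
    rho = control_rho_scale * (rho_1 + rho_2) / 2    # scaled mean distance from origin
    angle = (angle_1 + angle_2) / 2 + control_angle_shift  # shifted mean polar angle
    return rho, angle


print(arc_extent(0, 120))                   # 120.0
print(arc_extent(0, 120, short_arc=False))  # 240.0
print(control_point(2, 0, 4, 120))          # (3.0, 60.0)
print(control_point(2, 0, 4, 120, control_rho_scale=2, control_angle_shift=-10))  # (6.0, 50.0)
```

In the last call, doubling control_rho_scale moves the control point from rho 3 to rho 6, away from the origin, and the negative control_angle_shift moves it 10 degrees clockwise.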

Raises:

UnspecifiedTagError – if no tag is specified and there are multiple tags to choose from.

update_node_viz_kwargs(reset_kwargs: bool = False, **node_viz_kwargs) None#

Update keyword arguments for plotting nodes in a node_viz() call.

Users can provide values in two ways.

1. By providing a string value corresponding to a column name, in which case that column data would be used for that plotting keyword argument in a node_viz() call.

2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in a node_viz() call.

Note

Provided keyword argument values will be checked first against column names in nodes.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.

The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).

This is a wrapper method for calling hiveplotlib.NodeCollection.update_node_viz_kwargs() on the underlying nodes attribute.

Parameters:
  • reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the node_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).

  • node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.

Returns:

None.
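
The column-first resolution order described in the note above can be sketched without pandas by treating node data as a dictionary of columns. resolve_node_viz_kwargs is a hypothetical helper and the column names are made up for illustration; the real resolution happens inside the library.

```python
from typing import Any, Dict, List


def resolve_node_viz_kwargs(
    columns: Dict[str, List[Any]],
    node_viz_kwargs: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: swap column-name values for that column's data."""
    resolved: Dict[str, Any] = {}
    for name, value in node_viz_kwargs.items():
        if isinstance(value, str) and value in columns:
            # (1) The value names a column, so use the per-node data.
            resolved[name] = columns[value]
        else:
            # (2) Otherwise pass the value through as-is.
            resolved[name] = value
    return resolved


node_columns = {"degree": [3, 1, 2], "group": ["a", "b", "a"]}
resolved = resolve_node_viz_kwargs(
    node_columns, {"c": "degree", "cmap": "viridis", "s": 20}
)
print(resolved)  # {'c': [3, 1, 2], 'cmap': 'viridis', 's': 20}
```

One consequence of checking column names first: an explicit string value that happens to match a column name is treated as a column reference.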

update_partition_data() None#

Update the partition data based on the current node data.

This method is useful when the node data has changed, which means the partition needs to be recalculated and the resulting new data propagated to the axes.

This method will reset the partition and update the axes accordingly.

Returns:

None.

update_sorting_variables(sorting_variables: Hashable | Dict[Hashable, Hashable], reset_vmin_and_vmax: bool = True, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#

Update sorting variables for specified axes with the current partition.

Parameters:
  • sorting_variables – sorting variable(s) to use for axes. Can specify a single value to use for all axes or a dictionary with axis name keys and sorting variable values to assign specific sorting variables to individual axes. Unless overwriting all current sorting variables, previously set sorting variables will be preserved.

  • reset_vmin_and_vmax – if True, then setting a sorting variable for an axis / axes will throw out any existing vmin / vmax information, reinitializing to infer and span the full extent of data (i.e. vmin=None and vmax=None).

  • build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.

  • preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.

Raises:
  • MissingSortingVariableError – if not all of the current partition axes have been specified with a sorting variable (either from the current call or from earlier, either with another call to this method or by setting sorting_variables on initialization of the HivePlot instance).

  • InvalidSortingVariableError – if the sorting variables chosen for one or more of the axes do not correspond to a column of the node data.

Returns:

None.

Note

If specifying a dictionary of sorting_variables information, any axes keys excluded from the provided dictionary will be unaffected, each keeping its existing sorting variable.

Repeat axes can be specified by specifying the repeat axis name, which will be "<partition_value>_repeat" for whatever <partition_value> to which an axis corresponds.

Providing an invalid sorting variable value will raise an InvalidSortingVariableError.

A Hashable input will set the sorting variable of all possible axes with the current partition attribute, including all possible repeat axes (whether plotted or not), to use the provided sorting variable. Any sorting variables set for a previous partition axis will be preserved.

If reset_vmin_and_vmax=True, then setting a sorting variable for an axis will throw out any existing vmin / vmax information for the provided axis / axes, reinitializing to infer and span the full extent of data (i.e. vmin=None and vmax=None).

Providing a nonexistent axis key will not raise any error. Instead, the sorting variable for the nonexistent axis will be stored in the sorting_variables attribute dictionary, leaving current axes unaffected. This allows users to set sorting variables for multiple partitions at once without setting the sorting variables every time the partition variable is changed.
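
The merge rules in the notes above can be summarized in a short sketch. merge_sorting_variables is a hypothetical helper operating on plain dictionaries, not the method itself, and it skips the validation that would raise InvalidSortingVariableError or MissingSortingVariableError.

```python
from typing import Dict, Hashable, List, Union


def merge_sorting_variables(
    current: Dict[Hashable, Hashable],
    new: Union[Hashable, Dict[Hashable, Hashable]],
    partition_axes: List[Hashable],
) -> Dict[Hashable, Hashable]:
    """Hypothetical helper: apply the documented update rules to a dictionary."""
    updated = dict(current)
    if isinstance(new, dict):
        # Only the provided keys change; unknown keys are stored, not rejected.
        updated.update(new)
    else:
        # A single value applies to every current axis and its possible repeat axis.
        for axis in partition_axes:
            updated[axis] = new
            updated[f"{axis}_repeat"] = new
    return updated


current = {"A": "low", "B": "low", "X": "age"}  # "X" belongs to an earlier partition
print(merge_sorting_variables(current, {"B": "high", "Z": "size"}, ["A", "B"]))
# {'A': 'low', 'B': 'high', 'X': 'age', 'Z': 'size'}
print(merge_sorting_variables(current, "degree", ["A", "B"]))
# {'A': 'degree', 'B': 'degree', 'X': 'age', 'A_repeat': 'degree', 'B_repeat': 'degree'}
```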

hiveplotlib.hiveplot.supported_viz_backends()#

Return the supported visualization back ends for hiveplotlib hive plots.