High-Level Hive Plot API#
- class hiveplotlib.NodeCollection(data: DataFrame, unique_id_column: Hashable | None = None, node_viz_kwargs: dict | None = None, check_uniqueness: bool = True)#
Multi-node data aggregator and partitioner for downstream hive plots.
Ingests an input pandas.DataFrame (data) with a specification of which data column corresponds to the nodes’ unique IDs (unique_id_column).
Users can provide node-plotting keyword arguments via the node_viz_kwargs parameter in two ways.
1. By providing a string value corresponding to a column name, in which case that column's data is used for that plotting keyword argument in a node_viz() call.
2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument is used as-is in a node_viz() call.
node_viz_kwargs can also be updated (or overwritten) after instantiation via the update_node_viz_kwargs() method.
Note
Provided keyword argument values will be checked first against column names in NodeCollection.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.
The appropriate keyword argument names depend on your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
- Parameters:
data – dataframe of node data.
unique_id_column – which column of data to use for each node’s unique ID. Default None creates one using the dataframe’s index values.
node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.
check_uniqueness – whether to check the unique_id_column of data for uniqueness. This is always good to check, but users may wish to skip the check when working with large datasets whose IDs are already known to be unique, for example, data pulled from a SQL database with a primary key column.
- Raises:
RepeatUniqueNodeIDsError – if data contains non-unique node IDs in the unique_id_column (and check_uniqueness=True).
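The column-first lookup described in the note above can be sketched in plain Python. This is only an illustration of the documented precedence, not the library's implementation: resolve_viz_kwargs and the dictionary standing in for the dataframe columns are hypothetical names introduced here.

```python
def resolve_viz_kwargs(columns, viz_kwargs):
    """Resolve each kwarg: column data if the value names a column, else the value itself.

    `columns` is a plain dict standing in for the node dataframe (an assumption
    made for this sketch).
    """
    resolved = {}
    for name, value in viz_kwargs.items():
        if isinstance(value, str) and value in columns:
            # (1) a string matching a column name pulls that column's data
            resolved[name] = columns[value]
        else:
            # (2) anything else is passed through as an explicit value
            resolved[name] = value
    return resolved


node_columns = {"degree": [3, 1, 2], "group": ["a", "b", "a"]}
resolved = resolve_viz_kwargs(node_columns, {"c": "degree", "cmap": "viridis"})
print(resolved)
```

Here "degree" matches a column, so its data is used, while "viridis" matches no column and is kept as a literal value.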
- check_unique_ids() bool#
Check that the unique_id_column of the data attribute contains unique values.
- Returns:
True if all values in the unique_id_column are unique, False otherwise.
- copy() NodeCollection#
Create a deep copy of the NodeCollection instance.
- Returns:
deep copy of the NodeCollection instance.
- create_partition_variable(data_column: Hashable, cutoffs: list[float] | ndarray | float | int = 3, labels: List[Hashable] | None = None, partition_variable_name: Hashable | None = None) Hashable#
Create a column in the data attribute partitioning the data with respect to a single column variable.
By default, splits will partition nodes by unique values of data_column.
If data_column corresponds to numerical data, and a list of cutoffs is provided, node IDs will be separated into bins according to the following binning scheme:
(-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf)
If data_column corresponds to numerical data, and cutoffs is provided as an int, node IDs will be separated into cutoffs equal-sized quantiles.
Note
This method currently only supports splits where data_column corresponds to numerical data.
- Parameters:
data_column – which column of data in the underlying data attribute to use to partition the data.
cutoffs – cutoffs to use in binning nodes according to data under data_column. Default 3 will bin nodes into 3 equally-sized bins based on the unique values of data_column. When provided as an int, the exact numerical break points will be determined to create cutoffs equally-sized quantiles. When provided as a list / array of values, the specified cutoffs will bin according to (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf).
labels – labels assigned to each bin. Only referenced when cutoffs is not None. Default None labels each bin as a string based on its range of values. Note, when cutoffs is a list, len(labels) must be 1 greater than len(cutoffs). When cutoffs is an int, len(labels) must be equal to cutoffs.
partition_variable_name – name of the resulting partition variable to add to the data attribute. Default None creates names starting at "partition_0", incrementing the integer to keep names unique if the user creates multiple partitions.
- Returns:
column name of partition information added to the data attribute.
- Raises:
InvalidPartitionVariableError – if invalid data_column provided.
InvalidPartitionVariableError – if partition_variable_name ends in _collapsed_axis. This is a protected name for internal use.
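As a rough illustration of the list-of-cutoffs binning scheme, the following standard-library sketch assigns values to right-closed bins. assign_bins is a hypothetical helper written for this example; how the library computes bins internally is not stated here, and the ValueError is a stand-in for whatever error the library raises.

```python
from bisect import bisect_left


def assign_bins(values, cutoffs, labels=None):
    """Bin values into (-inf, c0], (c0, c1], ..., (c_last, inf)."""
    cutoffs = sorted(cutoffs)
    n_bins = len(cutoffs) + 1
    if labels is None:
        labels = list(range(n_bins))
    if len(labels) != n_bins:
        raise ValueError("len(labels) must be 1 greater than len(cutoffs)")
    # bisect_left counts the cutoffs strictly below the value, so a value
    # equal to a cutoff lands in the bin whose right edge is that cutoff
    return [labels[bisect_left(cutoffs, value)] for value in values]


binned = assign_bins([1, 5, 7, 10, 11], cutoffs=[5, 10], labels=["low", "mid", "high"])
print(binned)  # ['low', 'low', 'mid', 'mid', 'high']
```

Two cutoffs yield three bins, which is why three labels are required in this case.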
- update_node_viz_kwargs(reset_kwargs: bool = False, **node_viz_kwargs) None#
Update keyword arguments for plotting nodes in a node_viz() call.
Users can provide values in two ways.
1. By providing a string value corresponding to a column name, in which case that column's data is used for that plotting keyword argument in a node_viz() call.
2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument is used as-is in a node_viz() call.
Note
Provided keyword argument values will be checked first against column names in NodeCollection.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.
The appropriate keyword argument names depend on your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
- Parameters:
reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the node_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).
node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.
- Returns:
None.
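The effect of reset_kwargs can be pictured with ordinary dictionaries. The helper below is a hypothetical sketch of the documented semantics only (preserve and update by default, start fresh when resetting); the real method updates the node_viz_kwargs attribute in place and returns None.

```python
def update_kwargs(existing, reset_kwargs=False, **new_kwargs):
    """Return the kwargs that would result from an update call."""
    # reset_kwargs=True drops everything stored so far
    updated = {} if reset_kwargs else dict(existing)
    # new values are added, overwriting any existing key of the same name
    updated.update(new_kwargs)
    return updated


current = {"cmap": "viridis", "s": 20}
merged = update_kwargs(current, s=40, alpha=0.5)
replaced = update_kwargs(current, reset_kwargs=True, alpha=0.5)
print(merged)    # {'cmap': 'viridis', 's': 40, 'alpha': 0.5}
print(replaced)  # {'alpha': 0.5}
```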
- class hiveplotlib.Edges(data: DataFrame | ndarray | dict[Hashable, ndarray | DataFrame], from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', edge_viz_kwargs: dict | None = None)#
Multi-edge aggregator with helper methods useful for downstream hive plots.
An edge is specified with respect to its starting node unique ID and ending node unique ID.
The Edges class ingests an input pandas.DataFrame or (n, 2) numpy.ndarray (data) with a specification of which data columns correspond to the starting node IDs (from_column_name) and ending node IDs (to_column_name).
Users can also provide a dictionary of dataframes or arrays, where each key corresponds to a unique identifier for that set of edges. This allows users to store multiple sets of edges in a single Edges instance.
By providing a pandas.DataFrame input, additional edge metadata can be provided for later use (e.g. subsetting edges by metadata, keyword arguments for plotting edges with different thickness / color, etc.).
Users can thus visualize groups of edges in different ways in a single hive plot by providing a dictionary of dataframes with different edge metadata. Alternatively, users can provide a single pandas.DataFrame with all edges and vary plotting keyword arguments within metadata columns.
Users can provide edge-plotting keyword arguments via the edge_viz_kwargs parameter in two ways.
1. By providing a string value corresponding to a column name if a DataFrame is provided for edges, in which case that column's data is used for that plotting keyword argument in an edge_viz() call.
2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument is used as-is in an edge_viz() call.
edge_viz_kwargs can also be updated (or overwritten) after instantiation via the update_edge_viz_kwargs() method.
- Parameters:
data – data to store as edges. Can provide either a single pandas.DataFrame / 2d numpy.ndarray, or a dictionary of dataframes / arrays, where each key corresponds to a unique identifier for that set of edges. If providing a numpy.ndarray, then it should be of shape (n, 2) where the first column corresponds to the starting node IDs and the second column corresponds to the ending node IDs.
from_column_name – name of the edge origin column, whose values correspond to node IDs where a given edge starts.
to_column_name – name of the edge destination column, whose values correspond to node IDs where a given edge ends.
edge_viz_kwargs – keyword arguments to provide to an edge_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.
Note
If providing an array input for the data parameter, then the first column must be the starting node IDs and the second column must be the ending node IDs.
Array inputs will be stored in the data attribute as a pandas.DataFrame with column names "from" and "to".
Dictionary inputs for the data parameter can have any key, but the values must be either pandas.DataFrame or numpy.ndarray. If a numpy.ndarray is provided, it must be of shape (n, 2) where the first column corresponds to the starting node IDs and the second column corresponds to the ending node IDs. If a pandas.DataFrame is provided, then it must have columns named according to the from_column_name and to_column_name parameters.
Provided keyword argument values will be checked first against column names in Edges.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.
The appropriate keyword argument names depend on your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
- add_edges(data: dict[Hashable, ndarray | DataFrame] | dict[Hashable, DataFrame]) None#
Add edges to the Edges instance.
Note
If adding edge data with a tag matching an existing tag, then the edge data to add must have the same from and to columns as the existing data with the same tag.
2d arrays of data will always be accepted, but their edge data will be converted to pandas.DataFrame.
- Parameters:
data – dictionary of data to add as edges. Each key is the unique identifier (tag) for the corresponding edge data value.
- Raises:
AssertionError – if the provided data includes an invalidly shaped numpy.ndarray or if the provided data for a tag has different columns than the existing data for that tag.
- Returns:
None.
- property data: DataFrame | dict[Hashable, DataFrame]#
Getter for the Edges.data attribute.
- Returns:
when there is only a single tag of edges, returns the pandas.DataFrame of edges. When there are multiple tags of edges, returns a dictionary of pandas.DataFrame objects, where each key corresponds to the tag assigned for each set of edges.
- export_edge_array(tag: Hashable | Literal['all'] = 'all') ndarray#
Return an (n, 2) array of [from, to] edges for the edge data corresponding to tag.
- Parameters:
tag – tag of data to export. If "all", then all tags of edge data are exported as a single array.
- Raises:
AssertionError – if the provided tag is not a valid key in the Edges.data attribute.
- Returns:
array of [from, to] edges.
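The tag handling described above can be sketched with plain lists standing in for the stored edge data. Both the helper and the concatenation order for "all" (dictionary insertion order) are assumptions made for illustration; the real method returns a numpy.ndarray.

```python
def export_edge_array(edges_by_tag, tag="all"):
    """Return [from, to] pairs for one tag, or for every tag when tag == "all"."""
    if tag == "all":
        # stack the pairs of every tag into one list (order is an assumption)
        return [list(pair) for pairs in edges_by_tag.values() for pair in pairs]
    assert tag in edges_by_tag, f"{tag!r} is not a valid tag"
    return [list(pair) for pair in edges_by_tag[tag]]


edges_by_tag = {
    "friends": [(0, 1), (1, 2)],
    "coworkers": [(2, 3)],
}
all_edges = export_edge_array(edges_by_tag)
friend_edges = export_edge_array(edges_by_tag, tag="friends")
print(all_edges)     # [[0, 1], [1, 2], [2, 3]]
print(friend_edges)  # [[0, 1], [1, 2]]
```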
- property tags: list[Hashable]#
Return the list of all edge tags.
- Returns:
list of tag keys for this Edges instance.
- update_edge_viz_kwargs(tag: Hashable | None = None, reset_kwargs: bool = False, **edge_viz_kwargs) None#
Update keyword arguments for plotting edges in an edge_viz() call.
Users can provide values in two ways.
1. By providing a string value corresponding to a column name, in which case that column's data is used for that plotting keyword argument in an edge_viz() call.
2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument is used as-is in an edge_viz() call.
Note
Provided keyword argument values will be checked first against column names in Edges.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.
The appropriate keyword argument names depend on your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
These edge keyword arguments take lower priority than any edge keyword arguments stored in the HivePlot.edge_plotting_keyword_arguments attribute.
- Parameters:
tag – tag of edge data to update keyword arguments for. If None, then the keyword arguments are updated for all tags of edge data.
reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the edge_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).
edge_viz_kwargs – keyword arguments to provide to an edge_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.
- Raises:
AssertionError – if the provided tag is not a valid key in the Edges.data attribute.
- Returns:
None.
- class hiveplotlib.HivePlot(nodes: NodeCollection, edges: Edges | ndarray, partition_variable: Hashable, sorting_variables: Hashable | Dict[Hashable, Hashable], backend: Literal['bokeh', 'datashader', 'holoviews-bokeh', 'holoviews-matplotlib', 'matplotlib', 'plotly'] = 'matplotlib', repeat_axes: bool | Hashable | List[Hashable] = False, axes_order: List[Hashable] | None = None, rotation: float = 0, angle_between_repeat_axes: float = 40, axis_kwargs: Dict[Hashable, Dict] | None = None, all_edge_kwargs: dict | None = None, clockwise_edge_kwargs: dict | None = None, counterclockwise_edge_kwargs: dict | None = None, repeat_edge_kwargs: dict | None = None, non_repeat_edge_kwargs: dict | None = None, warn_on_overlapping_kwargs: bool = True, num_steps_per_edge: int = 100, collapsed_group_axis_name: str = 'Other', use_numba: bool = True, n_parallel: int | None = None)#
Hive plot instantiation from nodes, edges, a provided partition variable, and sorting variable(s).
Axes will be created with names corresponding to the unique names in the data specified by partition_variable.
Nodes must be provided as a hiveplotlib.NodeCollection instance. Edges should be provided as a hiveplotlib.Edges instance, or as a numpy.ndarray of (from, to) pairs when no edge metadata is needed (see the edges parameter).
Note
Any provided axis_kwargs will be applied after first initializing the hive plot axes according to the partition_variable, sorting_variables, repeat_axes, axes_order, rotation, and angle_between_repeat_axes parameter values.
By default, a repeat axis <axis_name>_repeat that has the same sorting variable will match the size, labeling, and node positioning of the original <axis_name> in the resulting hive plot. To change this, users can provide <axis_name>_repeat keyword arguments to the axis_kwargs parameter on initialization or modify the repeat axis later with the hiveplotlib.HivePlot.update_axis() method.
If the repeat axis has a different sorting variable, then by default, it will infer the vmin and vmax values to place the nodes spanning the full extent of the resulting axis.
If a list of axes_order names is provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter.
- Parameters:
nodes – node data to turn into a hive plot.
edges – edge data corresponding to provided nodes to turn into a hive plot. If providing a numpy.ndarray of edge data, must be provided as (from, to) pairs. Note, providing an array input does not support the inclusion of edge metadata, whereas the Edges instance input does.
partition_variable – which node variable to use to partition the nodes into separate axes. Partitioning will be done by unique values.
sorting_variables – which node variable to use to sort / place the nodes on each axis. Providing a single value uses the same variable for each axis. Alternatively, provide a dictionary whose keys are the unique values from the partition_variable column data and whose values are the corresponding sorting variable to use for that axis. Note when providing a dictionary input, all keys created by the provided partition_variable must be specified (otherwise a MissingSortingVariableError will be raised).
backend – which visualization backend to use when plotting with the plot() method.
repeat_axes – unique values from partition_variable column data for which to create adjacent repeat axes. Repeat axes can be turned on for all unique values by setting this parameter to True. Default False sets no repeat axes.
axes_order – order in which to place axes on the hive plot. Names must correspond to the unique values in node data specified by partition_variable. If a list of axes_order names is provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.
rotation – angle (measured in degrees) to rotate every axis counterclockwise off of the default value. (By default, axes are evenly spaced in polar coordinates, with the first axis drawn at an angle of 0 degrees.)
angle_between_repeat_axes – angle (measured in degrees) to use between repeat axes.
axis_kwargs – nested dictionaries of specific kwargs to update axes. Keys should be unique values from partition_variable column data. Values should be dictionaries corresponding to the parameters in hiveplotlib.HivePlot.update_axis().
all_edge_kwargs – additional keyword arguments for plotting all edges. Default None makes no additional modifications when plotting edges.
clockwise_edge_kwargs – additional keyword arguments for plotting edges going clockwise. Default None makes no additional modifications when plotting edges.
counterclockwise_edge_kwargs – additional keyword arguments for plotting edges going counterclockwise. Default None makes no additional modifications when plotting edges.
repeat_edge_kwargs – additional keyword arguments for plotting edges between repeat axes. Default None makes no additional modifications when plotting edges.
non_repeat_edge_kwargs – additional keyword arguments for plotting edges between non-repeat axes. Default None makes no additional modifications when plotting edges.
warn_on_overlapping_kwargs – whether to warn if overlapping keyword arguments are detected among the "all_edge_kwargs", "repeat_edge_kwargs", "non_repeat_edge_kwargs", "clockwise_edge_kwargs", and "counterclockwise_edge_kwargs" parameters.
num_steps_per_edge – how many steps to use in drawing each edge curve. Higher numbers will show smoother edges but take longer to compute and use more memory.
collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes_order includes a None axis.
use_numba – whether to enable numba-accelerated Bézier curve sampling when constructing edges. Default True. When enabled and numba is available, selection is automatic: serial numba for tiny or single-curve workloads (based on an internal floor), parallel numba otherwise.
n_parallel – explicit maximum thread count to use during numba-parallel edge construction, limited by available CPU cores and the number of curves. If None, uses all available CPU cores capped by the number of curves.
- Raises:
InvalidPartitionVariableError – if invalid partition_variable provided. This value must correspond to a column of the node data.
MissingSortingVariableError – if any of the axes resulting from the choice of partition_variable does not have a set sorting variable according to the sorting_variables parameter.
InvalidSortingVariableError – if the sorting variable chosen for one or more of the axes does not correspond to a column of the node data.
RepeatInPartitionAxisNameError – if one or more proposed axes set via the partition_variable ends in "_repeat", which is reserved for repeat axes.
InvalidAxisNameError – if provided axis_kwargs points to an axis not in the resulting HivePlot instance.
InvalidAxesOrderError – if a non-None axes_order includes any names that do not correspond to the partition set via the provided partition_variable.
InvalidAxesOrderError – if user provides None as one of the axes in axes_order but there are no remaining unspecified names from the current partition to collapse onto this axis.
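To make the rotation description concrete, the sketch below computes evenly spaced axis angles with the first axis at 0 degrees before rotation. It is a simplified illustration only: axis_angles is a hypothetical helper, repeat axes and angle_between_repeat_axes are ignored, and wrapping angles into [0, 360) is an assumption of this sketch.

```python
def axis_angles(axis_names, rotation=0.0):
    """Evenly space axes around the circle, then rotate counterclockwise."""
    spacing = 360 / len(axis_names)
    return {
        name: (rotation + i * spacing) % 360
        for i, name in enumerate(axis_names)
    }


default_angles = axis_angles(["A", "B", "C"])
rotated_angles = axis_angles(["A", "B", "C"], rotation=30)
print(default_angles)  # {'A': 0.0, 'B': 120.0, 'C': 240.0}
print(rotated_angles)  # {'A': 30.0, 'B': 150.0, 'C': 270.0}
```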
- build_axes(build_axes_from_scratch: bool = False, preserve_original_edge_kwargs: bool = False) None#
Build axes and place nodes corresponding to current partition.
- Parameters:
build_axes_from_scratch – if True, then all old axes and edges will be deleted and new axes will be generated. This is useful, for example, when the partition variable is changed. Note, however, that this will drop any existing keyword arguments modifying the axes (e.g. manually changing angles, starting and ending axes positions, etc.).
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Returns:
None.
- build_hive_plot(build_axes_from_scratch: bool = False, preserve_original_edge_kwargs: bool = False) None#
Run all necessary computations to rebuild the underlying hive plot.
Note
Calling this method will discard any changes made to edges via the hiveplotlib.HivePlot.update_edges() method (except for any plotting keyword arguments if preserve_original_edge_kwargs=True).
- Parameters:
build_axes_from_scratch – if True, old axes will be deleted and new axes will be generated. This is useful, for example, when the partition variable is changed. Note, however, that this will drop any existing keyword arguments modifying the axes (e.g. manually changing angles, starting and ending axes positions, etc.).
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Returns:
None.
- connect_adjacent_axes(rebuild_edges: bool = True, warn_on_overlapping_kwargs: bool | None = None, preserve_original_edge_kwargs: bool = False) None#
Connect all adjacent axes.
Note
This method call will reset all the existing edges, redrawing all the edges from scratch.
Calling this method will discard any changes made to edges via the hiveplotlib.HivePlot.update_edges() method (except for any plotting keyword arguments if preserve_original_edge_kwargs=True).
- Parameters:
rebuild_edges – whether to only update edge kwargs or to also redraw the edges. Default True also rebuilds edges.
warn_on_overlapping_kwargs – whether to warn if overlapping keyword arguments are detected among the "all_edge_kwargs", "repeat_edge_kwargs", "clockwise_edge_kwargs", and "counterclockwise_edge_kwargs" parameters. Default None falls back to the value set by the warn_on_overlapping_kwargs attribute.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- property edge_kwarg_hierarchy: list[Literal['all_edge_kwargs', 'repeat_edge_kwargs', 'non_repeat_edge_kwargs', 'clockwise_edge_kwargs', 'counterclockwise_edge_kwargs']]#
Return the current edge_kwarg_hierarchy list, specified from least prioritized to most prioritized.
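A least-to-most-prioritized list implies that later entries win when the same keyword appears more than once. The sketch below shows that merge for a single edge that belongs to every listed group. merge_edge_kwargs and the two-entry hierarchy are hypothetical, the actual default ordering is not stated here, and the set of overlapping names is the kind of condition warn_on_overlapping_kwargs reports.

```python
def merge_edge_kwargs(kwarg_sets, hierarchy):
    """Merge kwargs in hierarchy order; later (higher-priority) entries override."""
    merged = {}
    overlaps = set()
    for name in hierarchy:  # least prioritized first
        kwargs = kwarg_sets.get(name) or {}
        # record any keyword that was already set by a lower-priority group
        overlaps |= merged.keys() & kwargs.keys()
        merged.update(kwargs)
    return merged, overlaps


# hypothetical ordering, chosen only for this example
hierarchy = ["all_edge_kwargs", "clockwise_edge_kwargs"]
kwarg_sets = {
    "all_edge_kwargs": {"color": "black", "alpha": 0.3},
    "clockwise_edge_kwargs": {"color": "C0"},
}
merged, overlaps = merge_edge_kwargs(kwarg_sets, hierarchy)
print(merged)    # {'color': 'C0', 'alpha': 0.3}
print(overlaps)  # {'color'}
```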
- plot(**kwargs)#
Plot underlying hive plot.
Note
When the backend is set to datashader, any provided node plotting keyword arguments in nodes.node_viz_kwargs will be disregarded, as attributes like color and size are reserved for datashading the nodes. Inclusion of any node_kwargs here will also raise a warning.
When the backend is set to datashader, any provided edge plotting keyword arguments in edges.edge_viz_kwargs will be disregarded, as attributes like color and size are reserved for datashading the edges. Inclusion of any edge kwargs here as part of the additional im_kwargs (discussed further in the docstring for datashade_hive_plot_mpl()) will likely trigger an error.
- Parameters:
kwargs – keyword arguments for the appropriate hive_plot_viz() call, depending on which viz backend is currently set. See the Visualization module documentation for more information on possible arguments. Other than different backends having different names for equivalent keyword arguments, these should for the most part be interchangeable, with the exception of the datashader backend (see note above).
- Returns:
viz data structures, see the appropriate hive_plot_viz() call corresponding to the current viz backend for more information here.
- rename_edge_kwargs(**rename_kwargs) None#
Rename specific edge kwarg names.
This will operate on all of the possible edge kwarg settings stored in the edge_plotting_keyword_arguments attribute and edge kwargs stored in the hive_plot_edges attribute. This allows users to quickly accommodate different visualization back ends that may require different keyword argument names.
Note
Not all edge keyword arguments are supported by all back ends. For example, some back ends may not support the zorder concept in matplotlib to reorder the plotting of edges independently of the order in which they were plotted. In this case, users can remove these keyword arguments entirely by providing {old_name: None} in the rename_kwargs parameter.
- Parameters:
rename_kwargs – dictionary that will map old keyword argument names to new keyword argument names. This will operate on all of the possible edge kwarg settings stored in the edge_plotting_keyword_arguments attribute and edge kwargs stored in the hive_plot_edges attribute. This allows users to quickly accommodate different visualization back ends that may require different keyword argument names. To remove an incompatible edge kwarg, provide {old_name: None} in the dictionary.
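The renaming rule, including removal via None, can be illustrated on a single dictionary of edge kwargs. rename_kwargs below is a hypothetical helper, and the target name "line_width" is just an example of a back end that spells the keyword differently; the real method applies the mapping across all stored edge kwargs.

```python
def rename_kwargs(edge_kwargs, **rename_map):
    """Rename keys per rename_map; a None target drops the key entirely."""
    renamed = {}
    for name, value in edge_kwargs.items():
        if name not in rename_map:
            # untouched keyword arguments pass through unchanged
            renamed[name] = value
        elif rename_map[name] is not None:
            renamed[rename_map[name]] = value
        # a None target means the kwarg is removed, so nothing is stored
    return renamed


mpl_style_kwargs = {"lw": 1.5, "zorder": 2, "alpha": 0.4}
converted = rename_kwargs(mpl_style_kwargs, lw="line_width", zorder=None)
print(converted)  # {'line_width': 1.5, 'alpha': 0.4}
```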
- set_angle_between_repeat_axes(angle: float = 40, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#
Set the angle (in degrees) between repeat axes.
- Parameters:
angle – angle (in degrees) to use between repeat axes.
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Returns:
None.
- set_axes_order(axes: List[Hashable | None] | ndarray | None = None, collapsed_group_axis_name: str | None = None, collapsed_group_sorting_variable: Hashable | None = None, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True, check_collapsed_group_sorting_variable: bool = True, require_using_all_partition_names: bool = True) None#
Set order of axes to be plotted in counterclockwise order.
Names must correspond to the unique values in node data specified by the partition_variable attribute, or users can provide None as one of the axes to collapse any unspecified groups from the partition onto a single axis.
Default None uses the order in the pandas groupby object stored in the partition attribute.
Note
If a user is trying to set a subset of the partition names under the axes parameter (without a None collapsing axis), then the user should instead call set_partition() with the desired axes_order subset of names.
- Parameters:
axes – unique names available in the column of data corresponding to the partition_variable attribute. Names must correspond to the unique values in node data specified by the current partition_variable. If a list of axes names is provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.
collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes_order includes a None axis. Default None uses the name stored under the collapsed_group_axis_name attribute.
collapsed_group_sorting_variable – sorting variable to use for the collapsed group axis. If not provided, and a value is not available in the sorting_variables attribute, then a MissingSortingVariableError will be raised.
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users running multiple setter methods can save computation by rebuilding only on the last setter call.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
check_collapsed_group_sorting_variable – whether to check if a collapsed group sorting variable exists.
require_using_all_partition_names – whether to require that the user provides all partition names in the axes parameter. If True, then the user must provide all partition names, or at least provide None to collapse any unspecified groups onto a single axis. If trying to set a subset of the partition names, then the user should instead call set_partition() with the desired axes_order.
- Returns:
None.
- Raises:
InvalidAxesOrderError – if non-None axes parameter provides names outside of the current partition.
InvalidAxesOrderError – if user provides a strict subset of partition axes values and require_using_all_partition_names=True.
InvalidAxesOrderError – if user provides None as one of the axes but there are no remaining unspecified names from the current partition to collapse onto this axis.
MissingSortingVariableError – if the sorting variable for the collapsed group axis is not provided and a value is not available for the collapsed group axis under the sorting_variables attribute. Note, this check only runs if check_collapsed_group_sorting_variable is True.
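The None-collapsing behavior can be sketched as a mapping from each resulting axis name to the partition groups it holds. resolve_axes_order is a hypothetical helper, ValueError stands in for the library's InvalidAxesOrderError, and the require_using_all_partition_names check is omitted to keep the sketch short.

```python
def resolve_axes_order(partition_names, axes_order, collapsed_name="Other"):
    """Map each axis to its partition groups, collapsing leftovers onto a None axis."""
    explicit = [axis for axis in axes_order if axis is not None]
    unknown = [axis for axis in explicit if axis not in partition_names]
    if unknown:
        raise ValueError(f"names outside of the current partition: {unknown}")
    remaining = [name for name in partition_names if name not in explicit]
    if None in axes_order and not remaining:
        raise ValueError("no unspecified partition names left to collapse")
    return {
        (collapsed_name if axis is None else axis): (remaining if axis is None else [axis])
        for axis in axes_order
    }


axes = resolve_axes_order(["A", "B", "C", "D", "E"], ["A", None, "B"])
print(axes)  # {'A': ['A'], 'Other': ['C', 'D', 'E'], 'B': ['B']}
```

With five partition values and only "A" and "B" named, the remaining three groups share the collapsed axis, which keeps the hive plot at three axes.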
- set_partition(partition_variable: Hashable, sorting_variables: Hashable | Dict[Hashable, Hashable], repeat_axes: bool | Hashable | List[Hashable] = False, axes_order: List[Hashable] | ndarray | None = None, collapsed_group_axis_name: str = 'Other', build_hive_plot: bool = True) None#
Set the node partition variable, create the necessary axes, and place nodes on the axes accordingly.
Note
This call will remove any existing axes.
- Parameters:
partition_variable – node partition variable to use.
sorting_variables – sorting variable(s) to use for axes. Can specify a single value to use for all axes or a dictionary with axis name keys and sorting variable values to assign specific sorting variables to individual axes. Repeat axes can be specified by specifying the resulting axis name, which will be "<partition_value>_repeat", where <partition_value> is the partition value to which the axis corresponds.
repeat_axes – axes names for which to create repeat axes. Providing True here turns on all possible axes specified via the partition attribute. False or [] turns off all repeat axes.
axes_order – unique names available in the column of data corresponding to the partition_variable attribute. Names must correspond to the unique values in node data specified by the current partition_variable. If a list of axes_order names is provided and one of the names in the provided list is None, then all remaining values unspecified in the provided list that are in the current partition as specified by partition_variable will be collapsed onto a single axis. This is particularly useful when the partition variable has more than 3 values. To change the name of the collapsed group in the final hive plot visualization, see the collapsed_group_axis_name parameter. Default None uses the order in the pandas groupby object stored in the resulting partition attribute.
collapsed_group_axis_name – name of the axis corresponding to the multiple partition groups collapsed onto a single axis. Only used when axes_order includes a None axis.
build_hive_plot – whether to rebuild the hive plot (i.e. recompute axes and redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
- Returns:
None.
- Raises:
InvalidPartitionVariableError – if invalid partition_variable provided.
RepeatInPartitionAxisNameError – if one or more implied axes names from the given partition would end in "_repeat". This naming convention is reserved for repeat axes.
InvalidAxesOrderError – if non-None axes_order parameter provides names outside of the current partition.
InvalidAxesOrderError – if user provides None as one of the axes but there are no remaining unspecified names from the current partition to collapse onto this axis.
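The collapsing behavior described for axes_order can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: map_values_to_axes is a hypothetical helper, and ValueError stands in for InvalidAxesOrderError. The sketch does not model what happens to unlisted values when no None entry is given.

```python
def map_values_to_axes(partition_values, axes_order, collapsed_group_axis_name="Other"):
    """Map each partition value to an axis name (hypothetical stand-in only)."""
    named = [name for name in axes_order if name is not None]

    # Documented error case: names outside of the current partition.
    outside = [name for name in named if name not in partition_values]
    if outside:
        raise ValueError(f"names outside of the current partition: {outside}")

    # Documented error case: a None axis with nothing left to collapse onto it.
    remaining = [value for value in partition_values if value not in named]
    has_collapsed_axis = any(name is None for name in axes_order)
    if has_collapsed_axis and not remaining:
        raise ValueError("no remaining partition values to collapse")

    mapping = {name: name for name in named}
    if has_collapsed_axis:
        for value in remaining:
            mapping[value] = collapsed_group_axis_name
    return mapping


mapping = map_values_to_axes(["a", "b", "c", "d", "e"], ["a", "b", None])
print(mapping)
# {'a': 'a', 'b': 'b', 'c': 'Other', 'd': 'Other', 'e': 'Other'}
```

Here five partition values end up on three axes, which is the situation the None entry is meant for.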
- set_repeat_axes(axes_names: bool | Hashable | List[Hashable], sorting_variables: Hashable | Dict[Hashable, Hashable] | None = None, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#
Set repeat axes for specified axes names.
Note
This method will overwrite existing repeat axes specifications. Thus, rerunning this method will remove any repeat axes not specified in the call. See the example code below.
If a necessary repeat axis sorting variable is not provided by the user, then this method will use the sorting variable from the corresponding non-repeat axis.
Any existing repeat axes can be removed by setting axes_names to False or [].

    from hiveplotlib.datasets import example_hive_plot

    hp = example_hive_plot()
    list(hp.axes.keys())
    >>> ['A', 'B', 'C']

    # adds 'A_repeat'
    hp.set_repeat_axes("A")
    list(hp.axes.keys())
    >>> ['A', 'B', 'C', 'A_repeat']

    # removes 'A_repeat', adds 'B_repeat'
    hp.set_repeat_axes("B")
    list(hp.axes.keys())
    >>> ['B', 'C', 'A', 'B_repeat']

- Parameters:
axes_names – axes names for which to create repeat axes. Providing True here turns on all possible axes specified via the partition attribute. False or [] turns off all repeat axes.
sorting_variables – sorting variables to choose for the axis / axes.
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Raises:
AssertionError – if one or more proposed axes are not in the current partition.
RepeatInPartitionAxisNameError – if one or more proposed axes for which to add a repeat ends in "_repeat".
- Returns:
None.
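The overwrite semantics and the "<partition_value>_repeat" naming convention can be modeled without the library. The sketch below is an assumption-laden illustration: repeat_axis_names is a hypothetical helper, ValueError stands in for RepeatInPartitionAxisNameError, and only the resulting repeat axis names are modeled (not the ordering of the axes attribute).

```python
def repeat_axis_names(partition_axes, axes_names):
    """Return the repeat axes implied by one call (hypothetical helper)."""
    if axes_names is True:
        requested = list(partition_axes)  # all possible repeat axes
    elif axes_names is False:
        requested = []  # turn off all repeat axes
    elif isinstance(axes_names, list):
        requested = list(axes_names)
    else:
        requested = [axes_names]  # a single axis name

    for name in requested:
        if str(name).endswith("_repeat"):
            raise ValueError(f"{name!r} already ends in '_repeat'")
        assert name in partition_axes, f"{name!r} is not in the current partition"

    # Each call replaces the previous specification rather than adding to it.
    return [f"{name}_repeat" for name in requested]


print(repeat_axis_names(["A", "B", "C"], "A"))   # ['A_repeat']
print(repeat_axis_names(["A", "B", "C"], "B"))   # ['B_repeat']
print(repeat_axis_names(["A", "B", "C"], True))  # ['A_repeat', 'B_repeat', 'C_repeat']
```

Because the second call returns only 'B_repeat', it mirrors how rerunning the method drops any repeat axis that is not named again.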
- set_rotation(rotation: float, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#
Rotate all axes counterclockwise relative to the default placement, then reconstruct axes and edges accordingly.
By default, axes are equally spaced in polar coordinates, with the first axis placed at an angle of 0 degrees.
Changing the rotation angle will rotate every axis counterclockwise by the provided rotation value (measured in degrees).
- Parameters:
rotation – angle (measured in degrees) to rotate every axis counterclockwise off of the default value. (By default, axes are evenly spaced in polar coordinates, with the first axis drawn at an angle of 0 degrees.)
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Returns:
None.
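As a rough picture of the default placement and the effect of rotation, the angles can be computed by hand. This sketch assumes no repeat axes are present; axis_angles is a hypothetical helper, and wrapping the result into [0, 360) is a choice made here for readability rather than documented behavior.

```python
def axis_angles(num_axes, rotation=0.0):
    """Angles (in degrees) of evenly spaced axes, rotated counterclockwise."""
    spacing = 360.0 / num_axes
    # The first axis sits at 0 degrees before any rotation is applied.
    return [(i * spacing + rotation) % 360.0 for i in range(num_axes)]


print(axis_angles(3))               # [0.0, 120.0, 240.0]
print(axis_angles(3, rotation=30))  # [30.0, 150.0, 270.0]
```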
- set_viz_backend(backend: Literal['bokeh', 'datashader', 'holoviews-bokeh', 'holoviews-matplotlib', 'matplotlib', 'plotly']) None#
Set viz backend for plotting.
- Parameters:
backend – which viz backend to use for plotting.
- Raises:
AssertionError – if user tries to set an unsupported viz backend.
- Returns:
None.
- to_json() str#
Return the plotting information from the axes, nodes, and edges in Cartesian space as a serialized JSON string.
This allows users to visualize hive plots with arbitrary libraries, even outside of python.
The dictionary structure of the resulting JSON will consist of three top-level keys:
“axes” - contains the information for plotting each axis (including angle and long_name), plus the nodes on each axis in Cartesian space.
“edges” - contains the information for plotting the discretized edges in Cartesian space, plus the corresponding to and from IDs that go with each edge, as well as any kwargs that were set for plotting each set of edges.
“node_viz_kwargs” - contains the resolved node visualization keyword arguments for each axis. Column references in hiveplotlib.NodeCollection.node_viz_kwargs are resolved to per-node arrays.
Note
The resulting JSON will not contain the additional data for the nodes or edges stored under the nodes and edges attributes, respectively. It will only contain the Cartesian coordinates of the nodes and the discretized curves of the edges.
- Returns:
JSON output of axis, node, and edge information.
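A consumer outside of hiveplotlib would typically start by parsing the string and checking for the three top-level keys. The sketch below uses only the standard library. The placeholder string is an assumption that stands in for real to_json() output, and its empty values say nothing about the actual nested structure.

```python
import json


def top_level_keys(hive_plot_json):
    """Parse a serialized hive plot and list its top-level keys."""
    payload = json.loads(hive_plot_json)
    return sorted(payload)


# Placeholder standing in for the string returned by to_json().
placeholder_json = json.dumps({"axes": {}, "edges": {}, "node_viz_kwargs": {}})
print(top_level_keys(placeholder_json))
# ['axes', 'edges', 'node_viz_kwargs']
```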
- update_axis(axis_id: Hashable, sorting_variable: Hashable | None = None, vmin: float | None | Literal['unchanged'] = 'unchanged', vmax: float | None | Literal['unchanged'] = 'unchanged', start: float | None = None, end: float | None = None, angle: float | None = None, long_name: Hashable | None = None, preserve_original_edge_kwargs: bool = True, build_hive_plot: bool = True) None#
Update existing axis parameters.
Allows updating axis size, axis placement in cartesian space, the long name for axis labeling during plotting, node sorting, and positioning nodes on the axis.
When running on a given axis, any unspecified parameters will remain unchanged from the axis’ original values.
Note
If a sorting_variable parameter is provided, and the axis was previously inferring the vmin / vmax, then the default behavior of the vmin / vmax parameter, if not provided, will be to re-determine the global minimum / maximum for the new feature values (i.e. as if the parameter were set to None).
- Parameters:
axis_id – unique name for Axis instance.
sorting_variable – node sorting variable to use. Default None maintains existing sorting variable. If the vmin and / or vmax value was previously inferred, then it will be re-inferred according to the global min and / or max values of this new sorting variable.
vmin – all values less than vmin will be set to vmin. None infers and sets as global minimum of feature values for all Node instances on specified Axis. If the vmin value was explicitly set beforehand by the user or the sorting_variable was left unchanged, then the default value "unchanged" will use the same vmin value as before. However, if the sorting_variable parameter was changed and the vmin value was previously inferred, then by default, the global minimum will be re-determined for the revised sorting_variable values, as done when set to None.
vmax – all values greater than vmax will be set to vmax. None infers and sets as global maximum of feature values for all Node instances on specified Axis. If the vmax value was explicitly set beforehand by the user or the sorting_variable was left unchanged, then the default value "unchanged" will use the same vmax value as before. However, if the sorting_variable parameter was changed and the vmax value was previously inferred, then by default, the global maximum will be re-determined for the revised sorting_variable values, as done when set to None.
start – point closest to the center of the plot (using the same positive number for multiple axes in a hive plot is a nice way to space out the figure). Default None maintains existing start position.
end – point farthest from the center of the plot. Default None maintains existing ending position.
angle – angle to set the axis, in degrees (moving counterclockwise, e.g. 0 degrees points East, 90 degrees points North). Default None maintains existing angle.
long_name – longer name for use when labeling on graph (but not for referencing the axis). Default None maintains existing long name.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
- Raises:
AssertionError – if the provided axis_id is not an existing axis under the axes attribute.
- Returns:
None.
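The interaction of vmin, vmax, start, and end can be illustrated with a small stand-alone function. This is a minimal sketch that assumes nodes are placed by linear interpolation between start and end after clipping. node_rhos is a hypothetical helper, and its start / end defaults and its handling of a zero-width range are choices made for this illustration.

```python
def node_rhos(values, vmin=None, vmax=None, start=1.0, end=5.0):
    """Distance from the origin for each node on one axis (illustrative)."""
    # None infers the global minimum / maximum of the feature values.
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax

    # Values below vmin are set to vmin; values above vmax are set to vmax.
    clipped = [min(max(value, vmin), vmax) for value in values]

    span = vmax - vmin
    if span == 0:
        # Arbitrary choice for this sketch: put every node at the start.
        return [start for _ in clipped]
    return [start + (value - vmin) / span * (end - start) for value in clipped]


print(node_rhos([0, 5, 10]))                  # [1.0, 3.0, 5.0]
print(node_rhos([0, 5, 10], vmin=2, vmax=8))  # [1.0, 3.0, 5.0]
print(node_rhos([0, 5, 10], vmin=0, vmax=20)) # [1.0, 2.0, 3.0]
```

The second call shows clipping: 0 and 10 land on the axis ends because they fall outside the [2, 8] range.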
- update_edge_plotting_keyword_arguments(edge_kwarg_setting: Literal['all_edge_kwargs', 'repeat_edge_kwargs', 'non_repeat_edge_kwargs', 'clockwise_edge_kwargs', 'counterclockwise_edge_kwargs'] = 'all_edge_kwargs', reset_edge_kwarg_setting: bool = False, rebuild_edges: bool = False, **kwargs) dict#
Update the edge keyword arguments for a specific edge_kwarg_setting.
- Parameters:
edge_kwarg_setting – which edge kwarg setting to modify.
reset_edge_kwarg_setting – whether to overwrite existing keyword arguments for the chosen edge_kwarg_setting.
rebuild_edges – whether to only update edge kwargs or to also redraw the edges. Default False only updates edge kwargs.
kwargs – additional keyword arguments to provide to the specified edge kwarg setting.
- Returns:
dictionary of the resulting keyword arguments for that edge kwarg setting.
- update_edges(partition_id_1: Hashable, partition_id_2: Hashable, tag: Hashable | None = None, p1_to_p2: bool = True, p2_to_p1: bool = True, short_arc: bool | None = None, control_rho_scale: float | None = None, control_angle_shift: float | None = None, reset_existing_kwargs: bool = False, overwrite_existing_kwargs: bool = True, **edge_kwargs) None#
Modify all existing edges between a pair of partition groups.
This method allows changing edge construction parameters and / or edge visualization keyword arguments.
Note
This method also allows for modification of edges in just one direction between the two provided partition groups by specifying p1_to_p2 or p2_to_p1 as False (both are True by default).
Any updates done via this method will be lost if one calls the hiveplotlib.HivePlot.build_hive_plot() method.
- Parameters:
partition_id_1 – Hashable pointer to the first group in the current partition between which we want to modify connections.
partition_id_2 – Hashable pointer to the second group in the current partition between which we want to modify connections.
tag – unique ID specifying which tag of edges to modify. Note, if no tag is specified (i.e. tag=None), it is presumed there is only one tag for the specified set of partition IDs to look over, which can be inferred. If no tag is specified and there are multiple tags to choose from, an UnspecifiedTagError will be raised.
p1_to_p2 – whether to modify connections going FROM partition_id_1 TO partition_id_2.
p2_to_p1 – whether to modify connections going FROM partition_id_2 TO partition_id_1.
short_arc – whether to take the shorter angle arc (True) or longer angle arc (False). When not set, uses a default value True. There are always two ways to traverse between axes: with one angle being x, the other option being 360 - x. For most visualizations, the user should expect to traverse the “short arc,” hence the default True. For full user flexibility, however, we offer the ability to force the arc the other direction, the “long arc” (short_arc=False). Note: in the case of 2 axes 180 degrees apart, there is no “wrong” angle, so in this case an initial decision will be made, but switching this boolean will switch the arc to the other hemisphere.
control_rho_scale – how much to multiply the distance of the control point for each edge to / from the origin. When not set, uses a default value 1, which sets the control rho for each edge as the mean rho value for each pair of nodes being connected by that edge. A value greater than 1 will pull the resulting edges further away from the origin, making edges more convex, while a value between 0 and 1 will pull the resulting edges closer to the origin, making edges more concave. Note, this affects edges further from the origin by larger magnitudes than edges closer to the origin.
control_angle_shift – how far to rotate the control point for each edge around the origin. When not set, uses a default value 0, which sets the control angle for each edge as the mean polar angle for each pair of nodes being connected by that edge. A positive value will pull the resulting edges further counterclockwise, while a negative value will pull the resulting edges further clockwise.
reset_existing_kwargs – whether to delete existing edge kwargs stored in the hive_plot_edges attribute for the specified edges. Default False keeps the existing edge kwargs, which any provided kwargs then overwrite according to overwrite_existing_kwargs.
overwrite_existing_kwargs – whether to overwrite existing edge kwargs stored in the hive_plot_edges attribute for the specified edges when also provided in edge_kwargs, default True.
edge_kwargs – additional params that will be applied to the related edges.
- Returns:
None.
- Raises:
InvalidPartitionVariableError – if invalid partition variables provided with respect to the current partition.
UnspecifiedTagError – if no tag is specified and there are multiple tags available.
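The geometry behind short_arc, control_rho_scale, and control_angle_shift can be worked through by hand. The sketch below follows the parameter descriptions above but is not the library's code: arc_span and control_point are hypothetical helpers, angles are in degrees, and the simple mean angle assumes the two node angles do not straddle the 0 / 360 boundary.

```python
def arc_span(angle_1, angle_2, short_arc=True):
    """Angle swept between two axes: x for the short arc, 360 - x otherwise."""
    difference = abs(angle_1 - angle_2) % 360
    short = min(difference, 360 - difference)
    return short if short_arc else 360 - short


def control_point(rho_1, angle_1, rho_2, angle_2,
                  control_rho_scale=1.0, control_angle_shift=0.0):
    """Polar (rho, angle) of an edge's control point for one pair of nodes."""
    mean_rho = (rho_1 + rho_2) / 2
    mean_angle = (angle_1 + angle_2) / 2
    # Scaling moves the control point away from (or toward) the origin;
    # shifting rotates it counterclockwise (positive) or clockwise (negative).
    return mean_rho * control_rho_scale, mean_angle + control_angle_shift


print(arc_span(0, 120))                   # 120
print(arc_span(0, 120, short_arc=False))  # 240
print(control_point(2, 0, 4, 120))        # (3.0, 60.0)
print(control_point(2, 0, 4, 120, control_rho_scale=2, control_angle_shift=10))
# (6.0, 70.0)
```

Because the scale multiplies the mean rho, the same control_rho_scale moves the control point of an outer edge further than that of an inner edge, which matches the note about larger magnitudes further from the origin.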
- update_node_viz_kwargs(reset_kwargs: bool = False, **node_viz_kwargs) None#
Update keyword arguments for plotting nodes in a node_viz() call.
Users can provide values in two ways.
1. By providing a string value corresponding to a column name, in which case that column data would be used for that plotting keyword argument in a node_viz() call.
2. By providing explicit keyword arguments (e.g. cmap="viridis"), in which case that keyword argument would be used as-is in a node_viz() call.
Note
Provided keyword argument values will be checked first against column names in nodes.data (i.e. (1) above) before falling back to (2) and setting the keyword argument explicitly.
The appropriate keyword argument names should be chosen as a function of your choice of visualization back end (e.g. matplotlib, bokeh, datashader, etc.).
This is a wrapper method for calling hiveplotlib.NodeCollection.update_node_viz_kwargs() on the underlying nodes attribute.
- Parameters:
reset_kwargs – whether to drop the existing keyword arguments before adding the provided keyword arguments to the node_viz_kwargs attribute. Existing values are preserved by default (i.e. reset_kwargs=False).
node_viz_kwargs – keyword arguments to provide to a node_viz() call. Users can provide names according to column names in the data attribute or explicit values, as discussed in (1) and (2) above.
- Returns:
None.
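The two-step lookup from the note above, plus the reset_kwargs behavior, can be sketched as follows. This is only a model of the documented rules: a plain dictionary of lists stands in for the node dataframe, and update_kwargs and resolve_kwargs are hypothetical helpers rather than hiveplotlib functions.

```python
def update_kwargs(existing, reset_kwargs=False, **node_viz_kwargs):
    """Merge new keyword arguments into the stored ones (illustrative)."""
    updated = {} if reset_kwargs else dict(existing)
    updated.update(node_viz_kwargs)
    return updated


def resolve_kwargs(columns, node_viz_kwargs):
    """Swap in column data where a string value names a column."""
    resolved = {}
    for name, value in node_viz_kwargs.items():
        if isinstance(value, str) and value in columns:
            resolved[name] = columns[value]  # (1) column name: use its data
        else:
            resolved[name] = value  # (2) explicit value: use as-is
    return resolved


columns = {"degree": [1, 2, 3]}  # stand-in for the node data
stored = update_kwargs({}, s="degree", cmap="viridis")
print(resolve_kwargs(columns, stored))
# {'s': [1, 2, 3], 'cmap': 'viridis'}
print(update_kwargs(stored, reset_kwargs=True, alpha=0.5))
# {'alpha': 0.5}
```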
- update_partition_data() None#
Update the partition data based on the current node data.
This method is useful when the node data has changed, which means the partition needs to be recalculated and the resulting new data propagated to the axes.
This method will reset the partition and update the axes accordingly.
- Returns:
None.
- update_sorting_variables(sorting_variables: Hashable | Dict[Hashable, Hashable], reset_vmin_and_vmax: bool = True, build_hive_plot: bool = True, preserve_original_edge_kwargs: bool = True) None#
Update sorting variables for specified axes with the current partition.
- Parameters:
sorting_variables – sorting variable(s) to use for axes. Can specify a single value to use for all axes or a dictionary with axis name keys and sorting variable values to assign specific sorting variables to individual axes. Unless overwriting all current sorting variables, previously set sorting variables will be preserved.
reset_vmin_and_vmax – if True, then setting a sorting variable for an axis / axes will throw out any existing vmin / vmax information, reinitializing to infer and span the full extent of data (i.e. vmin=None and vmax=None).
build_hive_plot – whether to rebuild the hive plot (i.e. redraw edges). This computation is usually desired, but users can save extra computation if running multiple setter methods by saving rebuilding for the last setter call.
preserve_original_edge_kwargs – whether to preserve the original edge keyword arguments stored under the hive_plot_edges attribute.
- Raises:
MissingSortingVariableError – if not all of the current partition axes have been specified with a sorting variable (either from the current call or from earlier, either with another call to this method or by setting sorting_variables on initialization of the HivePlot instance).
InvalidSortingVariableError – if the sorting variables chosen for one or more of the axes do not correspond to a column of the node data.
- Returns:
None.
Note
If specifying a dictionary of sorting_variables information, any axes keys excluded from the provided dictionary will be unaffected, each keeping its existing sorting variable.
Repeat axes can be specified by specifying the repeat axis name, which will be "<partition_value>_repeat", where <partition_value> is the partition value to which the axis corresponds.
Providing an invalid sorting variable value will raise an InvalidSortingVariableError.
A Hashable input will set the sorting variable of all possible axes with the current partition attribute, including all possible repeat axes (whether plotted or not), to use the provided sorting variable. Any sorting variables set for a previous partition axis will be preserved.
If reset_vmin_and_vmax=True, then setting a sorting variable for an axis will throw out any existing vmin / vmax information for the provided axis / axes, reinitializing to infer and span the full extent of data (i.e. vmin=None and vmax=None).
Providing a nonexistent axis key will not raise any error. Instead, the sorting variable for the nonexistent axis will be stored in the sorting_variables attribute dictionary, leaving current axes unaffected. This allows users to set sorting variables for multiple partitions at once without setting the sorting variables every time the partition variable is changed.
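The merge rules in the note above can be modeled with an ordinary dictionary. This sketch is an assumption about bookkeeping only: merged_sorting_variables is a hypothetical helper, and it ignores validation of sorting variable names and the vmin / vmax reset.

```python
def merged_sorting_variables(stored, new, current_axes):
    """Combine stored sorting variables with a new specification."""
    merged = dict(stored)  # earlier entries are preserved
    if isinstance(new, dict):
        # Only the listed keys change; unknown keys are simply stored.
        merged.update(new)
    else:
        # A single value applies to every axis and every possible repeat axis.
        for axis in current_axes:
            merged[axis] = new
            merged[f"{axis}_repeat"] = new
    return merged


stored = {"A": "x", "B": "x", "C": "x"}
print(merged_sorting_variables(stored, {"A": "y", "Z": "w"}, ["A", "B", "C"]))
# {'A': 'y', 'B': 'x', 'C': 'x', 'Z': 'w'}
```

The "Z" entry shows the nonexistent-key case: it is kept for a later partition and leaves the current axes untouched.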
- hiveplotlib.hiveplot.supported_viz_backends()#
Return the supported visualization back ends for hiveplotlib hive plots.