Toy Hive Plots

Tools for generating toy hive plots.

hiveplotlib.datasets.toy_hive_plots.example_base_hive_plot(num_nodes: int = 15, num_edges: int = 30, seed: int = 0, **hive_plot_n_axes_kwargs) → BaseHivePlot

Generate example hive plot with "Low", "Medium", and "High" axes (plus repeat axes).

Nodes and edges will be generated and placed randomly.

Parameters:

num_nodes – number of nodes to generate.
num_edges – number of edges to generate.
seed – random seed to use when generating nodes and edges.
hive_plot_n_axes_kwargs – additional keyword arguments for the underlying hiveplotlib.hive_plot_n_axes() call.

Returns:

resulting BaseHivePlot instance.

hiveplotlib.datasets.toy_hive_plots.example_edge_data(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) → DataFrame

Generate example edge data from a provided NodeCollection.

Parameters:

nodes – nodes from which to generate example edges.
num_edges – how many example edges to randomly generate.
from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.
to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.
seed – random seed to use when randomly generating edge data.

Returns:

random edge data as a (num_edges, 2) DataFrame of [from, to] edges.

hiveplotlib.datasets.toy_hive_plots.example_edges(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) → Edges

Generate example edges from a provided NodeCollection.

Parameters:

nodes – nodes from which to generate example edges.
num_edges – how many example edges to randomly generate.
from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.
to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.
seed – random seed to use when randomly generating edge data.

Returns:

random edge data.

hiveplotlib.datasets.toy_hive_plots.example_hive_plot(num_nodes: int = 100, num_edges: int = 100, partition_data_column: Literal['low', 'med', 'high'] = 'low', labels: List[Hashable] | None = ('A', 'B', 'C'), cutoffs: List[float] | int | None = 3, partition_variable_name: Hashable | None = None, sorting_variables: Hashable | Dict[Hashable, Hashable] = 'low', seed: int = 0, node_unique_id_column: str = 'unique_id', **hive_plot_kwargs) → HivePlot

Generate example HivePlot instance.

Each node will have randomly generated "low", "med", and "high" values, ordered so that "low" < "med" < "high" for every node.

Each edge will also have "low", "med", and "high" values, each being the average of the corresponding "low" / "med" / "high" values of the two nodes composing the edge.

Note

The num_edges edges are generated randomly between all possible axes, including repeat axes. Thus, calling this function without requesting all repeat axes (i.e. without repeat_axes=True) will result in fewer than num_edges edges visualized in the final hive plot. (All generated edges are stored in the resulting hive_plot.edges, even though some will not be plotted when repeat axes are excluded from the plot.)

Parameters:

num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
num_edges – how many example edges to randomly generate.
partition_data_column – which column of data in the underlying data attribute to use to partition the node data. Node data is generated via hiveplotlib.datasets.toy_hive_plots.example_node_data().
labels – labels assigned to each bin. Only referenced when cutoffs is not None. None labels each bin as a string based on its range of values. Note, when cutoffs is a list, len(labels) must be 1 greater than len(cutoffs). When cutoffs is an int, len(labels) must be equal to cutoffs.
cutoffs – cutoffs to use in binning nodes according to data under partition_data_column. None will bin nodes by unique values of partition_data_column. When provided as a list, the specified cutoffs will bin according to (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf). When provided as an int, the exact numerical break points will be determined to create that many equally sized quantiles.
partition_variable_name – name of the resulting partition variable to add to the nodes.data attribute of the resulting HivePlot instance. Default None will name the partition column as "partition_0".
sorting_variables – which node variable to use to sort / place the nodes on each axis. Providing a single value uses the same variable for each axis. Alternatively, provide a dictionary whose keys are the unique values from the partition_variable_name column in the nodes.data attribute and whose values are the corresponding sorting variable to use for each axis.
seed – random seed to use when randomly generating node and edge data.
node_unique_id_column – name to assign to the column in the nodes.data attribute that corresponds to the unique IDs.
hive_plot_kwargs – additional keyword arguments when creating the returned hiveplotlib.HivePlot() instance.

Returns:

randomly-generated HivePlot instance.
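The binning rules described for cutoffs and labels above can be illustrated with plain pandas. This is only a sketch of the documented semantics, assuming they match pandas.cut (list of cutoffs) and pandas.qcut (integer number of quantiles); it is not necessarily how hiveplotlib implements the binning internally, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 3, 4, 6, 7, 10])  # made-up data for illustration

# List cutoffs: bins are (-inf, 3], (3, 6], and everything above 6,
# so len(labels) must be len(cutoffs) + 1.
cutoffs = [3, 6]
by_list = pd.cut(values, bins=[-np.inf, *cutoffs, np.inf], labels=["A", "B", "C"])
print(by_list.tolist())  # ['A', 'A', 'B', 'B', 'C', 'C']

# Integer cutoffs: break points are chosen to give that many equally sized
# quantiles, so len(labels) must equal the integer.
by_quantile = pd.qcut(pd.Series(range(9)), q=3, labels=["A", "B", "C"])
print(by_quantile.value_counts().to_dict())
```

Note that a value equal to a cutoff (3 or 6 here) falls into the lower bin, matching the right-closed intervals in the parameter description.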

hiveplotlib.datasets.toy_hive_plots.example_node_collection(num_nodes: int = 100, seed: int = 0, unique_id_column: str = 'unique_id') → NodeCollection

Generate example NodeCollection.

Each node will have randomly generated "low", "med", and "high" values, ordered so that "low" < "med" < "high" for every node.

By default, the unique ID column will be given the name "unique_id"; use unique_id_column to change it.

Parameters:

num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
seed – random seed to use when randomly generating node data.
unique_id_column – name to assign to the column in the resulting NodeCollection.data attribute that corresponds to the unique IDs.

Returns:

NodeCollection of node data.

hiveplotlib.datasets.toy_hive_plots.example_node_data(num_nodes: int = 100, seed: int = 0) → DataFrame

Generate example node dataframe.

Each node will have randomly generated "low", "med", and "high" values, ordered so that "low" < "med" < "high" for every node.

Unique ID column will be given the name "unique_id".

Parameters:

num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
seed – random seed to use when randomly generating node data.

Returns:

dataframe of node data.

hiveplotlib.datasets.toy_hive_plots.example_nodes_and_edges(num_nodes: int = 100, num_edges: int = 200, num_axes: int = 3, seed: int = 0) → Tuple[List[Node], List[List[Hashable]], ndarray]

Generate example nodes, node splits (one list of nodes per intended axis), and edges.

Each node will have randomly generated "low", "med", and "high" values, ordered so that "low" < "med" < "high" for every node.

Parameters:

num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
num_edges – how many edges to randomly generate.
num_axes – how many axes into which to partition the randomly generated nodes.
seed – random seed to use when randomly generating node and edge data.

Returns:

list of generated Node instances, a list of num_axes lists that evenly split the node IDs to be allocated to their own axes, and a (num_edges, 2) shaped array of random edges between nodes.