Toy Hive Plots

Tools for generating toy hive plots.

hiveplotlib.datasets.toy_hive_plots.example_base_hive_plot(num_nodes: int = 15, num_edges: int = 30, seed: int = 0, **hive_plot_n_axes_kwargs) → BaseHivePlot

Generate example hive plot with "Low", "Medium", and "High" axes (plus repeat axes).

Nodes and edges will be generated and placed randomly.

Parameters:
  • num_nodes – number of nodes to generate.

  • num_edges – number of edges to generate.

  • seed – random seed to use when generating nodes and edges.

  • hive_plot_n_axes_kwargs – additional keyword arguments for the underlying hiveplotlib.hive_plot_n_axes() call.

Returns:

resulting BaseHivePlot instance.

hiveplotlib.datasets.toy_hive_plots.example_edge_data(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) → DataFrame

Generate example edge data from a provided NodeCollection.

Parameters:
  • nodes – nodes from which to generate example edges.

  • num_edges – how many example edges to randomly generate.

  • from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.

  • to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.

  • seed – random seed to use when randomly generating edge data.

Returns:

random edge data as (n, 2) DataFrame of [from, to] edges.
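As a rough illustration of the output shape, the following standard-library sketch draws random (from, to) pairs from a list of node IDs. It is not hiveplotlib's implementation: the helper name sketch_edge_rows is hypothetical, and whether the real function permits self-loops or duplicate edges is not stated here, so the sketch simply samples each endpoint independently.

```python
import random
from typing import Dict, Hashable, List


def sketch_edge_rows(
    node_ids: List[Hashable],
    num_edges: int = 100,
    from_column_name: Hashable = "from",
    to_column_name: Hashable = "to",
    seed: int = 0,
) -> List[Dict[Hashable, Hashable]]:
    """Hypothetical helper: one dict per edge, keyed by the two column names."""
    rng = random.Random(seed)  # seeded, so repeated calls give identical rows
    return [
        {
            from_column_name: rng.choice(node_ids),
            to_column_name: rng.choice(node_ids),
        }
        for _ in range(num_edges)
    ]


rows = sketch_edge_rows(node_ids=list(range(15)), num_edges=5, seed=0)
for row in rows:
    print(row)
```

Each row corresponds to one line of an (n, 2) table; reusing the same seed reproduces the same rows.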

hiveplotlib.datasets.toy_hive_plots.example_edges(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) → Edges

Generate example edges from a provided NodeCollection.

Parameters:
  • nodes – nodes from which to generate example edges.

  • num_edges – how many example edges to randomly generate.

  • from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.

  • to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.

  • seed – random seed to use when randomly generating edge data.

Returns:

random edge data.

hiveplotlib.datasets.toy_hive_plots.example_full_kwargs_hive_plot() → HivePlot

Generate a HivePlot instance with data-dependent edge kwargs, colormaps, and node styling.

Uses example_hive_plot() with 9 nodes, 10 edges (seed 99), and repeat_axes=True. Scaled edge data columns are created for alpha and linewidth, and all edge visualization kwargs are set via update_edge_plotting_keyword_arguments() using column name references:

  • color: the "low" edge data column mapped through a "cividis" colormap.

  • alpha: the "alpha_scaled" column (low / 10, yielding values in ~0.3-1.0).

  • linewidth: the "lw_scaled" column (low / 3, yielding values in ~1.0-3.3).

Node styling is applied via update_node_viz_kwargs() with a "viridis" colormap mapped to the "low" data column.

Returns:

HivePlot instance with node and edge kwargs.
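The scaled columns described above are plain arithmetic on the "low" edge values. A small sketch, using made-up "low" values between 3 and 10 (chosen only because that range reproduces the approximate bounds quoted above; they are not the values generated with seed 99):

```python
# Hypothetical "low" edge values; the real ones come from the seeded generator.
low_values = [3.0, 4.5, 6.0, 7.5, 10.0]

# Divide by 10 so the result is usable as an alpha (opacity) in [0, 1].
alpha_scaled = [low / 10 for low in low_values]

# Divide by 3 to get line widths of roughly 1.0 to 3.3.
lw_scaled = [low / 3 for low in low_values]

for low, alpha, lw in zip(low_values, alpha_scaled, lw_scaled):
    print(f"low={low:5.2f}  alpha_scaled={alpha:.2f}  lw_scaled={lw:.2f}")
```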

hiveplotlib.datasets.toy_hive_plots.example_hive_plot(num_nodes: int = 100, num_edges: int = 100, partition_data_column: Literal['low', 'med', 'high'] = 'low', labels: List[Hashable] | None = ['A', 'B', 'C'], cutoffs: List[int | float] | int = 3, partition_variable_name: Hashable | None = None, sorting_variables: Hashable | Dict[Hashable, Hashable] = 'low', seed: int = 0, node_unique_id_column: str = 'unique_id', **hive_plot_kwargs) → HivePlot

Generate example HivePlot instance.

Each node will have randomly generated "low", "med", and "high" values that, as the names suggest, satisfy "low" < "med" < "high".

Each edge will also have a "low", "med", and "high" value, each being the average of the corresponding "low" / "med" / "high" values of the edge's two nodes.
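A standard-library sketch of the relationship described above. The three value ranges (0 to 10, 10 to 20, 20 to 30) are an assumption picked so that "low" < "med" < "high" holds for every node; the actual sampling scheme is defined by example_node_data() and may differ. The names nodes, edges, and edge_values are illustrative only.

```python
import random

rng = random.Random(0)
num_nodes, num_edges = 6, 4

# Assumed ranges: disjoint intervals keep low < med < high for every node.
nodes = {
    node_id: {
        "low": rng.uniform(0, 10),
        "med": rng.uniform(10, 20),
        "high": rng.uniform(20, 30),
    }
    for node_id in range(num_nodes)
}

# Random (from, to) pairs between node IDs.
edges = [
    (rng.randrange(num_nodes), rng.randrange(num_nodes)) for _ in range(num_edges)
]

# Each edge value is the mean of the corresponding values of its two nodes.
edge_values = [
    {key: (nodes[start][key] + nodes[stop][key]) / 2 for key in ("low", "med", "high")}
    for start, stop in edges
]

for (start, stop), values in zip(edges, edge_values):
    print(start, stop, {key: round(value, 2) for key, value in values.items()})
```

Because each edge value is an average of two node values from the same range, the edge values inherit the same "low" < "med" < "high" ordering.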

Note

The num_edges edges are generated randomly between all possible axes, including repeat axes. Calling this function without requesting all repeat axes (i.e. repeat_axes=True) will therefore result in fewer than num_edges edges being visualized in the final hive plot. (All generated edges are stored in the resulting hive_plot.edges, even though some will not be plotted when repeat axes are excluded from the plot.)

Parameters:
  • num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.

  • num_edges – how many example edges to randomly generate.

  • partition_data_column – which column of data in the underlying data attribute to use to partition the node data. Node data generated via hiveplotlib.datasets.toy_hive_plots.example_node_data().

  • labels – labels assigned to each bin. Only referenced when cutoffs is not None. None labels each bin as a string based on its range of values. Note, when cutoffs is a list, len(labels) must be 1 greater than len(cutoffs). When cutoffs is an int, len(labels) must be equal to cutoffs.

  • cutoffs – cutoffs to use in binning nodes according to data under partition_data_column. Passing None will bin nodes by unique values of partition_data_column. When provided as a list, the specified cutoffs will bin according to (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf). When provided as an int, the exact numerical break points will be determined to create cutoffs equally sized quantiles.

  • partition_variable_name – name of the resulting partition variable to add to the nodes.data attribute of the resulting HivePlot instance. Default None will name the partition column as "partition_0".

  • sorting_variables – which node variable to use to sort / place the nodes on each axis. Providing a single value uses the same variable for each axis. Alternatively, provide a dictionary whose keys are the unique values from the partition_variable_name column in the nodes.data attribute and whose values are the corresponding sorting variables to use for each axis.

  • seed – random seed to use when randomly generating node and edge data.

  • node_unique_id_column – name to assign to the column in the nodes.data attribute that corresponds to the unique IDs.

  • hive_plot_kwargs – additional keyword arguments when creating the returned hiveplotlib.HivePlot() instance.

Returns:

randomly-generated HivePlot instance.
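The list form of cutoffs can be illustrated with a short standard-library sketch. It mirrors only the interval convention stated above, (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf), together with the rule that len(labels) is 1 greater than len(cutoffs). The helper assign_bin is hypothetical and is not how hiveplotlib performs the partition; it also assumes the cutoffs are sorted in ascending order and does not cover the integer (quantile) form.

```python
from bisect import bisect_left
from typing import Hashable, List, Union

Number = Union[int, float]


def assign_bin(value: Number, cutoffs: List[Number], labels: List[Hashable]) -> Hashable:
    """Hypothetical helper: label of the right-closed bin containing value."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("len(labels) must be 1 greater than len(cutoffs)")
    # bisect_left counts the cutoffs strictly below value, so a value equal to
    # a cutoff stays in the lower bin, matching the (a, b] convention.
    return labels[bisect_left(cutoffs, value)]


cutoffs = [3, 6]
labels = ["A", "B", "C"]
for value in (1.0, 3, 3.1, 6, 9.5):
    print(value, assign_bin(value, cutoffs, labels))
```

With these cutoffs, 3 falls in bin "A" and 6 falls in bin "B", because each bin includes its upper bound.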

hiveplotlib.datasets.toy_hive_plots.example_hpm_nodes_and_edges(num_groups: int = 3, nodes_per_group: int = 10, edge_tag_counts: Dict[str, int] | None = None, edge_structure: Dict[str, str] | None = None, seed: int = 42) → Tuple[NodeCollection, Edges]

Generate a shared toy dataset that works well with HivePlotMatrix examples.

Returns a NodeCollection of num_groups * nodes_per_group nodes across num_groups groups (A, B, C, …) with structured numeric columns, and a multi-tag Edges object.

Node columns:

  • "unique_id": integers 0 to (num_groups * nodes_per_group - 1).

  • "group": group label, assigned in consecutive blocks. With num_groups=3, nodes 0-9 are "A", 10-19 are "B", 20-29 are "C".

  • "value1": correlated with group - each group draws from a distinct sub-interval of [0, 10] in ascending order. Sorting by value1 places groups at distinct, separated positions along each axis, provided the axes' vmin and vmax values are set to the same fixed range.

  • "value2": uncorrelated noise - all nodes draw from Uniform(0, 10). Sorting by value2 produces little visible group separation.

  • "value3": inversely correlated with group - the mirror image of value1. The first group draws from the highest sub-interval, the last from the lowest.
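The three numeric columns can be pictured with the following standard-library sketch. It assumes the sub-intervals are equal-width slices of [0, 10], one per group; that matches the description above but is not confirmed here, and the real generator's exact intervals and sampling may differ. All names are illustrative.

```python
import random
import string

num_groups, nodes_per_group = 3, 10
rng = random.Random(42)
width = 10 / num_groups  # assumed: equal-width sub-intervals of [0, 10]

rows = []
for group_index in range(num_groups):
    group = string.ascii_uppercase[group_index]  # "A", "B", "C", ...
    mirrored_index = num_groups - 1 - group_index
    for _ in range(nodes_per_group):
        rows.append(
            {
                "unique_id": len(rows),  # consecutive blocks: 0-9, 10-19, ...
                "group": group,
                # value1: sub-interval rises with the group index
                "value1": rng.uniform(group_index * width, (group_index + 1) * width),
                # value2: same distribution for every group
                "value2": rng.uniform(0, 10),
                # value3: mirror image, sub-interval falls with the group index
                "value3": rng.uniform(mirrored_index * width, (mirrored_index + 1) * width),
            }
        )

for group in string.ascii_uppercase[:num_groups]:
    members = [row for row in rows if row["group"] == group]
    value1_range = (min(r["value1"] for r in members), max(r["value1"] for r in members))
    print(group, [round(bound, 2) for bound in value1_range])
```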

Default edge tags (decreasing density):

  • "official": 40 edges.

  • "social": 30 edges.

  • "informal": 20 edges.

Parameters:
  • num_groups – number of groups to generate (A, B, C, …). Default 3.

  • nodes_per_group – number of nodes in each group. Default 10.

  • edge_tag_counts – dictionary mapping edge tag names to edge counts. Defaults to {"official": 40, "social": 30, "informal": 20}.

  • edge_structure – dictionary mapping edge tag names to structure types. Supported types:

    • "random": edges connect any two nodes uniformly at random (default).

    • "intragroup": edges connect nodes within the same group only.

    • "intergroup": edges connect nodes in different groups only.

    Tags not listed in this dictionary default to "random".

  • seed – base random seed. Successive edge tags use the seeds seed, seed+1, … in order.

Returns:

tuple of (NodeCollection, Edges).
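The three structure types and the per-tag seeding can be sketched with the standard library as follows. This illustrates the semantics described above and is not hiveplotlib's implementation: the helper sketch_tagged_edges is hypothetical, rejection sampling is simply one easy way to satisfy the constraints, and whether the real generator permits self-loops or duplicate edges is not stated here.

```python
import random
from typing import Dict, List, Optional, Tuple


def sketch_tagged_edges(
    groups: List[str],
    edge_tag_counts: Dict[str, int],
    edge_structure: Optional[Dict[str, str]] = None,
    seed: int = 42,
) -> Dict[str, List[Tuple[int, int]]]:
    """Hypothetical helper; groups[i] is the group label of node i."""
    edge_structure = edge_structure or {}
    node_ids = range(len(groups))
    tagged: Dict[str, List[Tuple[int, int]]] = {}
    for offset, (tag, count) in enumerate(edge_tag_counts.items()):
        rng = random.Random(seed + offset)  # seed, seed+1, ... for successive tags
        structure = edge_structure.get(tag, "random")  # unlisted tags are "random"
        if structure not in ("random", "intragroup", "intergroup"):
            raise ValueError(f"unsupported structure: {structure!r}")
        if structure == "intergroup" and len(set(groups)) < 2:
            raise ValueError("intergroup edges need at least two groups")
        edges: List[Tuple[int, int]] = []
        while len(edges) < count:  # rejection sampling
            start, stop = rng.choice(node_ids), rng.choice(node_ids)
            same_group = groups[start] == groups[stop]
            if structure == "intragroup" and not same_group:
                continue  # keep only pairs within one group
            if structure == "intergroup" and same_group:
                continue  # keep only pairs spanning two groups
            edges.append((start, stop))
        tagged[tag] = edges
    return tagged


groups = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
tagged = sketch_tagged_edges(
    groups,
    edge_tag_counts={"official": 40, "social": 30, "informal": 20},
    edge_structure={"official": "intragroup", "social": "intergroup"},
)
for tag, tag_edges in tagged.items():
    print(tag, len(tag_edges), tag_edges[:3])
```

Here "informal" is absent from edge_structure, so it falls back to "random".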

hiveplotlib.datasets.toy_hive_plots.example_minimal_hive_plot() → HivePlot

Generate a minimal HivePlot instance with edge color styling.

Uses example_hive_plot() with 6 nodes and 6 edges (seed 42) and adds a uniform edge color of "#006BA4" to all axis pairs that have edges.

Returns:

HivePlot instance.

hiveplotlib.datasets.toy_hive_plots.example_multi_tag_hive_plot() → HivePlot

Generate a HivePlot instance with two edge tags and repeat_axes=True.

Creates 12 nodes (seed 10) with two independent sets of 15 edges (seeds 10 and 20) stored under tags "tag_red" and "tag_blue". Edge colors are "#FF0000" and "#0000FF" respectively.

Returns:

HivePlot instance with two edge tags.

hiveplotlib.datasets.toy_hive_plots.example_node_collection(num_nodes: int = 100, seed: int = 0, unique_id_column: str = 'unique_id') → NodeCollection

Generate example NodeCollection.

Each node will have randomly generated "low", "med", and "high" values that, as the names suggest, satisfy "low" < "med" < "high".

By default, the unique ID column will be given the name "unique_id"; set unique_id_column to use a different name.

Parameters:
  • num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.

  • seed – random seed to use when randomly generating node data.

  • unique_id_column – name to assign to the column in the resulting NodeCollection.data attribute that corresponds to the unique IDs.

Returns:

NodeCollection of node data.

hiveplotlib.datasets.toy_hive_plots.example_node_data(num_nodes: int = 100, seed: int = 0) → DataFrame

Generate example node dataframe.

Each node will have randomly generated "low", "med", and "high" values that, as the names suggest, satisfy "low" < "med" < "high".

The unique ID column will be given the name "unique_id".

Parameters:
  • num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.

  • seed – random seed to use when randomly generating node data.

Returns:

dataframe of node data.

hiveplotlib.datasets.toy_hive_plots.example_nodes_and_edges(num_nodes: int = 100, num_edges: int = 200, num_axes: int = 3, seed: int = 0) → Tuple[List[Node], List[List[Hashable]], ndarray]

Generate example nodes, node splits (one list of nodes per intended axis), and edges.

Each node will have randomly generated "low", "med", and "high" values that, as the names suggest, satisfy "low" < "med" < "high".

Parameters:
  • num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.

  • num_edges – how many edges to randomly generate.

  • num_axes – how many axes into which to partition the randomly generated nodes.

  • seed – random seed to use when randomly generating node and edge data.

Returns:

list of generated Node instances, a list of num_axes lists that evenly split the node IDs to be allocated to their own axes, and a (num_edges, 2) shaped array of random edges between nodes.
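One way to picture the "evenly split" node lists is the sketch below, which divides node IDs into num_axes contiguous chunks whose sizes differ by at most one. Whether the real function splits contiguously, randomly, or by some other rule is not stated here, so treat split_evenly as a hypothetical helper rather than the library's behavior.

```python
from typing import Hashable, List


def split_evenly(node_ids: List[Hashable], num_axes: int = 3) -> List[List[Hashable]]:
    """Hypothetical helper: contiguous chunks with sizes differing by at most 1."""
    base, extra = divmod(len(node_ids), num_axes)
    splits, start = [], 0
    for axis_index in range(num_axes):
        # The first `extra` axes each take one additional node.
        size = base + (1 if axis_index < extra else 0)
        splits.append(node_ids[start : start + size])
        start += size
    return splits


splits = split_evenly(list(range(10)), num_axes=3)
print([len(split) for split in splits])  # → [4, 3, 3]
```

Every node ID lands in exactly one list, so concatenating the splits recovers the original IDs.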

hiveplotlib.datasets.toy_hive_plots.example_trade_nodes_and_edges() → Tuple[NodeCollection, Edges]

Load the international trade nodes and edges.

Used in the Hive Plots with More Than 3 Groups notebook.

Uses hiveplotlib.datasets.international_trade.international_trade_data() to load 2019 trade data for HS92 trade group 8112 (“beryllium, chromium, germanium, vanadium, gallium, hafnium, indium, niobium (columbium), rhenium and thallium, articles thereof, and waste or scrap”) from the Harvard Growth Lab.

Node columns (152 nodes):

  • "country": ISO 3166-1 alpha-3 country code (unique ID).

  • "continent": continent of the country (Africa, Asia, Europe, North America, Oceania, or South America).

  • "export_value": total USD exported by that country within trade group 8112 in 2019. Countries with no exports are assigned 0.

Edge columns (1158 edges):

Each edge represents non-zero trade from an origin country to a destination country.

  • "origin_country": ISO 3166-1 alpha-3 code of the exporting country (from column).

  • "destination_country": ISO 3166-1 alpha-3 code of the importing country (to column).

  • "export_value": USD exported from the origin to the destination.

  • "origin_continent": continent of the origin country.

  • "destination_continent": continent of the destination country.

Returns:

tuple of (NodeCollection, Edges) representing the trade network.