Toy Hive Plots
Tools for generating toy hive plots.
- hiveplotlib.datasets.toy_hive_plots.example_base_hive_plot(num_nodes: int = 15, num_edges: int = 30, seed: int = 0, **hive_plot_n_axes_kwargs) -> BaseHivePlot
Generate example hive plot with "Low", "Medium", and "High" axes (plus repeat axes).
Nodes and edges will be generated and placed randomly.
- Parameters:
num_nodes – number of nodes to generate.
num_edges – number of edges to generate.
seed – random seed to use when generating nodes and edges.
hive_plot_n_axes_kwargs – additional keyword arguments for the underlying hiveplotlib.hive_plot_n_axes() call.
- Returns:
resulting BaseHivePlot instance.
- hiveplotlib.datasets.toy_hive_plots.example_edge_data(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) -> DataFrame
Generate example edge data from a provided NodeCollection.
- Parameters:
nodes – nodes from which to generate example edges.
num_edges – how many example edges to randomly generate.
from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.
to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.
seed – random seed to use when randomly generating edge data.
- Returns:
random edge data as (n, 2) DataFrame of [from, to] edges.
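The seeded sampling described above can be sketched in plain Python. This is an illustration only, not the library's implementation: sketch_edge_rows is a hypothetical helper, it returns a list of dicts rather than a DataFrame, and it assumes self-loops and duplicate edges are allowed, which the documentation above does not state.

```python
import random


def sketch_edge_rows(node_ids, num_edges=100, from_column_name="from",
                     to_column_name="to", seed=0):
    """Hypothetical helper: sample random edges as rows keyed by the two column names."""
    # A local generator means the seed alone determines the output.
    rng = random.Random(seed)
    return [
        # Assumption: endpoints are drawn independently, so self-loops can occur.
        {from_column_name: rng.choice(node_ids), to_column_name: rng.choice(node_ids)}
        for _ in range(num_edges)
    ]


rows = sketch_edge_rows(list(range(10)), num_edges=5, seed=0)
```

Calling the helper again with the same seed reproduces the same rows, which is the property the seed parameter is meant to provide.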
- hiveplotlib.datasets.toy_hive_plots.example_edges(nodes: NodeCollection, num_edges: int = 100, from_column_name: Hashable = 'from', to_column_name: Hashable = 'to', seed: int = 0) -> Edges
Generate example edges from a provided NodeCollection.
- Parameters:
nodes – nodes from which to generate example edges.
num_edges – how many example edges to randomly generate.
from_column_name – name to assign to the edge origin column, whose values correspond to node IDs where a given edge starts.
to_column_name – name to assign to the edge destination column, whose values correspond to node IDs where a given edge ends.
seed – random seed to use when randomly generating edge data.
- Returns:
random edge data.
- hiveplotlib.datasets.toy_hive_plots.example_full_kwargs_hive_plot() -> HivePlot
Generate a HivePlot instance with data-dependent edge kwargs, colormaps, and node styling.
Uses example_hive_plot() with 9 nodes, 10 edges (seed 99), and repeat_axes=True. Scaled edge data columns are created for alpha and linewidth, and all edge visualization kwargs are set via update_edge_plotting_keyword_arguments() using column name references:
color: the "low" edge data column mapped through a "cividis" colormap.
alpha: the "alpha_scaled" column (low / 10, yielding values in ~0.3-1.0).
linewidth: the "lw_scaled" column (low / 3, yielding values in ~1.0-3.3).
Node styling is applied via update_node_viz_kwargs with a "viridis" colormap mapped to the "low" data column.
- Returns:
HivePlot instance with node and edge kwargs.
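The two scaled columns are simple rescalings of the "low" edge value. A minimal sketch of that arithmetic follows; the "low" values are made up for illustration and are not the values the function generates.

```python
# Made-up "low" edge values spanning roughly the range implied by the text above.
low_values = [3.0, 5.5, 10.0]

edge_rows = [
    {
        "low": low,
        "alpha_scaled": low / 10,  # column referenced by the alpha kwarg
        "lw_scaled": low / 3,      # column referenced by the linewidth kwarg
    }
    for low in low_values
]
```

With "low" between about 3 and 10, the alpha column spans about 0.3 to 1.0 and the linewidth column about 1.0 to 3.3, matching the ranges quoted above.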
- hiveplotlib.datasets.toy_hive_plots.example_hive_plot(num_nodes: int = 100, num_edges: int = 100, partition_data_column: Literal['low', 'med', 'high'] = 'low', labels: List[Hashable] | None = ['A', 'B', 'C'], cutoffs: List[int | float] | int = 3, partition_variable_name: Hashable | None = None, sorting_variables: Hashable | Dict[Hashable, Hashable] = 'low', seed: int = 0, node_unique_id_column: str = 'unique_id', **hive_plot_kwargs) -> HivePlot
Generate example HivePlot instance.
Each node will have a "low", "med", and "high" value, where these values are randomly generated, and as the names suggest, for the resulting values of each node, "low" < "med" < "high".
Each edge will also have a "low", "med", and "high" value, with each value being the average "low" / "med" / "high" level of the two nodes composing the edge.
Note: The num_edges edges are generated randomly between all possible axes, including repeat axes. Thus, calling this function without requesting all repeat axes (i.e. repeat_axes=True) will result in fewer than num_edges edges visualized in the final hive plot. (All generated edges are stored in the resulting hive_plot.edges, even though some will not be plotted if repeat axes are excluded from the plot.)
- Parameters:
num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
num_edges – how many example edges to randomly generate.
partition_data_column – which column of data in the underlying data attribute to use to partition the node data. Node data is generated via hiveplotlib.datasets.toy_hive_plots.example_node_data().
labels – labels assigned to each bin. Only referenced when cutoffs is not None. None labels each bin as a string based on its range of values. Note, when cutoffs is a list, len(labels) must be 1 greater than len(cutoffs). When cutoffs is an int, len(labels) must be equal to cutoffs.
cutoffs – cutoffs to use in binning nodes according to data under partition_data_column. None will bin nodes by unique values of partition_data_column. When provided as a list, the specified cutoffs will bin according to (-inf, cutoffs[0]], (cutoffs[0], cutoffs[1]], … , (cutoffs[-1], inf). When provided as an int, the exact numerical break points will be determined to create cutoffs equally-sized quantiles.
partition_variable_name – name of the resulting partition variable to add to the nodes.data attribute of the resulting HivePlot instance. Default None will name the partition column as "partition_0".
sorting_variables – which node variable to use to sort / place the nodes on each axis. Providing a single value uses the same variable for each axis. Alternatively, provide a dictionary whose keys are the unique values from the partition_variable_name column in the nodes.data attribute and whose values are the corresponding sorting variable to use for each axis.
seed – random seed to use when randomly generating node and edge data.
node_unique_id_column – name to assign to the column in the nodes.data attribute that corresponds to the unique IDs.
hive_plot_kwargs – additional keyword arguments when creating the returned hiveplotlib.HivePlot() instance.
- Returns:
randomly-generated HivePlot instance.
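The list form of cutoffs bins values into right-closed intervals, so a value equal to a cutoff falls in the lower bin. A minimal sketch of that rule using the standard library follows. sketch_bin_label is a hypothetical helper, not part of hiveplotlib, and it does not cover the int (quantile) form of cutoffs.

```python
from bisect import bisect_left


def sketch_bin_label(value, cutoffs, labels):
    """Hypothetical helper: return the label of the right-closed bin containing value."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("len(labels) must be 1 greater than len(cutoffs)")
    # bisect_left counts the cutoffs strictly less than value, so a value equal
    # to a cutoff stays in the lower bin: (-inf, c0], (c0, c1], ..., (c_last, inf).
    return labels[bisect_left(cutoffs, value)]


example_cutoffs = [1, 5]           # assumed to be sorted in ascending order
example_labels = ["A", "B", "C"]
binned = [sketch_bin_label(v, example_cutoffs, example_labels) for v in (1, 1.5, 5, 5.1)]
```

Here the values 1 and 5 sit exactly on a cutoff and land in bins "A" and "B" respectively, while 5.1 falls into the open-ended last bin.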
- hiveplotlib.datasets.toy_hive_plots.example_hpm_nodes_and_edges(num_groups: int = 3, nodes_per_group: int = 10, edge_tag_counts: Dict[str, int] | None = None, edge_structure: Dict[str, str] | None = None, seed: int = 42) -> Tuple[NodeCollection, Edges]
Generate a shared toy dataset that works well with HivePlotMatrix examples.
Returns a NodeCollection of num_groups * nodes_per_group nodes across num_groups groups (A, B, C, …) with structured numeric columns, and a multi-tag Edges object.
Node columns:
"unique_id": integers 0 to (num_groups * nodes_per_group - 1).
"group": group label, assigned in consecutive blocks. With the defaults (num_groups=3, nodes_per_group=10), nodes 0-9 are "A", 10-19 are "B", 20-29 are "C".
"value1": correlated with group. Each group draws from a distinct sub-interval of [0, 10] in ascending order. Sorting by value1 places groups at distinct, separated positions along each axis when the axes' vmin and vmax values are set to the same fixed range.
"value2": uncorrelated noise. All nodes draw from Uniform(0, 10). Sorting by value2 produces little visible group separation.
"value3": inversely correlated with group, the mirror image of value1. The first group draws from the highest sub-interval, the last from the lowest.
Default edge tags (decreasing density):
"official": 40 edges.
"social": 30 edges.
"informal": 20 edges.
- Parameters:
num_groups – number of groups to generate (A, B, C, …). Default 3.
nodes_per_group – number of nodes in each group. Default 10.
edge_tag_counts – dictionary mapping edge tag names to edge counts. Defaults to {"official": 40, "social": 30, "informal": 20}.
edge_structure – dictionary mapping edge tag names to structure types. Tags not listed in this dictionary default to "random". Supported types:
"random": edges connect any two nodes uniformly at random (default).
"intragroup": edges connect nodes within the same group only.
"intergroup": edges connect nodes in different groups only.
seed – base random seed. Seeds seed, seed+1, … are used for successive edge tags respectively.
- Returns:
tuple of (NodeCollection, Edges).
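The block-wise group assignment, the per-tag seeds, and the three structure types can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: sketch_groups and sketch_tagged_edges are hypothetical helpers, the structure constraints are enforced by rejection sampling, self-loops are allowed for "random" and "intragroup" tags, and at most 26 groups are supported.

```python
import random
from string import ascii_uppercase


def sketch_groups(num_groups=3, nodes_per_group=10):
    """Hypothetical helper: assign group letters to node IDs in consecutive blocks."""
    return {
        node_id: ascii_uppercase[node_id // nodes_per_group]
        for node_id in range(num_groups * nodes_per_group)
    }


def sketch_tagged_edges(groups, edge_tag_counts, edge_structure=None, seed=42):
    """Hypothetical helper: sample edges per tag, honoring each tag's structure type."""
    edge_structure = edge_structure or {}
    node_ids = list(groups)
    edges = {}
    for offset, (tag, count) in enumerate(edge_tag_counts.items()):
        rng = random.Random(seed + offset)  # seed, seed+1, ... for successive tags
        structure = edge_structure.get(tag, "random")  # unlisted tags are "random"
        tag_edges = []
        # Rejection sampling (an assumption). "intergroup" needs at least two groups,
        # otherwise this loop would never finish.
        while len(tag_edges) < count:
            start, end = rng.choice(node_ids), rng.choice(node_ids)
            same_group = groups[start] == groups[end]
            if structure == "intragroup" and not same_group:
                continue
            if structure == "intergroup" and same_group:
                continue
            tag_edges.append((start, end))
        edges[tag] = tag_edges
    return edges


groups = sketch_groups()
edges = sketch_tagged_edges(
    groups,
    {"official": 40, "social": 30, "informal": 20},
    edge_structure={"official": "intragroup", "social": "intergroup"},
)
```

In this usage, every "official" edge stays within one group, every "social" edge crosses groups, and "informal" falls back to "random" because it is not listed in edge_structure.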
- hiveplotlib.datasets.toy_hive_plots.example_minimal_hive_plot() -> HivePlot
Generate a minimal HivePlot instance with edge color styling.
Uses example_hive_plot() with 6 nodes and 6 edges (seed 42) and adds a uniform edge color of "#006BA4" to all axis pairs that have edges.
- Returns:
HivePlot instance.
- hiveplotlib.datasets.toy_hive_plots.example_multi_tag_hive_plot() -> HivePlot
Generate a HivePlot instance with two edge tags and repeat_axes=True.
Creates 12 nodes (seed 10) with two independent sets of 15 edges (seeds 10 and 20) stored under tags "tag_red" and "tag_blue". Edge colors are "#FF0000" and "#0000FF" respectively.
- Returns:
HivePlot instance with two edge tags.
- hiveplotlib.datasets.toy_hive_plots.example_node_collection(num_nodes: int = 100, seed: int = 0, unique_id_column: str = 'unique_id') -> NodeCollection
Generate example NodeCollection.
Each node will have a "low", "med", and "high" value, where these values are randomly generated, and as the names suggest, for the resulting values of each node, "low" < "med" < "high".
The unique ID column is named according to unique_id_column ("unique_id" by default).
- Parameters:
num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
seed – random seed to use when randomly generating node data.
unique_id_column – name to assign to the column in the resulting NodeCollection.data attribute that corresponds to the unique IDs.
- Returns:
NodeCollection of node data.
- hiveplotlib.datasets.toy_hive_plots.example_node_data(num_nodes: int = 100, seed: int = 0) -> DataFrame
Generate example node dataframe.
Each node will have a "low", "med", and "high" value, where these values are randomly generated, and as the names suggest, for the resulting values of each node, "low" < "med" < "high".
Unique ID column will be given the name "unique_id".
- Parameters:
num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
seed – random seed to use when randomly generating node data.
- Returns:
dataframe of node data.
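One way to obtain rows satisfying "low" < "med" < "high" is to draw three values per node and sort them. The sketch below is an assumption about how such data could be produced, not the library's generator: sketch_node_rows is a hypothetical helper, the Uniform(0, 10) range is arbitrary, and it returns a list of dicts rather than a DataFrame.

```python
import random


def sketch_node_rows(num_nodes=100, seed=0):
    """Hypothetical helper: build node rows whose values are ordered low < med < high."""
    rng = random.Random(seed)
    rows = []
    for unique_id in range(num_nodes):
        # Sorting three independent draws guarantees the ordering within each row.
        low, med, high = sorted(rng.uniform(0, 10) for _ in range(3))
        rows.append({"unique_id": unique_id, "low": low, "med": med, "high": high})
    return rows


node_rows = sketch_node_rows(num_nodes=100, seed=0)
```

The IDs are the integers 0 through num_nodes - 1, as described for the real function.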
- hiveplotlib.datasets.toy_hive_plots.example_nodes_and_edges(num_nodes: int = 100, num_edges: int = 200, num_axes: int = 3, seed: int = 0) -> Tuple[List[Node], List[List[Hashable]], ndarray]
Generate example nodes, node splits (one list of nodes per intended axis), and edges.
Each node will have a "low", "med", and "high" value, where these values are randomly generated, and as the names suggest, for the resulting values of each node, "low" < "med" < "high".
- Parameters:
num_nodes – how many nodes to randomly generate. Node unique IDs will be the integers 0, 1, … , num_nodes - 1.
num_edges – how many edges to randomly generate.
num_axes – how many axes into which to partition the randomly generated nodes.
seed – random seed to use when randomly generating node and edge data.
- Returns:
list of generated Node instances, a list of num_axes lists that evenly split the node IDs to be allocated to their own axes, and a (num_edges, 2) shaped array of random edges between nodes.
- hiveplotlib.datasets.toy_hive_plots.example_trade_nodes_and_edges() -> Tuple[NodeCollection, Edges]
Load the international trade nodes and edges.
Used in the Hive Plots with More Than 3 Groups notebook.
Uses hiveplotlib.datasets.international_trade.international_trade_data() to load 2019 trade data for HS92 trade group 8112 ("beryllium, chromium, germanium, vanadium, gallium, hafnium, indium, niobium (columbium), rhenium and thallium, articles thereof, and waste or scrap") from the Harvard Growth Lab.
Node columns (152 nodes):
"country": ISO 3166-1 alpha-3 country code (unique ID).
"continent": continent of the country (Africa, Asia, Europe, North America, Oceania, or South America).
"export_value": total USD exported by that country within trade group 8112 in 2019. Countries with no exports are assigned 0.
Edge columns (1158 edges):
Each edge represents non-zero trade from an origin country to a destination country.
"origin_country": ISO 3166-1 alpha-3 code of the exporting country (from column).
"destination_country": ISO 3166-1 alpha-3 code of the importing country (to column).
"export_value": USD exported from the origin to the destination.
"origin_continent": continent of the origin country.
"destination_continent": continent of the destination country.
- Returns:
tuple of (NodeCollection, Edges) representing the trade network.