Hive Plots with More Than 3 Groups

Throughout the hiveplotlib documentation, we’ve routinely partitioned networks into 3 groups. This choice was intentional because a 3-axis hive plot shows the edges between every pair of axes without any overlap.

But what if our data naturally partitions into more than 3 groups? This notebook explores some options, using an international trade dataset from the Harvard Growth Lab to visualize the trade relationships between continents.

Note: the network explored in this notebook is highly interconnected, so to see more of the nuance in our visualizations, we will opt to construct our hive plots using the datashader backend, which can be installed by running:

pip install hiveplotlib[datashader]

For more on constructing hive plots with datashader, see the Hive Plots for Large Networks notebook.
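Since datashader is an optional extra, it can help to confirm it is importable before running the rest of the notebook. The check below is a minimal sketch using only the standard library; the variable name `datashader_available` is illustrative, and the quoted form of the install command is suggested only because some shells treat square brackets specially.

```python
from importlib.util import find_spec

# `datashader` is an optional dependency, so look for it without importing it
datashader_available = find_spec("datashader") is not None

if not datashader_available:
    print("datashader not found; try: pip install 'hiveplotlib[datashader]'")
```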

[1]:
from itertools import combinations
from pprint import pprint

import cartopy.crs as ccrs
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from hiveplotlib import Axis, hive_plot_n_axes
from hiveplotlib.datasets import international_trade_data
from hiveplotlib.node import dataframe_to_node_list, split_nodes_on_variable
from hiveplotlib.viz.datashader import datashade_hive_plot_mpl

Loading International Trade Data

The Growth Lab has more than 25 years of trade data with more than 1200 4-digit trade groups.

hiveplotlib installs with a subset of the Growth Lab’s international trade data. Here, we will look at the subnetwork of trade data from 2019 under trade group 8112.

Trade groups are specified under the Harmonized System (HS) 1992 classification. According to dataweb.usitc.gov, group 8112 represents “beryllium, chromium, germanium, vanadium, gallium, hafnium, indium, niobium (columbium), rhenium and thallium, articles thereof, and waste or scrap.”

[2]:
data, metadata = international_trade_data(year=2019, hs92_code=8112)
[3]:
data
[3]:
origin_country destination_country export_value origin_continent destination_continent
0 AGO NAM 1527.0 Africa Africa
1 AGO SAU 244.0 Africa Asia
2 ARE BEL 1668.0 Asia Europe
3 ARE DEU 2819.0 Asia Europe
4 ARE GBR 51608.0 Asia Europe
... ... ... ... ... ...
1153 ZAF SGP 46868.0 Africa Asia
1154 ZAF THA 113396.0 Africa Asia
1155 ZAF TWN 131260.0 Africa Asia
1156 ZAF USA 23925.0 Africa North America
1157 ZAF ZWE 16720.0 Africa Africa

1158 rows × 5 columns

The hiveplotlib.datasets.international_trade_data() function also returns relevant metadata with information about the returned pandas.DataFrame as well as data provenance details.

[4]:
pprint(metadata)
{'citation': "The Growth Lab at Harvard University, 2019, 'International Trade "
             "Data (HS, 92)', https://doi.org/10.7910/DVN/T4CHWJ, Harvard "
             'Dataverse, V5, UNF:6:fBTbvO79jN4d+3lfNSzRtw== [fileUNF]',
 'created_at': '2023-03-20 13:17:47.134697',
 'data_columns': {'destination_continent': 'Continent corresponding to each '
                                           "row / record's destination "
                                           'country. Based off of the '
                                           "`geopandas` 'naturalearth_lowres' "
                                           'dataset, with some manual '
                                           'revisions (for more on these '
                                           'revisions, see the '
                                           "'revisions_from_geopandas_isoa3' "
                                           "and 'dropped_data' keys in this "
                                           'metadata dictionary).',
                  'destination_country': 'For a row / record of data, this '
                                         'represents the country that is '
                                         'importing products from the '
                                         '`origin_country`. Countries are '
                                         'specified  with ISO 3166-1 alpha-3 '
                                         'codes. For more on which country '
                                         'corresponds to which code, see '
                                         'https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3.',
                  'export_value': 'For a row / record of data, this represents '
                                  'the *amount* (in USD) that is exported from '
                                  'the `origin_country` to the '
                                  '`destination_country`.',
                  'origin_continent': 'Continent corresponding to each row / '
                                      "record's origin country. Based off of "
                                      "the `geopandas` 'naturalearth_lowres' "
                                      'dataset, with some manual revisions '
                                      '(for more on these revisions, see the '
                                      "'revisions_from_geopandas_isoa3' and "
                                      "'dropped_data' keys in this metadata "
                                      'dictionary).',
                  'origin_country': 'For a row / record of data, this '
                                    'represents the country that is exporting '
                                    'products to the `destination_country`. '
                                    'Countries are specified with ISO 3166-1 '
                                    'alpha-3 codes. For more on which country '
                                    'corresponds to which code, see '
                                    'https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3.'},
 'data_processing': 'Data was first downloaded from '
                    'https://doi.org/10.7910/DVN/T4CHWJ and processed by the '
                    'runner '
                    'https://gitlab.com/geomdata/hiveplotlib/-/blob/master/runners/make_trade_network_dataset.py.',
 'dropped_data': {'codes_dropped': ['ANS'],
                  'more_info': 'Could not determine the following continents '
                               'from ISO 3166-1 alpha-3 codes: ANS. All '
                               'records including these codes will be '
                               'dropped.'},
 'raw_data_file_name': 'country_partner_hsproduct4digit_year_2019.dta',
 'revisions_from_geopandas_isoa3': [{'ABW': {'continent': 'South America',
                                             'country': 'Aruba'},
                                     'ATG': {'continent': 'North America',
                                             'country': 'Antigua and Barbuda'},
                                     'BES': {'continent': 'South America',
                                             'country': 'Bonaire, Sint '
                                                        'Eustatius and Saba'},
                                     'BHR': {'continent': 'Asia',
                                             'country': 'Bahrain'},
                                     'BMU': {'continent': 'Europe',
                                             'country': 'Bermuda'},
                                     'BRB': {'continent': 'North America',
                                             'country': 'Barbados'},
                                     'CPV': {'continent': 'Africa',
                                             'country': 'Cabo Verde'},
                                     'CUW': {'continent': 'South America',
                                             'country': 'Curaçao'},
                                     'CYM': {'continent': 'North America',
                                             'country': 'Cayman Islands'},
                                     'DMA': {'continent': 'North America',
                                             'country': 'Dominica'},
                                     'GRD': {'continent': 'North America',
                                             'country': 'Grenada'},
                                     'HKG': {'continent': 'Asia',
                                             'country': 'Hong Kong'},
                                     'KNA': {'continent': 'North America',
                                             'country': 'Saint Kitts and '
                                                        'Nevis'},
                                     'LCA': {'continent': 'North America',
                                             'country': 'Saint Lucia'},
                                     'MAC': {'continent': 'Asia',
                                             'country': 'Macau'},
                                     'MDV': {'continent': 'Asia',
                                             'country': 'Maldives'},
                                     'MHL': {'continent': 'Oceania',
                                             'country': 'Marshall Islands'},
                                     'MLT': {'continent': 'Europe',
                                             'country': 'Malta'},
                                     'PYF': {'continent': 'Oceania',
                                             'country': 'French Polynesia'},
                                     'SGP': {'continent': 'Asia',
                                             'country': 'Singapore'},
                                     'SXM': {'continent': 'North America',
                                             'country': 'Sint Maarten'},
                                     'TCA': {'continent': 'North America',
                                             'country': 'Turks and Caicos '
                                                        'Islands'},
                                     'TON': {'continent': 'Oceania',
                                             'country': 'Tonga'},
                                     'VCT': {'continent': 'North America',
                                             'country': 'Saint Vincent and the '
                                                        'Grenadines'},
                                     'VGB': {'continent': 'North America',
                                             'country': 'British Virgin '
                                                        'Islands'}}],
 'trade_data_year': 2019,
 'trade_id': '8112',
 'trade_id_information': 'Using the Harmonized System (HS) 1992 classification '
                         'for trade groups. Our data is the 4-digit level of '
                         'classification. For more, see '
                         'https://dataweb.usitc.gov/classification/commodity-description/HTS/4.'}

Geography-Based Network Visualization

The other networks discussed throughout the hiveplotlib documentation were abstract networks, that is, networks whose nodes have no inherent location. Moving from an abstract layout (e.g. circular or force-directed) to a hive plot therefore sacrificed no intuition.

In this geographic example, there is a sacrifice. If we say “the United States and Japan are trade partners,” we can draw that edge on a map, and that edge is more interpretable than a hive plot edge. So before we move to hive plots, let’s first visualize this network with a geographic layout by drawing geodesic edges between country centroids, calculated and visualized using geopandas and cartopy.

[5]:
# pull in country data from the hiveplotlib repository
countries = gpd.read_file(
    "https://gitlab.com/geomdata/hiveplotlib/-/raw/master/data/countries.shp.zip"
)

# calculate centroids in an equal-area CRS to get more representative centroids
df_epsg = countries.to_crs(ccrs.AlbersEqualArea().proj4_init)
centroids = df_epsg.centroid
# store centroids in standard lat, lon CRS
countries["centroids"] = centroids.to_crs(epsg=4326)

# left join centroid locations for destination_country into trade data
gpd_country_columns = ["iso_a3", "centroids"]
destination_country_centroid_df = data.merge(
    countries.loc[:, gpd_country_columns],
    how="left",
    left_on="destination_country",
    right_on="iso_a3",
)
destination_country_centroid_df["destination_centroid_lon"] = gpd.GeoSeries(
    destination_country_centroid_df.centroids
).x

destination_country_centroid_df["destination_centroid_lat"] = gpd.GeoSeries(
    destination_country_centroid_df.centroids
).y

destination_country_centroid_df = destination_country_centroid_df.drop(
    columns=gpd_country_columns
)

# left join centroid locations for origin_country into trade data
with_centroids_df = destination_country_centroid_df.merge(
    countries.loc[:, gpd_country_columns],
    how="left",
    left_on="origin_country",
    right_on="iso_a3",
)

with_centroids_df["origin_centroid_lon"] = gpd.GeoSeries(with_centroids_df.centroids).x
with_centroids_df["origin_centroid_lat"] = gpd.GeoSeries(with_centroids_df.centroids).y

with_centroids_df = with_centroids_df.drop(columns=["centroids", "iso_a3"]).dropna()

with_centroids_df.head()
[5]:
origin_country destination_country export_value origin_continent destination_continent destination_centroid_lon destination_centroid_lat origin_centroid_lon origin_centroid_lat
0 AGO NAM 1527.0 Africa Africa 17.154347 -21.907150 17.473268 -12.086256
1 AGO SAU 244.0 Africa Asia 44.662131 24.135874 17.473268 -12.086256
2 ARE BEL 1668.0 Asia Europe 4.586001 50.651138 54.200066 23.875338
3 ARE DEU 2819.0 Asia Europe 10.273224 51.061965 54.200066 23.875338
4 ARE GBR 51608.0 Asia Europe -2.776213 53.784747 54.200066 23.875338
[6]:
# wrangle lons and lats together in structure to plot geodesic lines with cartopy
lons = with_centroids_df.loc[:, ["origin_centroid_lon", "destination_centroid_lon"]]
lats = with_centroids_df.loc[:, ["origin_centroid_lat", "destination_centroid_lat"]]

lon_list = lons.to_numpy().tolist()
lat_list = lats.to_numpy().tolist()
[7]:
fig, ax = plt.subplots(
    figsize=(10, 10), subplot_kw={"projection": ccrs.PlateCarree()}, dpi=300
)
countries.plot(facecolor="gray", alpha=0.5, edgecolor="black", linewidth=0.3, ax=ax)

# draw each trade relationship as a geodesic line between country centroids
for edge_lons, edge_lats in zip(lon_list, lat_list):
    ax.plot(edge_lons, edge_lats, transform=ccrs.Geodetic(), color="C0", lw=0.06)
gl = ax.gridlines(draw_labels=True, linewidth=0.5, color="black", alpha=0.2)
gl.top_labels = False
ax.set_title(
    f"Global Trade Network ({metadata['trade_data_year']})\n"
    f"Trade Group: {metadata['trade_id']}",
    loc="left",
)
plt.show()
_images/hive_plots_more_than_three_groups_11_0.png

Despite the intuition behind a single edge, plotting all of the network’s edges together still produces a hairball, so let’s instead consider hive plot alternatives.

Partitioning the Network

To turn this network into a hive plot, we must first ask how we should partition the nodes into separate axes. A natural separation for international trade is to separate countries by continent.

Our continental classification is based on the geopandas separation of countries, which we visualize below.

[8]:
fig, ax = plt.subplots(figsize=(7, 7), subplot_kw={"projection": ccrs.PlateCarree()})
countries.plot(
    column="continent",
    ax=ax,
    legend=True,
    legend_kwds={"loc": "upper left", "bbox_to_anchor": (1.1, 1)},
)
gl = ax.gridlines(draw_labels=True, linewidth=0.5, color="black", alpha=0.2)
gl.top_labels = False
ax.set_title("Geopandas Assignment of Countries to Continents", loc="left")
plt.show()
_images/hive_plots_more_than_three_groups_14_0.png

As far as international trade is concerned, we’ll want to focus on Africa, Asia, Europe, North America, Oceania, and South America, a partition of 6 groups. The question then becomes, how could we make a hive plot to show intercontinental trade for all 6 groups?

Below, we convert our data into the node and edge data structures needed to make hive plots.

[9]:
export_totals_per_country = (
    data.loc[:, ["origin_country", "export_value"]]
    .groupby("origin_country")
    .sum()
    .reset_index()
)

node_data = pd.concat(
    [
        data.loc[:, ["origin_country", "origin_continent"]].rename(
            columns={"origin_country": "country", "origin_continent": "continent"}
        ),
        data.loc[:, ["destination_country", "destination_continent"]].rename(
            columns={
                "destination_country": "country",
                "destination_continent": "continent",
            }
        ),
    ]
).drop_duplicates()

node_data = node_data.merge(
    export_totals_per_country, how="left", left_on="country", right_on="origin_country"
).drop(columns=["origin_country"])

# fill in countries with no exports as 0, not NaN
node_data.export_value = node_data.export_value.fillna(0)

node_list = dataframe_to_node_list(df=node_data, unique_id_column="country")

node_splits = split_nodes_on_variable(node_list, variable_name="continent")

edges = data.loc[:, ["origin_country", "destination_country"]].to_numpy()
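To make the shape of these structures concrete, here is a minimal plain-Python sketch of the same wrangling applied to the first five rows of the table shown earlier. It is not how hiveplotlib or pandas implement this; the names `rows`, `nodes`, `splits`, and `edge_pairs` are illustrative only.

```python
from collections import defaultdict

# first five rows of the trade table shown above:
# (origin_country, destination_country, export_value,
#  origin_continent, destination_continent)
rows = [
    ("AGO", "NAM", 1527.0, "Africa", "Africa"),
    ("AGO", "SAU", 244.0, "Africa", "Asia"),
    ("ARE", "BEL", 1668.0, "Asia", "Europe"),
    ("ARE", "DEU", 2819.0, "Asia", "Europe"),
    ("ARE", "GBR", 51608.0, "Asia", "Europe"),
]

# collect each country's continent and its total exports
continent_of = {}
export_totals = defaultdict(float)
for origin, destination, value, origin_continent, destination_continent in rows:
    continent_of[origin] = origin_continent
    continent_of[destination] = destination_continent
    export_totals[origin] += value

# one record per country; countries that never export get 0, not a missing value
nodes = {
    country: {"continent": continent, "export_value": export_totals.get(country, 0.0)}
    for country, continent in continent_of.items()
}

# partition country IDs by continent, one group per future hive plot axis
splits = defaultdict(list)
for country, record in nodes.items():
    splits[record["continent"]].append(country)

# edges are (origin, destination) pairs of country IDs
edge_pairs = [(origin, destination) for origin, destination, *_ in rows]

print(dict(splits))
```

Each group in `splits` plays the role of one axis assignment, and `export_value` is the quantity later used to sort nodes along each axis.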

Hive Plots of Intercontinental Trade

We’re now ready to generate some hive plots, but with 6 groups to show instead of the usual 3, we have several visualization options worth considering.

Option 1: More Axes Radiating Around Origin

The simplest option is to fit more axes around the full 360 degrees. With a partition of 6 groups, each with a repeat axis to also show intracontinental trade, we need to fit a total of 12 axes.

[10]:
continents_to_plot = [
    "Africa",
    "Asia",
    "Europe",
    "North America",
    "Oceania",
    "South America",
]

# vmin and vmax to use for each axis (for consistency in interpretation between axes)
vmin = 0
vmax = 3e8

# sorting variable for each axis
sorting_variable = "export_value"

hp = hive_plot_n_axes(
    node_list=node_list,
    edges=edges,
    axes_assignments=[node_splits[i] for i in continents_to_plot],
    sorting_variables=[sorting_variable] * len(continents_to_plot),
    axes_names=continents_to_plot,
    vmins=[vmin] * len(continents_to_plot),
    vmaxes=[vmax] * len(continents_to_plot),
    repeat_axes=[True] * len(continents_to_plot),
    angle_between_repeat_axes=20,
)

fig, ax, im_nodes, im_edges = datashade_hive_plot_mpl(
    hp,
    axes_labels_buffer=1.01,
    axes_labels_fontsize=10,
    text_kwargs={"color": "maroon"},
    pixel_spread_nodes=10,
)

# colorbars for datashading nodes and edges
cax_edges = ax.inset_axes([0.95, 0.85, 0.2, 0.01], transform=ax.transAxes)
cb_edges = fig.colorbar(im_edges, ax=ax, cax=cax_edges, orientation="horizontal")
cb_edges.ax.set_title("Edge Density")

cax_nodes = ax.inset_axes([0.95, 0.75, 0.2, 0.01], transform=ax.transAxes)
cb_nodes = fig.colorbar(im_nodes, ax=ax, cax=cax_nodes, orientation="horizontal")
cb_nodes.ax.set_title("Node Density")

ax.set_title(
    "Problem: 6 pairs of axes around the origin leaves out many edges",
    size=18,
    y=1.1,
    loc="left",
)

ax.text(
    x=0.0,
    y=-0.2,
    s=f"{metadata['trade_data_year']} inter-country trade data for "
    f"trade group {metadata['trade_id']}.\n"
    "Trade groups specified according to the "
    "Harmonized System (HS) 1992 classification.\n"
    "An edge corresponds to non-zero trade between two countries.\n"
    f"Nodes placed on each axis sorted by {sorting_variable.replace('_', ' ')} "
    f"(in USD) for trade group {metadata['trade_id']}.\n"
    rf"Each axis spans \$0 - \${int(vmax):,}.\n\n"
    "Data from the Growth Lab at Harvard University.",
    size=9,
    color="gray",
    ha="left",
    transform=ax.transAxes,
)
plt.show()
_images/hive_plots_more_than_three_groups_18_0.png

The Problem with a 6-Axis Hive Plot

What do we lose with our 6-axis hive plot though? Although all of our nodes are represented here, we have sacrificed a substantial number of our edges.

We may be showing all intracontinental edges, but we have abandoned multiple groups of intercontinental edges, as hive plots only draw edges between adjacent axes. That means, for example, we can see the trade between Asia and Europe, Asia and Africa, and Asia with the rest of Asia, but we cannot see Asia’s trade with North America, Oceania, or South America in the above figure.

In combinatorics terms, we want to see edges between every unordered pair of our 6 continents, that is, 6 choose 2, or \(\binom{6}{2} = \frac{6 \cdot 5}{2} = 15\) intercontinental pairs. In this hive plot, however, we can only see the 6 pairings between adjacent axes.

[11]:
combs = list(combinations(continents_to_plot, 2))
print(f"Total Combinations: {len(combs)}")
combs
Total Combinations: 15
[11]:
[('Africa', 'Asia'),
 ('Africa', 'Europe'),
 ('Africa', 'North America'),
 ('Africa', 'Oceania'),
 ('Africa', 'South America'),
 ('Asia', 'Europe'),
 ('Asia', 'North America'),
 ('Asia', 'Oceania'),
 ('Asia', 'South America'),
 ('Europe', 'North America'),
 ('Europe', 'Oceania'),
 ('Europe', 'South America'),
 ('North America', 'Oceania'),
 ('North America', 'South America'),
 ('Oceania', 'South America')]
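To see which of these pairings the 6-axis hive plot actually draws, the following minimal sketch compares the adjacent pairs on the ring of axes against all 15 combinations. It assumes the axes sit around the origin in the order listed in Option 1, with the last axis wrapping around to the first; the names `ring`, `visible`, and `hidden` are illustrative and not part of hiveplotlib.

```python
from itertools import combinations

# axis order around the origin, as listed for Option 1
ring = ["Africa", "Asia", "Europe", "North America", "Oceania", "South America"]

# pairs of adjacent axes on the ring, including the wrap-around pair
visible = {
    frozenset((ring[i], ring[(i + 1) % len(ring)])) for i in range(len(ring))
}

# every unordered pair of continents
all_pairs = {frozenset(pair) for pair in combinations(ring, 2)}

# pairs whose edges cannot be drawn on a single ring of axes
hidden = all_pairs - visible

print(f"Visible pairs: {len(visible)}, hidden pairs: {len(hidden)}")
for pair in sorted(sorted(p) for p in hidden):
    print(pair)
```

For this ordering, the sketch reports 6 visible and 9 hidden pairings, and the hidden ones include Asia's trade with North America, Oceania, and South America.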

This is where the beauty of 3-axis hive plots lies. With 3 groups to compare, there are only 3 pairs to see, allowing us to make a single 3-axis hive plot without sacrificing any intercontinental edges.

[12]:
three_continents_only = continents_to_plot[:3]
three_continents_combs = list(combinations(three_continents_only, 2))
print(f"If we instead only look at 3 continents: {three_continents_only}")
print(f"Total Combinations: {len(three_continents_combs)}")
three_continents_combs
If we instead only look at 3 continents: ['Africa', 'Asia', 'Europe']
Total Combinations: 3
[12]:
[('Africa', 'Asia'), ('Africa', 'Europe'), ('Asia', 'Europe')]
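The same counting argument generalizes beyond 3 and 6 groups. The sketch below, which assumes a single ring of axes where only adjacent axes are connected, shows how quickly the gap grows between the pairs we want to see and the pairs we can draw; `pair_counts` is a hypothetical helper, not a hiveplotlib function.

```python
from math import comb


def pair_counts(num_groups: int) -> tuple[int, int]:
    """Return (all unordered pairs, pairs adjacent on a single ring of axes)."""
    if num_groups < 3:
        raise ValueError("this sketch assumes at least 3 groups")
    # a ring of n axes has exactly n adjacent pairs, counting the wrap-around
    return comb(num_groups, 2), num_groups


for n in range(3, 8):
    total, adjacent = pair_counts(n)
    print(f"{n} groups: {adjacent} of {total} pairs visible")
```

Only at 3 groups do the two counts agree, which is why a single 3-axis hive plot loses nothing.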

Option 2: Two Layers of Axes

When we only added axes around the origin (the angular dimension), we hid many intercontinental edges. If we instead place some of the axes further away from the center of the hive plot, we can still show full connectivity in a relatively compact figure.

By placing our 6 axes in two layers of 3 radiating around the origin, we can show all intercontinental edges in a single hive plot.

Since this is not a standard way to construct a hive plot, multi-layered hive plots are not currently supported by the high-level API in hiveplotlib. That being said, we can still use the low-level API to orient, place, and connect our outer layer of axes.

[13]:
# two layers of axes radiating *out* from the center of the plot
inner_continents_to_plot = ["Asia", "Europe", "North America"]
outer_continents_to_plot = ["Africa", "Oceania", "South America"]

# be mindful of how many inner and outer continents there are
#  (for low-level plotting later)
num_inner_continents = len(inner_continents_to_plot)
num_outer_continents = len(outer_continents_to_plot)

# vmin and vmax to use for each axis (for consistency in interpretation between axes)
vmin = 0
vmax = 3e8

# sorting variable for each axis
sorting_variable = "export_value"

# build initial 3 axis hive plot with high-level hiveplotlib API
hp = hive_plot_n_axes(
    node_list=node_list,
    edges=edges,
    axes_assignments=[node_splits[i] for i in inner_continents_to_plot],
    sorting_variables=[sorting_variable] * len(inner_continents_to_plot),
    axes_names=inner_continents_to_plot,
    vmins=[vmin] * len(inner_continents_to_plot),
    vmaxes=[vmax] * len(inner_continents_to_plot),
    repeat_axes=[True] * len(inner_continents_to_plot),
)


# place outer layer of axes and repeat axes on top of original layer of axes
#  via the low-level hiveplotlib API
for i, (inner_continent, outer_continent) in enumerate(
    zip(inner_continents_to_plot, outer_continents_to_plot)
):
    outer_axis = Axis(
        axis_id=outer_continent,
        start=hp.axes[inner_continent].polar_start + 6,
        end=hp.axes[inner_continent].polar_end + 6,
        angle=hp.axes[inner_continent].angle,
    )

    outer_axis_repeat = Axis(
        axis_id=f"{outer_continent}_repeat",
        start=hp.axes[f"{inner_continent}_repeat"].polar_start + 6,
        end=hp.axes[f"{inner_continent}_repeat"].polar_end + 6,
        angle=hp.axes[f"{inner_continent}_repeat"].angle,
        long_name=outer_continent,
    )

    hp.add_axes([outer_axis, outer_axis_repeat])

    hp.place_nodes_on_axis(
        outer_continent,
        node_splits[outer_continent],
        sorting_feature_to_use=sorting_variable,
        vmin=vmin,
        vmax=vmax,
    )
    hp.place_nodes_on_axis(
        f"{outer_continent}_repeat",
        node_splits[outer_continent],
        sorting_feature_to_use=sorting_variable,
        vmin=vmin,
        vmax=vmax,
    )

    # connect the "inner" and "outer" axes on top of each other
    # only need to do *one* of these
    #  (e.g. inner to outer_repeat OR inner_repeat to outer)
    #  otherwise it would be a disingenuous repetition of edges
    hp.connect_axes(
        edges=edges, axis_id_1=inner_continent, axis_id_2=f"{outer_continent}_repeat"
    )

    # connect the outer axis to its repeat for *intra*continental trade
    # only do one direction, otherwise it would be a disingenuous repetition of edges
    hp.connect_axes(
        edges=edges,
        axis_id_1=outer_continent,
        axis_id_2=f"{outer_continent}_repeat",
        a2_to_a1=False,
    )

    # connecting outer axis to the other inner axes (not the one below it)
    # by counterclockwise construction of `hive_plot_n_axes()`,
    #  we know the repeat axis connects to the *next* original inner axis
    #  and the original axis connects to the *previous* inner axis repeat
    hp.connect_axes(
        edges=edges,
        axis_id_1=f"{outer_continent}_repeat",
        axis_id_2=inner_continents_to_plot[(i + 1) % num_inner_continents],
    )
    hp.connect_axes(
        edges=edges,
        axis_id_1=outer_continent,
        axis_id_2=f"{inner_continents_to_plot[(i - 1) % num_inner_continents]}_repeat",
    )

# connect the outer axes to each other
for i, outer_continent in enumerate(outer_continents_to_plot):
    hp.connect_axes(
        edges=edges,
        axis_id_1=f"{outer_continent}_repeat",
        axis_id_2=outer_continents_to_plot[(i + 1) % num_outer_continents],
    )


fig, ax, im_nodes, im_edges = datashade_hive_plot_mpl(
    hp,
    axes_labels_buffer=1.01,
    axes_labels_fontsize=10,
    text_kwargs={"color": "maroon"},
    pixel_spread_nodes=10,
)

# colorbars for datashading nodes and edges
cax_edges = ax.inset_axes([0.95, 0.85, 0.2, 0.01], transform=ax.transAxes)
cb_edges = fig.colorbar(im_edges, ax=ax, cax=cax_edges, orientation="horizontal")
cb_edges.ax.set_title("Edge Density")

cax_nodes = ax.inset_axes([0.95, 0.75, 0.2, 0.01], transform=ax.transAxes)
cb_nodes = fig.colorbar(im_nodes, ax=ax, cax=cax_nodes, orientation="horizontal")
cb_nodes.ax.set_title("Node Density")

ax.text(
    x=0.0,
    y=-0.2,
    s=f"{metadata['trade_data_year']} inter-country trade data for "
    f"trade group {metadata['trade_id']}.\n"
    "Trade groups specified according to the "
    "Harmonized System (HS) 1992 classification.\n"
    "An edge corresponds to non-zero trade between two countries.\n"
    f"Nodes placed on each axis sorted by {sorting_variable.replace('_', ' ')} "
    f"(in USD) for trade group {metadata['trade_id']}.\n"
    rf"Each axis spans \$0 - \${int(vmax):,}.\n\n"
    "Data from the Growth Lab at Harvard University.",
    size=9,
    color="gray",
    ha="left",
    transform=ax.transAxes,
)
plt.show()
_images/hive_plots_more_than_three_groups_25_0.png

Despite drawing the same number of axes here as in Option 1, we’re able to draw all the intercontinental edges.
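As a sanity check on that claim, the following sketch replays the connection pattern from the code above at the level of continent pairs and confirms that every pairing is covered. It uses no hiveplotlib calls; the `connect` helper is hypothetical, and the inner-layer links are assumed to follow the counterclockwise construction described in the code comments (each repeat axis connects to the next inner axis).

```python
from itertools import combinations

inner = ["Asia", "Europe", "North America"]
outer = ["Africa", "Oceania", "South America"]
n = len(inner)

connected = set()


def connect(a, b):
    """Record an unordered continent pair, ignoring intracontinental links."""
    if a != b:
        connected.add(frozenset((a, b)))


for i in range(n):
    # inner layer: each inner repeat axis links to the next inner axis (assumed)
    connect(inner[i], inner[(i + 1) % n])
    # inner axis to the outer repeat axis stacked on top of it
    connect(inner[i], outer[i])
    # outer repeat axis to the next inner axis
    connect(outer[i], inner[(i + 1) % n])
    # outer axis to the previous inner repeat axis
    connect(outer[i], inner[(i - 1) % n])
    # outer layer: each outer repeat axis links to the next outer axis
    connect(outer[i], outer[(i + 1) % n])

all_pairs = {frozenset(p) for p in combinations(inner + outer, 2)}
missing = all_pairs - connected
print(f"Connected {len(connected)} of {len(all_pairs)} pairs; missing: {len(missing)}")
```

Under these assumptions all 15 pairings are connected: 3 within the inner layer, 3 within the outer layer, and all 9 between the two layers.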

We can quickly point out multiple patterns, such as:

  • Africa and Oceania export relatively minimal dollar amounts of goods within this trade group.

  • South America mostly trades with North America, Europe, and Asia, barely trades with Africa, and doesn’t trade with Oceania (or itself apparently) within this trade group.

  • Asia, Europe, and North America trade extensively with everyone within this trade group.

That being said, the complexity of this double-layered hive plot is definitely higher than the usual single-layered hive plot, so extending to multilayered hive plots should only be done if absolutely necessary.

Note, this layering methodology can in theory scale to an arbitrary number of groups by adding additional layers outward from the origin, but depending on edge density, the ever-increasing complexity will likely be asking too much of the viewer.

Option 3: Collapse Some of the Groups to a Single Axis

Option 1 is particularly disingenuous, as it implies at first glance that we are looking at the full network, when in fact it omits the edges for 9 of the 15 intercontinental pairings.

Option 2 is accurate, but on the brink of excessive complexity, which raises the question, could we make this figure simpler without trading off too much information content?

One answer to this question is to contemplate whether some groups could be collapsed together.

Africa and Oceania have multiple trade partners within this trade group, but only in small dollar amounts, making them good candidates for collapsing together.

Also, although well-connected, North America and South America have fewer connections than Europe or Asia. Plus, with the Americas adjacent to each other geographically, they lend themselves well to collapsing together.

Given all this, let’s try collapsing Africa, Oceania, North America, and South America together onto a single axis.

Once we have allocated our first two axes to Asia and Europe, collapsing the remaining groups of nodes to our third axis is easy in hiveplotlib. We need only specify our third axis as None when calling hive_plot_n_axes().

[14]:
# `None` works as "everything else" in `hive_plot_n_axes()`
continents_to_plot = ["Asia", "Europe", None]

# vmin and vmax to use for each axis (for consistency in interpretation between axes)
vmin = 0
vmax = 3e8

# sorting variable for each axis
sorting_variable = "export_value"

hp = hive_plot_n_axes(
    node_list=node_list,
    edges=edges,
    axes_assignments=[
        node_splits[i] if i is not None else None for i in continents_to_plot
    ],
    sorting_variables=[sorting_variable] * len(continents_to_plot),
    axes_names=[
        i if i is not None else "All Other\nContinents" for i in continents_to_plot
    ],
    vmins=[vmin] * len(continents_to_plot),
    vmaxes=[vmax] * len(continents_to_plot),
    repeat_axes=[True] * len(continents_to_plot),
)

fig, ax, im_nodes, im_edges = datashade_hive_plot_mpl(
    hp,
    axes_labels_buffer=1.01,
    axes_labels_fontsize=10,
    text_kwargs={"color": "maroon"},
    pixel_spread_nodes=10,
)

# colorbars for datashading nodes and edges
cax_edges = ax.inset_axes([0.95, 0.85, 0.2, 0.01], transform=ax.transAxes)
cb_edges = fig.colorbar(im_edges, ax=ax, cax=cax_edges, orientation="horizontal")
cb_edges.ax.set_title("Edge Density")

cax_nodes = ax.inset_axes([0.95, 0.75, 0.2, 0.01], transform=ax.transAxes)
cb_nodes = fig.colorbar(im_nodes, ax=ax, cax=cax_nodes, orientation="horizontal")
cb_nodes.ax.set_title("Node Density")

ax.text(
    x=0.0,
    y=-0.2,
    s=f"{metadata['trade_data_year']} inter-country trade data for "
    f"trade group {metadata['trade_id']}.\n"
    "Trade groups specified according to the "
    "Harmonized System (HS) 1992 classification.\n"
    "An edge corresponds to non-zero trade between two countries.\n"
    "'All Other Continents' includes countries from "
    "Africa, Oceania, and North and South America.\n"
    f"Nodes placed on each axis sorted by {sorting_variable.replace('_', ' ')} "
    f"(in USD) for trade group {metadata['trade_id']}.\n"
    rf"Each axis spans \$0 - \${int(vmax):,}.\n\n"
    "Data from the Growth Lab at Harvard University.",
    size=9,
    color="gray",
    ha="left",
    transform=ax.transAxes,
)
plt.show()
_images/hive_plots_more_than_three_groups_28_0.png

Collapsing down to 3 groups has made the resulting figure more edge-balanced and interpretable. More importantly, we have done this without sacrificing any intercontinental edges.

We have not made this simplification without cost, though. We do not know from this figure, for example, that South America does not trade with itself, or which nodes and edges correspond to North America / South America / Africa / Oceania.

Most notably, we had to see Option 2 to choose which groups to collapse here, which makes it dangerous to start with this single visualization in an exploratory analysis. That being said, starting here is still much better than starting at Option 1.

One could also mitigate information loss when collapsing groups by following an explicit rule, such as collapsing the groups with the lowest total exports or the fewest edges.
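As a rough illustration of such a rule, the sketch below ranks continents by total exports and collapses everything outside the top few. The trade records are made-up toy values, and `continents_to_collapse` and `axes_groups` are hypothetical names introduced only for this example; neither is part of hiveplotlib.

```python
from collections import defaultdict

# Toy trade records: (origin continent, export value in USD). Values are made up.
toy_trades = [
    ("Asia", 500),
    ("Europe", 400),
    ("North America", 300),
    ("Africa", 20),
    ("Oceania", 10),
    ("South America", 30),
    ("Asia", 100),
]


def continents_to_collapse(trades, n_keep=2):
    """Split continents into (kept, collapsed), keeping the top `n_keep` by total exports."""
    totals = defaultdict(float)
    for continent, value in trades:
        totals[continent] += value
    # rank continents from highest to lowest total exports
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:n_keep], sorted(ranked[n_keep:])


kept, collapsed = continents_to_collapse(toy_trades, n_keep=2)

# hypothetical axis specification: `None` stands in for the single collapsed axis
axes_groups = [*kept, None]
print(kept, collapsed)
```

With the notebook's actual data, the same ranking could be computed from the `origin_continent` and `export_value` columns, and a rule based on edge counts would only need a different aggregation.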

Option 4: Small Multiples with a Hive Plot Matrix

Option 3 shows us the relative simplicity of a hive plot with 3 groups, but only if we sacrifice continent-specific trade details by collapsing a subset of continents onto a single axis.

However, we do not need to look at only one hive plot. Instead, thanks to the compact structure of hive plots, we can look at a series of hive plots in small multiples.

Below, we create a Hive Plot Matrix (HPM), designed in the spirit of a Scatter Plot Matrix.

With our continued use of a None-collapsed axis, each hive plot will only have two uniquely-defined continent axes. This allows us to position multiple hive plots in a matrix where the \((i, j)\)th entry shows continent \(i\), continent \(j\), and a None-collapsed axis.

For the diagonal entries, we can plot just the two axes of intracontinental trade. This also allows us to drop the repeat axes from the non-diagonal hive plots, further simplifying each figure in our small multiples visualization.
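The layout described above can be sketched without any plotting. The snippet below is a minimal illustration, using only the standard library, of how each unordered pair of continents maps to one panel above the diagonal, how each continent maps to one diagonal panel, and which panels are left unused. The names `index_of`, `panel_positions`, and `unused` are hypothetical and exist only for this example.

```python
from itertools import combinations

continents = [
    "Africa",
    "Asia",
    "Europe",
    "North America",
    "Oceania",
    "South America",
]
index_of = {name: i for i, name in enumerate(continents)}

# off-diagonal panels: one per unordered pair, placed at (row, col) with row < col
panel_positions = {
    (a, b): (index_of[a], index_of[b]) for a, b in combinations(continents, 2)
}

# diagonal panels: one per continent, for intracontinental trade
for name in continents:
    panel_positions[(name, name)] = (index_of[name], index_of[name])

# panels below the diagonal are never assigned a hive plot
n = len(continents)
unused = [(i, j) for i in range(n) for j in range(n) if i > j]

print(len(panel_positions), len(unused))
```

For 6 continents this gives 15 off-diagonal panels plus 6 diagonal panels, leaving 15 panels below the diagonal empty.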

[15]:
continents_to_plot = [
    "Africa",
    "Asia",
    "Europe",
    "North America",
    "Oceania",
    "South America",
]

idxs_for_continents = dict(zip(continents_to_plot, range(len(continents_to_plot))))

combs = list(combinations(continents_to_plot, 2))

# vmin and vmax to use for each axis (for consistency in interpretation between axes)
vmin = 0
vmax = 3e8

# vmax for node and edge densities, for consistency across small multiples
vmax_nodes = 100
vmax_edges = 600

# sorting variable for each axis
sorting_variable = "export_value"

fig, axes = plt.subplots(
    len(continents_to_plot), len(continents_to_plot), figsize=(20, 20), dpi=400
)

for c, comb in enumerate(combs):
    hp = hive_plot_n_axes(
        node_list=node_list,
        edges=edges,
        axes_assignments=[*[node_splits[i] for i in comb], None],
        sorting_variables=[sorting_variable] * 3,
        axes_names=[*[i.replace(" ", "\n") for i in comb], "All Other\nContinents"],
        vmins=[vmin] * 3,
        vmaxes=[vmax] * 3,
    )

    ax = axes[idxs_for_continents[comb[0]], idxs_for_continents[comb[1]]]

    _, _, im_nodes, im_edges = datashade_hive_plot_mpl(
        hp,
        fig=fig,
        ax=ax,
        buffer=0.2,
        axes_labels_buffer=1.01,
        axes_labels_fontsize=10,
        text_kwargs={"color": "maroon"},
        vmax_nodes=vmax_nodes,
        vmax_edges=vmax_edges,
    )

    # put titles on top row
    if idxs_for_continents[comb[0]] == 0:
        ax.set_title(comb[1].replace(" ", "\n"), y=1.35, size=20, va="top")

    # put right labels on rightmost column
    if idxs_for_continents[comb[1]] == len(continents_to_plot) - 1:
        ax.text(9, 0.5, comb[0].replace(" ", "\n"), size=20)

# add in intracontinental hive plots on the diagonal
for i, continent in enumerate(continents_to_plot):
    compact_continent_name = continent.replace(" ", "\n")
    hp = hive_plot_n_axes(
        node_list=node_list,
        edges=edges,
        axes_assignments=[node_splits[continent]],
        sorting_variables=[sorting_variable],
        axes_names=[compact_continent_name],
        vmins=[vmin],
        vmaxes=[vmax],
        orient_angle=135,
        repeat_axes=[True],
    )
    # `hive_plot_n_axes()` makes redundant edges when called on one group
    #   but these are quick to delete
    hp.reset_edges(
        axis_id_1=compact_continent_name,
        axis_id_2=f"{compact_continent_name}_repeat",
        a1_to_a2=False,
    )
    ax = axes[idxs_for_continents[continent], idxs_for_continents[continent]]
    datashade_hive_plot_mpl(
        hp,
        tag=0,
        fig=fig,
        ax=ax,
        buffer=0.3,
        axes_labels_buffer=1.01,
        axes_labels_fontsize=10,
        pixel_spread_edges=1,
        pixel_spread_nodes=8,
        text_kwargs={"color": "maroon"},
        vmax_nodes=vmax_nodes,
        vmax_edges=vmax_edges,
    )

    # by construction, this single pair of repeat axes only lives in upper left quadrant
    #  so let's zoom in
    ax.set_xlim(-5.5, 0.5)
    ax.set_ylim(-0.5, 5.5)

    # put title on the one in the top row
    if i == 0:
        ax.set_title(continent.replace(" ", "\n"), y=1.35, size=20, va="top")

    # put right label on the one in the rightmost column
    if i == len(continents_to_plot) - 1:
        ax.text(2, 2.5, continent.replace(" ", "\n"), size=20)


# clear the axes from the untouched other side of the diagonal
for i, _ in enumerate(continents_to_plot):
    for j, _ in enumerate(continents_to_plot):
        if i > j:
            axes[i, j].axis("off")


# colorbars for datashading nodes and edges
ax = axes[0, 0]
cax_edges = ax.inset_axes([0.0, -4, 2, 0.2], transform=ax.transAxes)
cb_edges = fig.colorbar(
    im_edges, ax=ax, cax=cax_edges, orientation="horizontal", extend="max"
)
cb_edges.ax.set_title("Edge Density", size=16)
cb_edges.ax.tick_params(labelsize=16)

cax_nodes = ax.inset_axes([0.0, -4.6, 2, 0.2], transform=ax.transAxes)
cb_nodes = fig.colorbar(
    im_nodes, ax=ax, cax=cax_nodes, orientation="horizontal", extend="max"
)
cb_nodes.ax.set_title("Node Density", size=16)
cb_nodes.ax.tick_params(labelsize=16)

ax.text(
    x=0.0,
    y=-6,
    s=f"{metadata['trade_data_year']} inter-country trade data for "
    f"trade group {metadata['trade_id']}.\n"
    "Trade groups specified according to the "
    "Harmonized System (HS) 1992 classification.\n"
    "An edge corresponds to non-zero trade between two countries.\n"
    f"Nodes placed on each axis sorted by {sorting_variable.replace('_', ' ')} "
    f"(in USD) for trade group {metadata['trade_id']}.\n"
    rf"Each axis spans \$0 - \${int(vmax):,}.\n\n"
    "Data from the Growth Lab at Harvard University.",
    size=15,
    color="gray",
    ha="left",
    transform=ax.transAxes,
)
plt.show()
_images/hive_plots_more_than_three_groups_31_0.png

This structure allows us to look at intercontinental trade for each possible pair of continents while still preserving the context of the rest of the trade network in each hive plot. Thanks to the diagonal, we also preserve visualizations of intracontinental trade.

All of our earlier visual anecdotes are still visible:

  • Africa and Oceania export relatively minimal dollar amounts of goods within this trade group.

  • South America (rightmost column) mostly trades with North America, Europe, and Asia, barely trades with Africa, and doesn’t trade with Oceania (or itself apparently) within this trade group.

  • Asia, Europe, and North America trade extensively with everyone within this trade group.

Plus, we only needed half the matrix to show all the relevant hive plots, which frees up the lower triangle of the figure for other visualizations. For example, we could fill the lower triangle with different hive plots by placing the nodes on each axis using a different sorting variable, such as total imports or other socioeconomic data like GDP or population. We could also use graph-theoretic measures like PageRank or node degree.
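As a minimal sketch of where such alternative sorting values could come from, the snippet below computes total imports and node degree per country from a toy directed edge list. The country labels and dollar values are made up, and `toy_edges`, `total_imports`, and `degree` are hypothetical names for this example only. Attaching the results to each node's data, so that they can be named as a sorting variable, is left out of the sketch.

```python
from collections import Counter

# Toy directed trade edges: (exporter, importer, export value in USD). Values are made up.
toy_edges = [
    ("A", "B", 100),
    ("A", "C", 50),
    ("B", "C", 25),
    ("C", "A", 10),
]

total_imports = Counter()  # dollars received per country
degree = Counter()  # number of edges touching each country, in either direction

for exporter, importer, value in toy_edges:
    total_imports[importer] += value
    degree[exporter] += 1
    degree[importer] += 1

print(dict(total_imports))
print(dict(degree))
```

Either dictionary gives one number per country, which is all a sorting variable needs in order to position nodes along an axis.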

Furthermore, for the intracontinental diagonal hive plots, we could sort the two axes with two different sorting variables, allowing us to explore even more nuance in intracontinental trade in the above figure.

The Hive Plot Matrix avoids the pitfall of Option 3: we do not need to see any information before collapsing groups because we look at all the combinations of collapsed groups over the entire matrix. It therefore serves as an excellent means for exploring a network dataset with no priors.

The main problem with HPMs is that the figure is hard to look over on a small screen. For an exploration of network data on a laptop, though, HPMs are an excellent place to start.

References

The Growth Lab at Harvard University, 2019, “International Trade Data (HS, 92)”, https://doi.org/10.7910/DVN/T4CHWJ, Harvard Dataverse, V5.

“The Atlas of Economic Complexity,” Center for International Development at Harvard University, http://www.atlas.cid.harvard.edu.