Add Data to Edges

Hive plots require edge data represented as a hiveplotlib.Edges instance. This notebook introduces the ways to instantiate an Edges object.

[1]:

import numpy as np
import pandas as pd
from hiveplotlib import Edges
from hiveplotlib.datasets import example_node_collection

To generate edges, we need nodes to connect, so we will use the following example nodes:

[2]:

nodes = example_node_collection()

We can then generate edges, each represented by an origin node ID and a destination node ID:

[3]:

# randomly generate `(num_edges, 2)` edge array
rng = np.random.default_rng(0)
num_edges = 10

# sample from the unique node IDs
node_ids = nodes.data[nodes.unique_id_column].to_numpy()

edge_array = rng.choice(node_ids, size=num_edges * 2).reshape(-1, 2)
edge_array

[3]:

array([[85, 63],
       [51, 26],
       [30,  4],
       [ 7,  1],
       [17, 81],
       [64, 91],
       [50, 60],
       [97, 72],
       [63, 54],
       [55, 93]])
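Because every edge must connect existing nodes, it can be worth checking that each ID in an edge array appears among the node IDs before building an Edges object. The sketch below uses only NumPy. The helper name all_endpoints_known is hypothetical and is not part of hiveplotlib. The small ID arrays are made up for illustration.

```python
import numpy as np


def all_endpoints_known(edge_array: np.ndarray, node_ids: np.ndarray) -> bool:
    """Hypothetical helper: True if every origin and destination ID is a known node ID."""
    # np.isin tests each element of edge_array for membership in node_ids,
    # returning a boolean array with the same (num_edges, 2) shape
    return bool(np.isin(edge_array, node_ids).all())


# illustrative data: node IDs 0 through 4
example_node_ids = np.arange(5)
good_edges = np.array([[0, 1], [2, 4], [3, 0]])
bad_edges = np.array([[0, 1], [2, 7]])  # 7 is not a node ID

print(all_endpoints_known(good_edges, example_node_ids))  # True
print(all_endpoints_known(bad_edges, example_node_ids))  # False
```

Applied to this notebook's arrays, all_endpoints_known(edge_array, node_ids) should return True, because edge_array was sampled from node_ids.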

NumPy Array

The Edges class accepts a two-column numpy.ndarray input, where the first column holds the starting (origin) node ID of each edge and the second column holds the ending (destination) node ID of each edge.
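If origin and destination IDs start out as separate sequences, or as a list of (origin, destination) pairs, either form can be turned into a two-column array with plain NumPy. This sketch uses made-up IDs. What matters for the Edges input described above is the resulting (num_edges, 2) shape and the column order: origin first, destination second.

```python
import numpy as np

# made-up origin and destination node IDs for three edges
origins = [85, 51, 30]
destinations = [63, 26, 4]

# stack the two 1-D sequences as columns -> shape (3, 2)
from_columns = np.column_stack([origins, destinations])

# a list of (origin, destination) tuples converts directly to the same layout
pairs = [(85, 63), (51, 26), (30, 4)]
from_pairs = np.array(pairs)

print(from_columns.shape)  # (3, 2)
print(np.array_equal(from_columns, from_pairs))  # True
```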

[4]:

edges = Edges(data=edge_array)
edges

[4]:

hiveplotlib.Edges of 10 edges.

Note that when providing an array input, the edge data is stored as a pandas.DataFrame with the default column names "from" (the start of each edge) and "to" (the end of each edge):

[5]:

edges.data

[5]:

	from	to
0	85	63
1	51	26
2	30	4
3	7	1
4	17	81
5	64	91
6	50	60
7	97	72
8	63	54
9	55	93

Pandas DataFrame

The Edges class also accepts pandas.DataFrame inputs for edge data:

[6]:

edge_df = pd.DataFrame(edge_array, columns=["from", "to"])
edges = Edges(
    data=edge_df,
)
edges

[6]:

hiveplotlib.Edges of 10 edges.

When passing a pandas.DataFrame input, the expected column names are "from" for the start of each edge and "to" for the end of each edge. If your column names differ, you must specify them explicitly with the from_column_name and to_column_name parameters. Otherwise, an error will be raised:

[7]:

edge_df = pd.DataFrame(edge_array, columns=["source", "sink"])
edges = Edges(
    data=edge_df,
    from_column_name="source",
    to_column_name="sink",
)
edges

[7]:

hiveplotlib.Edges of 10 edges.
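As an alternative to passing from_column_name and to_column_name, you could rename the columns to the default "from" and "to" names with pandas before constructing Edges. This is a minimal sketch with made-up IDs. It assumes Edges then picks up the default names as described above. The variable names alt_edge_df and renamed_df are illustrative and avoid overwriting edge_df in this notebook.

```python
import pandas as pd

# made-up edges whose columns do not use the default names
alt_edge_df = pd.DataFrame({"source": [85, 51, 30], "sink": [63, 26, 4]})

# rename to the default "from" / "to" column names; other columns are untouched
renamed_df = alt_edge_df.rename(columns={"source": "from", "sink": "to"})

print(list(renamed_df.columns))  # ['from', 'to']
```

renamed_df could then be passed as Edges(data=renamed_df) without the column-name parameters. Renaming changes your dataframe, whereas the parameters leave your original column names in place.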

Adding Edge Metadata

If we provide our edge data as a pandas.DataFrame, the dataframe can also include additional edge metadata columns:

[8]:

edge_df = pd.DataFrame(edge_array, columns=["from", "to"])

# with a DataFrame, we can add arbitrary edge metadata columns;
# here, the mean of each edge's origin and destination IDs
edge_df["metadata_col"] = (edge_df["from"] + edge_df["to"]) / 2

edges = Edges(
    data=edge_df,
)
edges

[8]:

hiveplotlib.Edges of 10 edges.

[9]:

edges.data

[9]:

	from	to	metadata_col
0	85	63	74.0
1	51	26	38.5
2	30	4	17.0
3	7	1	4.0
4	17	81	49.0
5	64	91	77.5
6	50	60	55.0
7	97	72	84.5
8	63	54	58.5
9	55	93	74.0

These metadata columns can then be used to modify edge visualization in a hive plot. For more on this, see the Visualizing Edge Metadata page.
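Edge metadata does not have to be derived from the endpoint IDs. When it lives in a separate table keyed by origin and destination, a left merge with pandas can attach it to the edge dataframe before the Edges object is built. In this sketch, the weight column and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# edge list using the default "from" / "to" column names (made-up IDs)
edge_list = pd.DataFrame({"from": [85, 51, 30], "to": [63, 26, 4]})

# hypothetical metadata table keyed by the same (from, to) pairs;
# note that it has no entry for the (30, 4) edge
weights = pd.DataFrame({"from": [85, 51], "to": [63, 26], "weight": [0.5, 2.0]})

# a left merge keeps every edge in its original order and fills
# missing metadata with NaN
edge_list_with_weights = edge_list.merge(weights, on=["from", "to"], how="left")

print(edge_list_with_weights)
```

The merged dataframe could then be passed as Edges(data=edge_list_with_weights), following the same pattern as the metadata example above.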