Add Data to NodeCollection

Hive plots require node data to be represented as a hiveplotlib.NodeCollection instance. This notebook introduces the ways to instantiate a NodeCollection.

[1]:

import pandas as pd
from hiveplotlib import NodeCollection
from hiveplotlib.datasets import example_node_data

Pandas DataFrame

The simplest way to generate a NodeCollection is with a pandas.DataFrame input.

Below, we load an example dataframe from which to create a NodeCollection:

[2]:

df = example_node_data()
df

[2]:

	unique_id	low	med	high
0	0	6.363247	14.795079	23.193620
1	1	2.695169	12.321405	21.873202
2	2	0.409326	18.010787	26.718541
3	3	0.165111	19.226066	21.949123
4	4	8.124570	12.658641	25.771102
...	...	...	...	...
95	95	9.562530	15.708242	25.857141
96	96	1.486152	10.064025	21.225680
97	97	9.716562	17.718766	29.328351
98	98	8.890456	19.772874	26.833664
99	99	8.215515	15.892802	28.229576

100 rows × 4 columns
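A dataframe of the same shape can also be assembled by hand. The following is a minimal sketch using only numpy and pandas; it is not how example_node_data() is implemented. The name custom_df, the seed, and the value ranges are assumptions chosen to resemble the output shown above.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the sketch is reproducible.
rng = np.random.default_rng(0)
num_nodes = 100

# Hypothetical stand-in for the example data: one ID column plus three
# numeric columns. The ranges are guesses based on the displayed output.
custom_df = pd.DataFrame(
    {
        "unique_id": np.arange(num_nodes),
        "low": rng.uniform(0, 10, size=num_nodes),
        "med": rng.uniform(10, 20, size=num_nodes),
        "high": rng.uniform(20, 30, size=num_nodes),
    }
)
```

Any dataframe with one row per node should work in the same way, as long as it has an index or a column that identifies each node uniquely.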

The only required argument when instantiating a NodeCollection is data, which expects a pandas.DataFrame instance:

[3]:

nodes = NodeCollection(
    data=df,
)
nodes

[3]:

hiveplotlib.NodeCollection of 100 nodes and unique ID column 'index_values'.

Node IDs Must be Unique

The NodeCollection class requires a unique ID for each node so that nodes can be referenced in the hive plot. By default, the dataframe index values are used, but we can instead name a column of IDs with the unique_id_column parameter, as demonstrated below:

[4]:

nodes = NodeCollection(
    data=df,
    unique_id_column="unique_id",
)
nodes

[4]:

hiveplotlib.NodeCollection of 100 nodes and unique ID column 'unique_id'.

If the values in the unique ID column are not unique, instantiation raises a RepeatUniqueNodeIDsError:

[5]:

df_with_copies = pd.concat([df.copy(), df.copy()]).sort_values(by="unique_id")
df_with_copies.head()

[5]:

	unique_id	low	med	high
0	0	6.363247	14.795079	23.193620
0	0	6.363247	14.795079	23.193620
1	1	2.695169	12.321405	21.873202
1	1	2.695169	12.321405	21.873202
2	2	0.409326	18.010787	26.718541

[6]:

import traceback

from hiveplotlib.exceptions import RepeatUniqueNodeIDsError

try:
    NodeCollection(data=df_with_copies, unique_id_column="unique_id")
except RepeatUniqueNodeIDsError:
    traceback.print_exc()

Traceback (most recent call last):
  File "/tmp/ipykernel_1877851/3574569270.py", line 6, in <module>
    NodeCollection(data=df_with_copies, unique_id_column="unique_id")
  File "/home/garyk/repos/hiveplotlib/src/hiveplotlib/node.py", line 160, in __init__
    raise RepeatUniqueNodeIDsError(msg)
hiveplotlib.exceptions.node.RepeatUniqueNodeIDsError: Found repeat unique IDs:
    unique_id       low        med       high
0           0  6.363247  14.795079  23.193620
0           0  6.363247  14.795079  23.193620
1           1  2.695169  12.321405  21.873202
1           1  2.695169  12.321405  21.873202
2           2  0.409326  18.010787  26.718541
..        ...       ...        ...        ...
97         97  9.716562  17.718766  29.328351
98         98  8.890456  19.772874  26.833664
98         98  8.890456  19.772874  26.833664
99         99  8.215515  15.892802  28.229576
99         99  8.215515  15.892802  28.229576

[200 rows x 4 columns]
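Repeated IDs can be detected, and if appropriate removed, with pandas before the dataframe is passed to NodeCollection. The following is a minimal sketch using only pandas calls; the dataframe dupes_df and the other variable names are hypothetical, and keeping the first row per ID is just one possible policy.

```python
import pandas as pd

# Small hypothetical dataframe with repeated IDs, mirroring df_with_copies.
dupes_df = pd.DataFrame(
    {
        "unique_id": [0, 0, 1, 2, 2],
        "low": [6.4, 6.4, 2.7, 0.4, 0.4],
    }
)

# Quick check: truthy only when every ID appears exactly once.
ids_are_unique = dupes_df["unique_id"].is_unique

# Select every row whose ID is repeated. keep=False marks all copies,
# not only the second and later occurrences.
repeated_rows = dupes_df[dupes_df.duplicated(subset="unique_id", keep=False)]

# One possible fix: keep the first row per ID and rebuild the index.
deduped_df = dupes_df.drop_duplicates(
    subset="unique_id", keep="first"
).reset_index(drop=True)
```

Dropping rows silently can hide data problems, so it is usually worth inspecting repeated_rows before deciding how to resolve the duplicates.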

NetworkX Graph

We can also generate a NodeCollection directly from a NetworkX graph. For more on this, see the Creating Hive Plots from NetworkX page.
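Independently of the approach described on that page, node attributes stored on a NetworkX graph can be flattened into a pandas.DataFrame by hand, which then fits the dataframe workflow above. The following is a minimal sketch using only networkx and pandas calls; the graph, its attribute names, and the variable node_df are hypothetical, and this is not the hiveplotlib conversion itself.

```python
import networkx as nx
import pandas as pd

# Hypothetical two-node graph with numeric attributes on each node.
G = nx.Graph()
G.add_node("a", low=1.0, high=25.0)
G.add_node("b", low=2.0, high=22.0)
G.add_edge("a", "b")

# G.nodes(data=True) yields (node, attribute_dict) pairs, so dict(...)
# gives a mapping of node -> attributes, with one row per node below.
node_df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# Move the node labels out of the index into an explicit ID column.
node_df.index.name = "unique_id"
node_df = node_df.reset_index()
```

Node labels in a NetworkX graph are unique by construction, so the resulting unique_id column contains no repeats.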