Add Data to NodeCollection#
Hive plots require node data to be represented as a hiveplotlib.NodeCollection instance. This notebook introduces the ways to instantiate a NodeCollection.
[1]:
import pandas as pd
from hiveplotlib import NodeCollection
from hiveplotlib.datasets import example_node_data
Pandas DataFrame#
The simplest way to generate a NodeCollection is with a pandas.DataFrame input.
Below we generate an example dataframe to create a NodeCollection:
[2]:
df = example_node_data()
df
[2]:
| | unique_id | low | med | high |
|---|---|---|---|---|
| 0 | 0 | 6.363247 | 14.795079 | 23.193620 |
| 1 | 1 | 2.695169 | 12.321405 | 21.873202 |
| 2 | 2 | 0.409326 | 18.010787 | 26.718541 |
| 3 | 3 | 0.165111 | 19.226066 | 21.949123 |
| 4 | 4 | 8.124570 | 12.658641 | 25.771102 |
| ... | ... | ... | ... | ... |
| 95 | 95 | 9.562530 | 15.708242 | 25.857141 |
| 96 | 96 | 1.486152 | 10.064025 | 21.225680 |
| 97 | 97 | 9.716562 | 17.718766 | 29.328351 |
| 98 | 98 | 8.890456 | 19.772874 | 26.833664 |
| 99 | 99 | 8.215515 | 15.892802 | 28.229576 |
100 rows × 4 columns
The only required argument when instantiating a NodeCollection is data, which expects a pandas.DataFrame instance:
[3]:
nodes = NodeCollection(
    data=df,
)
nodes
[3]:
hiveplotlib.NodeCollection of 100 nodes and unique ID column 'index_values'.
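Node data often starts out as a list of per-node records rather than a ready-made dataframe. The sketch below uses pandas alone to turn such records into a dataframe with one row per node, which is the shape of input shown above. The `records` list and its values are hypothetical and only mirror the column names of the example data.

```python
import pandas as pd

# Hypothetical per-node records; field names and values are invented here.
records = [
    {"unique_id": 0, "low": 6.4, "med": 14.8, "high": 23.2},
    {"unique_id": 1, "low": 2.7, "med": 12.3, "high": 21.9},
    {"unique_id": 2, "low": 0.4, "med": 18.0, "high": 26.7},
]

# One row per node, one column per node attribute.
node_df = pd.DataFrame.from_records(records)

print(node_df.shape)
print(list(node_df.columns))
```

The resulting `node_df` could then be passed as the data argument in the same way as the example dataframe.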
Node IDs Must be Unique#
The NodeCollection class requires a unique ID for each node so that nodes can be referenced in the hive plot. By default, the dataframe index values are used, but we can instead name a column of IDs with the unique_id_column parameter, as demonstrated below:
[4]:
nodes = NodeCollection(
    data=df,
    unique_id_column="unique_id",
)
nodes
[4]:
hiveplotlib.NodeCollection of 100 nodes and unique ID column 'unique_id'.
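Before choosing an ID source, it can help to check uniqueness with pandas directly. The following is a minimal sketch on a small stand-in dataframe; the `group` column and all values are invented for illustration, and the check itself is plain pandas rather than a hiveplotlib feature.

```python
import pandas as pd

# Small stand-in for the example node data (values invented for illustration).
df = pd.DataFrame(
    {
        "unique_id": [0, 1, 2, 3],
        "group": ["a", "a", "b", "b"],
        "low": [6.4, 2.7, 0.4, 0.2],
    }
)

# The default ID source is the dataframe index, so check it first.
index_ok = df.index.is_unique

# Check each candidate column before naming it as the unique ID column.
candidates = {col: df[col].is_unique for col in ["unique_id", "group"]}

print(index_ok)
print(candidates)
```

Here `unique_id` would be a safe choice, while `group` contains repeats and would not.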
If the values in the unique ID column are not unique, a RepeatUniqueNodeIDsError is raised:
[5]:
df_with_copies = pd.concat([df.copy(), df.copy()]).sort_values(by="unique_id")
df_with_copies.head()
[5]:
| | unique_id | low | med | high |
|---|---|---|---|---|
| 0 | 0 | 6.363247 | 14.795079 | 23.193620 |
| 0 | 0 | 6.363247 | 14.795079 | 23.193620 |
| 1 | 1 | 2.695169 | 12.321405 | 21.873202 |
| 1 | 1 | 2.695169 | 12.321405 | 21.873202 |
| 2 | 2 | 0.409326 | 18.010787 | 26.718541 |
[6]:
import traceback

from hiveplotlib.exceptions import RepeatUniqueNodeIDsError

try:
    NodeCollection(data=df_with_copies, unique_id_column="unique_id")
except RepeatUniqueNodeIDsError:
    traceback.print_exc()
Traceback (most recent call last):
  File "/tmp/ipykernel_19743/3574569270.py", line 6, in <module>
    NodeCollection(data=df_with_copies, unique_id_column="unique_id")
  File "/home/garyk/repos/hiveplotlib/src/hiveplotlib/node.py", line 160, in __init__
    raise RepeatUniqueNodeIDsError(msg)
hiveplotlib.exceptions.node.RepeatUniqueNodeIDsError: Found repeat unique IDs:
    unique_id       low        med       high
0           0  6.363247  14.795079  23.193620
0           0  6.363247  14.795079  23.193620
1           1  2.695169  12.321405  21.873202
1           1  2.695169  12.321405  21.873202
2           2  0.409326  18.010787  26.718541
..        ...       ...        ...        ...
97         97  9.716562  17.718766  29.328351
98         98  8.890456  19.772874  26.833664
98         98  8.890456  19.772874  26.833664
99         99  8.215515  15.892802  28.229576
99         99  8.215515  15.892802  28.229576

[200 rows x 4 columns]
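One way to recover from repeated IDs is to inspect and remove them with pandas before building the NodeCollection. The sketch below rebuilds a small duplicated dataframe (values invented for illustration) and keeps the first row per ID. This assumes the repeated rows are true copies; if they carry different attribute values, deciding which row to keep is a data question that this sketch does not answer.

```python
import pandas as pd

# Small stand-in for the duplicated example data (values invented).
df = pd.DataFrame({"unique_id": [0, 1, 2], "low": [6.4, 2.7, 0.4]})
df_with_copies = pd.concat([df, df]).sort_values(by="unique_id")

# keep=False marks every row whose ID appears more than once.
repeat_mask = df_with_copies.duplicated(subset="unique_id", keep=False)
print(df_with_copies.loc[repeat_mask, "unique_id"].unique())

# Keep the first row per ID and rebuild a clean, unique index.
deduped = df_with_copies.drop_duplicates(subset="unique_id").reset_index(drop=True)
print(deduped)
```

After this step, both the index and the `unique_id` column of `deduped` are unique, so either could serve as the ID source.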
NetworkX Graph#
We can also generate a NodeCollection directly from a NetworkX graph. For more on this, see the Creating Hive Plots from Networkx page.