website/content/blog/networkx-random-sample-graph.md

---
date: 2022-04-07 19:48:12-04:00
draft: false
math: false
medium_enabled: true
medium_post_id: e5c5330cc9a7
tags:
- Python
title: Networkx Random Sample Graph
---

I've been working on several algorithms in `networkx`. To speed up testing, especially on large graphs, I've been randomly sampling portions of the original graph. The best way I've found to do this is with the following Python snippet:

```python
import random
import networkx as nx

# Pick SAMPLE_SIZE edges of G at random and build a new graph from them
random_sample_edges = random.sample(list(G.edges), SAMPLE_SIZE)
G_sample = nx.Graph()
G_sample.add_edges_from(random_sample_edges)
```
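
If node or edge attributes matter for the algorithm under test, one possible variation is to take the sample through `G.edge_subgraph`, which carries the attributes over from the original graph. This is a minimal sketch assuming an undirected `nx.Graph`. The helper name `sample_edges`, its `seed` parameter, and the small demo graph are illustrative choices of mine, not part of `networkx`.

```python
import random

import networkx as nx


def sample_edges(G, sample_size, seed=None):
    """Return a copy of the subgraph induced by a random sample of edges."""
    rng = random.Random(seed)  # local RNG so a seed gives a repeatable sample
    edges = rng.sample(list(G.edges), sample_size)
    # edge_subgraph returns a read-only view; copy() makes it editable
    return G.edge_subgraph(edges).copy()


# Small demo graph with an edge attribute (hypothetical data)
G = nx.path_graph(10)
nx.set_edge_attributes(G, 1.0, "weight")

G_sample = sample_edges(G, 4, seed=0)
```

Passing a seed is handy in tests, since the same sample comes back on every run.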

It might be tempting to sample the nodes and then grab the subgraph like the following:

```python
import random
random_nodes = random.sample(list(G.nodes), SAMPLE_SIZE)
G_sample = G.subgraph(random_nodes)
```

However, sampling only the nodes makes it highly likely that the subgraph will have significantly fewer edges, because an edge is kept only when both of its endpoints land in the sample. This results in a mostly disconnected subgraph and a loss of information. Sampling the edges prevents this issue at the expense of not capturing isolated nodes, which are not connected to anything else.
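
To get a feel for the difference, here is a rough comparison of the two approaches on a synthetic graph. The graph size, sample size, and seed are arbitrary assumptions for illustration. When `k` of `n` nodes are sampled uniformly, a given edge keeps both endpoints with probability `k(k-1) / (n(n-1))`, which is what `keep_prob` computes.

```python
import random

import networkx as nx

rng = random.Random(42)  # fixed seed, arbitrary choice
G = nx.gnm_random_graph(1000, 5000, seed=42)  # 1000 nodes, 5000 edges
SAMPLE_SIZE = 100

# Edge sampling: the sample has exactly SAMPLE_SIZE edges
G_edges = nx.Graph()
G_edges.add_edges_from(rng.sample(list(G.edges), SAMPLE_SIZE))

# Node sampling: an edge survives only if both endpoints were picked
G_nodes = G.subgraph(rng.sample(list(G.nodes), SAMPLE_SIZE))

# Expected number of surviving edges under node sampling
n = G.number_of_nodes()
keep_prob = SAMPLE_SIZE * (SAMPLE_SIZE - 1) / (n * (n - 1))
expected_edges = keep_prob * G.number_of_edges()

print("edge sampling:", G_edges.number_of_edges(), "edges")
print("node sampling:", G_nodes.number_of_edges(), "edges")
print("expected under node sampling:", round(expected_edges, 1))
```

With these numbers, node sampling is expected to keep only about 1% of the original edges, while edge sampling keeps exactly as many as requested.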