---
title: "Iteratively Read CSV"
date: 2020-04-11T21:34:33-04:00
draft: false
tags: ["python"]
---

If you want to analyze a CSV dataset that is larger than the available RAM, you can iteratively process each observation and store or calculate only what you need. You can do this with the Python standard library as well as with the popular Pandas library.

## Standard Library

```python
import csv

# Read the file one row at a time instead of loading it all into memory
with open('/path/to/data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        for column in row:
            do_something()  # placeholder for whatever per-value work you need
```
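
For instance, the loop above can keep only a running total rather than the whole dataset. A minimal sketch, assuming a hypothetical `data.csv` with a header row and a numeric value in its third column:

```python
import csv

total = 0.0
count = 0
with open('/path/to/data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row (assumed to exist)
    for row in reader:
        total += float(row[2])  # hypothetical numeric column
        count += 1

if count:
    print("mean:", total / count)
```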

## Pandas

Pandas works slightly differently: you specify a `chunksize`, the number of rows per chunk, and each iteration yields a pandas DataFrame with at most that many rows.

```python
import pandas as pd

chunksize = 100
# read_csv with chunksize yields DataFrames of up to `chunksize` rows each
for chunk in pd.read_csv('/path/to/data.csv', chunksize=chunksize):
    do_something(chunk)
```
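
Since each chunk is an ordinary DataFrame, you can reduce it as it streams by and keep only small per-chunk results in memory. A minimal sketch, assuming a hypothetical numeric column named `price`:

```python
import pandas as pd

total = 0.0
count = 0
for chunk in pd.read_csv('/path/to/data.csv', chunksize=100):
    total += chunk['price'].sum()  # hypothetical numeric column
    count += len(chunk)

print("mean:", total / count)  # computed without ever loading the whole file
```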