From 5cd2409afc34c297cc2f93600e9886eec48ab8d8 Mon Sep 17 00:00:00 2001
From: Brandon Rozek
Date: Sat, 11 Apr 2020 22:10:19 -0400
Subject: [PATCH] New Post

---
 content/blog/iterativecsv.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 content/blog/iterativecsv.md

diff --git a/content/blog/iterativecsv.md b/content/blog/iterativecsv.md
new file mode 100644
index 0000000..a906b98
--- /dev/null
+++ b/content/blog/iterativecsv.md
@@ -0,0 +1,31 @@
+---
+title: "Iteratively Read CSV"
+date: 2020-04-11T21:34:33-04:00
+draft: false
+tags: ["python"]
+---
+
+If you want to analyze a CSV dataset that is larger than the available RAM, you can iteratively process each observation and store or calculate only what you need. You can do this with the Python standard library as well as with the popular Pandas library.
+
+## Standard Library
+
+```python
+import csv
+
+# Stream the file one row at a time instead of loading it all into memory
+with open('/path/to/data.csv', newline='') as csvfile:
+    reader = csv.reader(csvfile, delimiter=',')
+    for row in reader:
+        for column in row:
+            do_something(column)
+```
+
+## Pandas
+
+Pandas is slightly different: you specify a `chunksize`, the number of rows per chunk, and each iteration yields a pandas DataFrame with up to that many rows.
+
+```python
+import pandas as pd
+
+chunksize = 100  # number of rows per chunk
+for chunk in pd.read_csv('/path/to/data.csv', chunksize=chunksize):
+    do_something(chunk)
+```
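+
+As a concrete example of "store or calculate only what you need", here is a minimal sketch (the file path and the `price` column name are hypothetical) that computes the mean of one column while keeping only a running total and row count in memory:
+
+```python
+import pandas as pd
+
+total = 0.0
+count = 0
+
+# Only two numbers stay in memory, no matter how large the CSV is
+for chunk in pd.read_csv('/path/to/data.csv', chunksize=100):
+    total += chunk['price'].sum()  # 'price' is a hypothetical column name
+    count += len(chunk)
+
+print('Mean price:', total / count)
+```
+
+The same pattern works with `csv.reader`: accumulate whatever summary you need inside the row loop instead of appending every row to a list.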