mirror of
https://github.com/Brandon-Rozek/website.git
synced 2024-11-09 10:40:34 -05:00
32 lines
919 B
Markdown
---
title: "Iteratively Read CSV"
date: 2020-04-11T21:34:33-04:00
draft: false
tags: ["Python"]
medium_enabled: true
---

If you want to analyze a CSV dataset that is larger than the available RAM, you can process it iteratively, one observation at a time, and store or calculate only what you need. This can be done with Python's standard library as well as with the popular Pandas library.

## Standard Library

```python
import csv

# newline='' lets the csv module handle line endings itself
with open('/path/to/data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # Rows are read lazily, one at a time, as lists of strings
    for row in reader:
        for column in row:
            do_something()
```
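
As a more concrete illustration of "store/calculate only what you need", here is a minimal sketch that computes the mean of one column while holding only a running total and a row count in memory. The function name `column_mean`, the column name `value`, and the sample data are hypothetical, and the sketch assumes the file has a header row and that the column contains only numeric values.

```python
import csv
import io


def column_mean(csvfile, column):
    """Return the mean of a numeric column, reading one row at a time."""
    total = 0.0
    count = 0
    # DictReader treats the first row as the header and yields one dict per row
    for row in csv.DictReader(csvfile):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Small in-memory stand-in for a large file on disk (hypothetical data)
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
print(column_mean(sample, "value"))
```

For a real dataset you would pass the file object from `open('/path/to/data.csv', newline='')` instead of the `io.StringIO` sample.
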

## Pandas

Pandas works slightly differently: you specify a `chunksize`, which is the number of rows per chunk, and each iteration yields a pandas DataFrame with at most that many rows.

```python
import pandas as pd

chunksize = 100
for chunk in pd.read_csv('/path/to/data.csv', chunksize=chunksize):
    do_something(chunk)
```
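
One possible shape for `do_something` is an aggregation that carries a small amount of state across chunks. The sketch below computes a column mean this way. The function name `chunked_mean`, the column name `value`, and the sample data are hypothetical and stand in for your own.

```python
import io

import pandas as pd


def chunked_mean(source, column, chunksize=100):
    """Return the mean of a column, reading `chunksize` rows at a time."""
    total = 0.0
    count = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Only the running sum and count outlive each chunk
        total += chunk[column].sum()
        # count() ignores missing values, matching what sum() skips
        count += chunk[column].count()
    return total / count if count else float("nan")


# Small in-memory stand-in for a large file on disk (hypothetical data)
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
print(chunked_mean(sample, "value", chunksize=2))
```

A larger `chunksize` generally means fewer iterations at the cost of more memory per chunk, so the right value depends on the row width and the RAM you have available.
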