---
title: "Iteratively Read CSV"
date: 2020-04-11T21:34:33-04:00
draft: false
tags: ["python"]
---

If you want to analyze a CSV dataset that is larger than the available RAM, you can iteratively process each observation and store or calculate only what you need. You can do this with the Python standard library as well as with the popular Pandas library.

## Standard Library

```python
import csv

# Read the file one row at a time instead of loading it all into memory
with open('/path/to/data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        for column in row:
            do_something()  # placeholder for whatever per-value work you need
```
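
For instance, the loop above can keep only a running total rather than the whole dataset. A minimal sketch, assuming a hypothetical `data.csv` with a header row and a numeric value in its third column:

```python
import csv

total = 0.0
count = 0
with open('/path/to/data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row (assumed to exist)
    for row in reader:
        total += float(row[2])  # hypothetical numeric column
        count += 1

if count:
    print("mean:", total / count)
```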

## Pandas

Pandas works slightly differently: you specify a `chunksize`, the number of rows per chunk, and each iteration yields a pandas DataFrame with at most that many rows.

```python
import pandas as pd

chunksize = 100
# read_csv with chunksize yields DataFrames of up to `chunksize` rows each
for chunk in pd.read_csv('/path/to/data.csv', chunksize=chunksize):
    do_something(chunk)
```
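
Since each chunk is an ordinary DataFrame, you can reduce it as it streams by and keep only small per-chunk results in memory. A minimal sketch, assuming a hypothetical numeric column named `price`:

```python
import pandas as pd

total = 0.0
count = 0
for chunk in pd.read_csv('/path/to/data.csv', chunksize=100):
    total += chunk['price'].sum()  # hypothetical numeric column
    count += len(chunk)

print("mean:", total / count)  # computed without ever loading the whole file
```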