Saving Pandas DataFrames Efficiently and Quickly: Parquet vs Feather vs ORC vs CSV | by Mike Clayton | Nov, 2024

Optimisation

Speed, RAM, size, and convenience. Which storage method is best?

bar chart comparing output file sizes for mixed data in a dataframe for file formats csv, feather, orc and parquet
Written output file sizes for mixed data. Image by author.

With the ever-increasing volume of data being produced, there is inevitably a need to store, and reload, that data efficiently and quickly.

CSV has been the go-to staple for a long time. However, there are much better alternatives specifically designed to deal directly with the storage, and efficient reloading, of tabular data.

So, how much are you losing out if you are still using CSV to store your data tables? And which alternative should you consider?

When it comes to storing tabular data, the ideal format would be:

  • Fast to write
  • Fast to read
  • Low RAM usage
  • Low storage requirements
  • Good options for compression

An option to read only part of the data, without loading the whole dataset, would also be an excellent addition to the above.
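To make the idea of a partial read concrete, here is a minimal sketch rather than the benchmark code used in this article. The helper name `write_then_read_columns`, the sample DataFrame, and the file names are all hypothetical. It assumes pandas is installed and uses Parquet only if an engine such as pyarrow is available, falling back to CSV otherwise.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_then_read_columns(df, columns, directory):
    """Write df to disk, then load back only the requested columns.

    Hypothetical helper for illustration. Tries Parquet first and falls
    back to CSV if no Parquet engine (e.g. pyarrow) is installed.
    """
    try:
        path = Path(directory) / "sample.parquet"
        df.to_parquet(path, index=False)
        # Columnar format: only the requested columns are loaded.
        return pd.read_parquet(path, columns=columns)
    except ImportError:
        path = Path(directory) / "sample.csv"
        df.to_csv(path, index=False)
        # CSV is row-oriented, so the file is still scanned line by line;
        # usecols only limits which columns end up in the DataFrame.
        return pd.read_csv(path, usecols=columns)


# Small illustrative dataset (made up for this example).
df = pd.DataFrame(
    {
        "id": range(5),
        "name": list("abcde"),
        "score": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    subset = write_then_read_columns(df, ["id", "score"], tmp)

print(subset.columns.tolist())
```

Both paths return the same two columns, but they differ in how much of the file has to be processed to get there, which is part of what the criteria above are meant to capture.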

The list outlined above will therefore form the basis for testing some of the more widely used methods against these…