- Introduction — What’s Tablib?
- Working with Datasets
- Importing Data
- Exporting Data
- Dynamic Columns
- Formatters
- Wrapping Up
For a few years I’ve been working with tools like Pandas and PySpark in Python for data ingestion, data processing, and data exporting. These tools are great for complex data transformations and large data sizes (Pandas when the data fits in memory). However, I’ve sometimes used these tools when the following conditions apply:
- The data size is relatively small. Think well under 100,000 rows of data.
- Performance is not an issue at all. Think of a one-off job, or a job that runs at midnight every night where I don’t care whether it takes 20 seconds or 5 minutes.
- There are no complex transformations needed. Think of simply importing 20 JSON files with the same format, stacking them on top of each other, and then exporting the result as a CSV file (see the sketch right after this list).
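To make that last scenario concrete, here is a minimal sketch using Tablib. The `data/` directory, the glob pattern, and the `combined.csv` output path are assumptions for illustration, and it presumes every JSON file is an array of objects sharing the same keys.

```python
from pathlib import Path

import tablib

# Assumed layout (hypothetical): ./data/*.json, each an array of objects with identical keys.
combined = tablib.Dataset()

for path in sorted(Path("data").glob("*.json")):
    # Load one JSON file into its own Dataset; headers come from the object keys.
    part = tablib.Dataset().load(path.read_text(), format="json")
    if not combined.headers:
        combined.headers = part.headers
    # Stack its rows onto the combined Dataset.
    for row in part:
        combined.append(row)

# Export everything as a single CSV file.
Path("combined.csv").write_text(combined.export("csv"))
```

For a job like this, the few lines above are the whole pipeline: no DataFrame schemas, no cluster, just load, stack, and export.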