PySpark Explained: Delta Tables | by Thomas Reid | Aug, 2024

Learn how to use the building blocks of Delta Lakes.

Delta tables are the key components of a Delta Lake, an open-source storage layer that brings ACID (Atomicity, Consistency, Isolation, Durability) transactions to big data workloads.

The concept and implementation of Delta tables (and, by association, Delta Lakes) came from the team at Databricks, the company founded by the original creators of Spark.

Databricks is now a cloud-based platform for data engineering, machine learning, and analytics built around Apache Spark, providing a unified environment for working with big data workloads. Delta tables are a key component of that environment.

Coming from an AWS background, I find Delta tables somewhat reminiscent of AWS's Athena service, which lets you run SQL SELECT queries on data held in S3, AWS's mass storage service.

There is one key difference, though. Athena is designed to be a query-only tool, whereas Delta tables let you UPDATE, DELETE and INSERT data records as well as query them. In this respect, Delta tables behave more like Apache Iceberg formatted tables, but the advantage they have over Iceberg tables is that they are more tightly integrated with the Spark ecosystem.