Seamless Parsing of Nested JSON and Schema Evolution in Delta Live Tables Without Restarting Pipelines | by Irfan Elahi | Oct, 2024

Based on a customer case study, an advanced tutorial on using Delta Live Tables to process JSON schema evolution without requiring a pipeline restart

Generated via DALL-E

Disclaimer: I'm a solutions architect at Databricks. The views and opinions expressed in this article are my own and don't necessarily reflect those of Databricks.

Schema evolution is a common phenomenon in the world of data engineering. When extracting data from sources and loading it into a destination, changes in the source schema are inevitable. This challenge is amplified when dealing with source systems that include JSON payloads, such as JSON-type columns in PostgreSQL. The likelihood of schema changes within these JSON payloads is high: new fields can be added at any time, often deeply nested at various levels. These frequent changes significantly increase the complexity of building robust data pipelines that parse such schema changes and evolve the schema seamlessly.
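To make the problem concrete, here is a minimal sketch in plain Python (not Delta Live Tables code, and not the approach this tutorial builds) showing how a newer payload can introduce fields at several nesting levels. The helper `field_paths` and the sample payloads are hypothetical and exist only for illustration.

```python
import json


def field_paths(value, prefix=""):
    """Return the set of dotted paths to every leaf field in a parsed JSON value."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths |= field_paths(child, child_prefix)
    elif isinstance(value, list):
        # Array elements share one path, marked with "[]".
        for item in value:
            paths |= field_paths(item, f"{prefix}[]")
    else:
        # Scalars (string, number, boolean, null) are leaves.
        paths.add(prefix)
    return paths


# Hypothetical payloads: the second one adds a nested struct and a new array.
old_payload = json.loads(
    '{"id": 1, "customer": {"name": "Ada", "address": {"city": "Perth"}}}'
)
new_payload = json.loads(
    '{"id": 2, "customer": {"name": "Bob", "address": '
    '{"city": "Sydney", "geo": {"lat": -33.87}}}, "tags": ["vip"]}'
)

# Fields present in the new payload but absent from the old one.
added_fields = sorted(field_paths(new_payload) - field_paths(old_payload))
print(added_fields)  # ['customer.address.geo.lat', 'tags[]']
```

A pipeline that parses such a column with a fixed schema would silently drop or fail on `customer.address.geo.lat` and `tags`, which is the kind of drift a schema-evolving pipeline has to absorb. Note that this sketch ignores empty objects and empty arrays, since they contain no leaf fields.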

The Databricks Intelligence Platform, powered by the Delta Lake format, offers robust support for schema evolution, ensuring flexibility and resilience when dealing with changes in data structure. Delta Lake can…