4 Things I Learned Building a Data Platform Using Medallion Architecture in the Last 4 Years | by Danilo Pinto | Jan, 2025

Lessons learned from a data platform in a production environment

Picture by Unsplash+ Group on Unsplash

This month, I celebrated 4 years of working on a big data platform that uses the medallion architecture for data organization. All my previous experience was with different data organization approaches. Therefore, to mark this milestone, I decided to share some lessons I’ve learned along the way, which I believe may be helpful to others working with the same approach.

First… some context: What’s the medallion architecture?

In short, it’s a design pattern in the data domain intended for logical data organization within a lakehouse. The goal is to progressively improve the data structure and quality through the layers defined in the design pattern (bronze, silver, gold):

  • Bronze layer: Raw data. This is the landing zone where the ingested data arrives. There are no data transformations at this stage.
  • Silver layer: Here, you can apply simple cleaning to your raw data and store it in this layer. A defined schema and data types are expected.
  • Gold layer: This layer represents your consumption layer. It’s the place for complex aggregations, joins, and business logic. The platform’s users consider it the place to go when running their data queries.

The following diagram illustrates this concept and is most likely self-explanatory:

Image by Author

If you want to learn more about this topic, I can refer you to these links:

Now, let’s talk about the lessons learned!

All right, conceptually speaking, everything is beautiful, but when the daily routine arrives, we often need to be flexible and work out the best solution for the specific situation in our business.

So, these are the 4 takeaways that I want to share based on my last 4 years of experience on this topic:

#1 Don’t be orthodox applying the medallion architecture
As you can see in the references, the official documentation outlines a few key steps for applying the medallion architecture. For example: no schema is required in the bronze layer, and only minimal data cleaning is indicated in the silver layer. However, depending on your demands, you should feel confident making some adjustments based on your project’s business reality.

In my experience, there were several situations where we had to adapt the guidelines to achieve the best results. I can share the following:

  • We have data schemas for all the layers, including the bronze layer, in the data platform. As we interact with different data sources (such as EventHubs, CSV files, Oracle connections, and others), schema enforcement has been adopted to make sure that any unexpected changes in the data sources and their data contracts are detected and addressed promptly, preventing disruptions to downstream processes.
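
To make the idea of schema enforcement at the bronze layer more concrete, here is a minimal, library-agnostic sketch in plain Python. It is not the code from our platform: the column names, the `EXPECTED_SCHEMA` contract, and the `enforce_schema` helper are all hypothetical, and a real pipeline would more likely rely on the schema features of its processing engine.

```python
# Hypothetical data contract for one bronze source:
# column name -> expected Python type.
EXPECTED_SCHEMA = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
    "created_at": str,
}


class SchemaMismatchError(Exception):
    """Raised when an incoming record does not match the data contract."""


def enforce_schema(record: dict, expected: dict) -> dict:
    """Return the record unchanged if it matches the contract, else raise."""
    missing = sorted(set(expected) - set(record))
    unexpected = sorted(set(record) - set(expected))
    if missing or unexpected:
        raise SchemaMismatchError(
            f"missing columns: {missing}, unexpected columns: {unexpected}"
        )
    wrong_types = sorted(
        name
        for name, expected_type in expected.items()
        if not isinstance(record[name], expected_type)
    )
    if wrong_types:
        raise SchemaMismatchError(f"wrong types for columns: {wrong_types}")
    return record
```

Failing loudly at ingestion, instead of silently accepting a new or retyped column, is what lets a contract change in a source be noticed before it reaches silver or gold.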

#2 New layers, why not?
This one is somewhat connected to the item above, but it deserves its own section since it may be disruptive for some people. The point here is that, in some cases, it’s better to define your own specific data layer than to try to figure out which by-the-book layer would be the ideal fit.

I’m quite confident that if you have worked with the medallion architecture, you have already asked yourself about the layers’ purpose and where you should place certain data after a couple of transformations (leave it in silver? Move it to gold?).

Over the past years, we have arrived at a point where:

  • Our project’s data organization includes layers such as ‘Reference’, ‘Sandbox’, and ‘Checkpoints’, among others. These layers were introduced to address the need for a precise data location in some scenarios. For example, the ‘Reference’ layer was created to store lookup data. This clear separation ensures that everyone on the team knows exactly where to find and add lookup data, eliminating any confusion between the silver and gold layers.
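
One simple way to keep custom layers unambiguous is to declare them in a single place in code. The sketch below is only an illustration: the `Layer` enum, the `LAKE_ROOT` value, and the `table_path` helper are hypothetical names, not something taken from our platform.

```python
from enum import Enum


class Layer(Enum):
    """Standard medallion layers plus the project-specific ones."""
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    REFERENCE = "reference"      # lookup data
    SANDBOX = "sandbox"
    CHECKPOINTS = "checkpoints"


# Hypothetical storage root; replace with your own lake location.
LAKE_ROOT = "/lake"


def table_path(layer: Layer, dataset: str) -> str:
    """Build the storage path of a dataset inside a given layer."""
    return f"{LAKE_ROOT}/{layer.value}/{dataset}"
```

With this in place, a lookup table always resolves through `table_path(Layer.REFERENCE, ...)`, so nobody has to decide between silver and gold for it.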

#3 Mappings in the catalog tool
You’re likely using a data catalog tool. In my current project, we use Unity Catalog for data governance, which requires us to map our external storage locations (aligned with the medallion architecture) to the catalog. This mapping requires careful attention.

Since there are no strict constraints between schema names and storage roots, it’s essential to avoid mismatches. For example, mapping a bronze table to a silver schema, or any other confusing configuration, can lead to misinterpretations and errors.

  • In our specific case, we have different permission sets for Personally Identifiable Information (PII) and non-PII data. To handle this within the catalog, we mapped both PII and non-PII data to the same silver layer but differentiated them by placing them in two separate schemas. This approach allows us to maintain the logical grouping of the silver layer while enforcing granular access control based on data sensitivity.

Image by Author
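
Because the catalog itself will not stop a mismatched mapping, a small consistency check over an inventory of tables can help. The following is a minimal sketch in plain Python that does not call any Unity Catalog API. The schema names, storage roots, and the `find_mismatches` helper are all assumptions made for illustration; in practice you would feed it a list of (schema, table, storage location) entries exported from your catalog.

```python
# Hypothetical storage root for each medallion layer.
LAYER_ROOTS = {
    "bronze": "/lake/bronze/",
    "silver": "/lake/silver/",
    "gold": "/lake/gold/",
}

# Hypothetical catalog schemas and the layer each one belongs to.
# PII and non-PII data live in separate schemas but share the silver layer.
SCHEMA_TO_LAYER = {
    "bronze": "bronze",
    "silver_pii": "silver",
    "silver_non_pii": "silver",
    "gold": "gold",
}


def find_mismatches(tables: list[tuple[str, str, str]]) -> list[str]:
    """Check (schema, table, storage_location) entries against the layer roots.

    Returns one human-readable message per problem found.
    """
    problems = []
    for schema, table, location in tables:
        layer = SCHEMA_TO_LAYER.get(schema)
        if layer is None:
            problems.append(f"{schema}.{table}: schema is not mapped to a layer")
        elif not location.startswith(LAYER_ROOTS[layer]):
            problems.append(
                f"{schema}.{table}: expected location under "
                f"{LAYER_ROOTS[layer]}, got {location}"
            )
    return problems
```

Running a check like this in CI or on a schedule would flag, for example, a table registered in a silver schema whose files actually sit under the bronze root.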

#4 Flexible but not a mess
Although I’ve just said that you can feel confident making transformations in different layers or even adding new ones, we should be careful not to completely reconfigure the design pattern, as this could make maintenance significantly more difficult.

The key is that the underlying medallion architecture should remain recognizable even with the modifications. For example, a new team member should easily understand the data flow and recognize the established design pattern. This consistency is crucial for long-term maintainability.

Additionally, be careful when mixing different data concepts such as data mesh, data vault, data warehouse, and others as you set up your data organization. Consider how these concepts integrate with your medallion architecture rather than being implemented inside it.

Final Thoughts

I hope this is useful to you in some way. If you’ve made it this far, I genuinely appreciate your attention and encourage you to share your thoughts in the comments section. 🙂

See you!