OLAP Is Dead, or Is It? | by Marc Polizzi | Oct, 2024

OLAP’s fate in the age of modern analytics

In 1993, E.F. Codd & Associates introduced the term OLAP (Online Analytical Processing) to describe techniques used for answering multidimensional analytical queries from various perspectives. OLAP primarily involves three key operations:

  • Roll-up: Summarizing data at higher levels of aggregation,
  • Drill-down: Navigating to more detailed levels of data,
  • Slice and dice: Selecting and analyzing data from different viewpoints.
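To make these three operations concrete, here is a minimal Python sketch. The tiny SALES table, its column names, and the two helper functions are made up for illustration; a real OLAP engine exposes these operations through its own model and query language rather than through code like this.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, amount).
SALES = [
    (2023, "Q1", "EU", "bikes", 100),
    (2023, "Q2", "EU", "bikes", 150),
    (2023, "Q1", "US", "bikes", 200),
    (2024, "Q1", "EU", "helmets", 50),
    (2024, "Q1", "US", "bikes", 300),
]
COLUMNS = ("year", "quarter", "region", "product", "amount")


def aggregate(rows, by):
    """Sum `amount` grouped by the given dimension names."""
    idx = [COLUMNS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only rows matching the fixed dimension values (e.g. region="EU")."""
    return [
        row for row in rows
        if all(row[COLUMNS.index(k)] == v for k, v in fixed.items())
    ]


# Roll-up: summarize at the year level.
by_year = aggregate(SALES, by=("year",))
# Drill-down: go from year to year + quarter.
by_quarter = aggregate(SALES, by=("year", "quarter"))
# Slice: fix one dimension (region = EU), then analyze by product.
eu_by_product = aggregate(slice_rows(SALES, region="EU"), by=("product",))
# Dice: fix several dimensions at once.
us_bikes_by_year = aggregate(
    slice_rows(SALES, region="US", product="bikes"), by=("year",)
)
```

Roll-up and drill-down differ only in how many levels appear in `by`, while slice and dice differ only in how many dimensions are fixed before aggregating.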

Browsing the web these days, it feels like every data analytics challenge is somehow tied to trendy self-service BI, focused on crunching Big Data with AI on steroids. Platforms like LinkedIn and Reddit are flooded with endless discussions about the disadvantages of outdated OLAP compared to the latest trends in data analytics for all. So yes, we can confidently declare: OLAP is dead. But wait… is it really?

RIP OLAP (Image by the author, AI generated)

Who Am I and Why This Post?

Before we dive into that disputed subject, let me introduce myself and explain why I’m bothering you with this post. I work at icCube, where, among other things, I solve the technical challenges of our customers. Occasionally, the sales team asks me to join demos for potential clients, and almost without fail, the central concern of data scalability comes up: how to handle the (soon-to-be) Big Data of that customer. Being a technical and pragmatic person, my naive, non-sales response would be:

Could we first please define the actual problems to see if we really need to talk about Big Data?

Ouch 😉 Told you, I’m a techie at heart. So, in this post, I’d like to clarify what OLAP means in 2024 and the kinds of challenges it can solve. I’ll draw from my experience at icCube, so I might be a bit biased, but I’ll do my best to remain objective. Feel free to share your thoughts in the comments.

OLAP is often, if not always, used interchangeably with OLAP Cube, i.e., a materialized structure of pre-aggregated values in a multidimensional space. With this wrong definition, it’s easy to see why people might say OLAP is outdated, as advances in technology have reduced the need for pre-aggregation.

However, OLAP is not synonymous with OLAP Cubes. If there’s one thing I’d highlight from the various definitions and discussions about OLAP, it’s that OLAP embodies a set of concepts and methods for efficiently analyzing multidimensional data.

Chris Webb captured this well in a post, reflecting back on the old days:

By “OLAP” I mean the idea of a centralised model containing not just all your data but also things like how your tables should be joined, how measures aggregate up, advanced calculations and KPIs and so on.

In his post, “Is OLAP Dead”, Chris Webb also referred to the FASMI Test as a way to qualify an OLAP system in just five keywords: “Fast Analysis of Shared Multidimensional Information”.

FAST             : means that the system is targeted to deliver most
                   responses to users within about five seconds, with the
                   simplest analyses taking no more than one second and
                   very few taking more than 20 seconds.

ANALYSIS         : means that the system can cope with any business logic
                   and statistical analysis that is relevant for the
                   application and the user, and keep it easy enough for
                   the target user.

SHARED           : means that the system implements all the security
                   requirements for confidentiality (possibly down to cell
                   level).

MULTIDIMENSIONAL : is our key requirement. If we had to pick a one-word
                   definition of OLAP, this is it. The system must provide
                   a multidimensional conceptual view of the data,
                   including full support for hierarchies and multiple
                   hierarchies, as this is certainly the most logical way
                   to analyze businesses and organizations.

INFORMATION      : is all of the data and derived information needed,
                   wherever it is and however much is relevant for the
                   application.

I found it amusing to realize that this definition was introduced back in 2005, in a post subtitled:

An analysis of what the often misused OLAP term is supposed to mean.

So, it’s quite clear that this confusion is not something new, and our marketing and sales colleagues have contributed to it. Note that this definition doesn’t specify how an OLAP system should be implemented. An OLAP cube is just one possible technology for implementing an OLAP solution.

Based on my experience in the data field, MULTIDIMENSIONAL and SHARED are the key requirements. I’d replace SHARED with SECURED and make “down to cell level” mandatory rather than optional: a complex multidimensional data model with security constraints inevitably ends up with a complex security profile. Note that the FASMI Test doesn’t mandate anything regarding the absolute size of the data being analyzed.

Before diving into the five keywords and showing how they apply to modern tools, let’s first challenge a few widely held beliefs.

Inevitably, the Big Data argument is used to assert that OLAP is dead.

I couldn’t agree less with that assertion. However, let’s see what Jordan Tigani is saying in the introduction of his “BIG DATA IS DEAD” post from early 2023:

Of course, after the Big Data task force purchased all new tooling and migrated from Legacy systems, people found that they still were having trouble making sense of their data. They also may have noticed, if they were really paying attention, that data size wasn’t really the problem at all.

It’s a very engaging and informative post, beyond the marketing hype. I feel there’s no need for me to reiterate here what I’m experiencing on a much smaller scale in my job. His conclusion:

Big Data is real, but most people may not need to worry about it. Some questions that you can ask to figure out if you’re a “Big Data One-Percenter”:

– Are you really generating a huge amount of data?

– If so, do you really need to use a huge amount of data at once?

– If so, is the data really too big to fit on one machine?

– If so, are you sure you’re not just a data hoarder?

– If so, are you sure you wouldn’t be better off summarizing?

If you answer no to any of these questions, you might be a good candidate for a new generation of data tools that help you handle data at the size you actually have, not the size that people try to scare you into thinking that you might have someday.

I have nothing to add at this point. Later in this post, we’ll explore how modern OLAP tools can help you manage data at the scale you’re working with.

Inevitably, Self-Service BI is another argument used to assert that OLAP is dead.

Business users are empowered to access and work with raw corporate data independently, without needing support from data professionals. This approach allows users to perform their own analyses, generate reports, and create dashboards using user-friendly tools and interfaces.

If we acknowledge that the required analytics are simple enough for any businessperson to handle, or that the tools are advanced enough to manage more complex analytics and security profiles, then the underlying assumption is that the data is already clean and ready for making business decisions.

At icCube, during the enablement phase of customer projects, 80% of the time is spent cleaning and understanding the actual data and the business model behind it. Notably, a significant portion of this time is also spent communicating with the few individuals who possess knowledge of both the technical and business worlds. This isn’t surprising, as the data model typically evolves over many years, becomes increasingly complex, and people come and go.

But let’s assume the raw data is clean and the business users understand it perfectly. Then what happens when hundreds (or even thousands) of reports are created, possibly accessing the OLTP databases directly (as there is no IT involvement in creating an analytical data repository)? Are they consistent with each other? Are they following the same business rules? Are they computing things right? Are they causing performance issues?

And assuming all is fine, then how do you maintain these reports? And more importantly, how do you manage any required change in the underlying raw data, as there is no easy way to know what data is used where?

So, similarly to the Big Data argument, I don’t believe that Self-Service BI is the right solution for every modern analytical challenge. In fact, it can create more problems in the long run.

Finally, the AI argument. You no longer need your OLAP engine, and by the way, you no longer need any analytical tool. AI is here to rule them all! I’m exaggerating a bit, but I’m not far off when considering all the hype around AI 😉

More seriously, at icCube, even if we’re currently skeptical about using AI to generate MDX code or to analyze data, it certainly doesn’t mean we’re against AI. Quite the contrary, in fact. We’ve recently introduced a chatbot widget to help end users understand their data. We’re actively investigating how to use AI to improve the productivity of our customers. The actual issues we’re facing with it are mainly:

  • It’s not accurate enough to give to end users who cannot distinguish the hallucinations.
  • It’s overkill to give to end users who are experts in the domain and can understand and fix the hallucinations.
  • The cost of each query (that is, the LLM inference cost).

But don’t just take my word for it: I’d like to highlight the practical and relevant approach shared by Marco Russo. You can check out his YouTube video here. For those short on time, skip ahead to the 32-minute mark where Marco shares his feelings about ChatGPT being used to generate DAX code.

Right now, generative AI is not ready to replace any OLAP system and certainly cannot be used as an argument to say OLAP is dead.

Now, let’s return to the FASMI Test and take a look at the five keywords that define an OLAP system.

FAST: means that the system is targeted to deliver most responses to users
within about five seconds, with the simplest analyses taking no more than
one second and very few taking more than 20 seconds.

Delivering fast response times to analytical queries is no longer exclusive to OLAP systems. However, it remains an added benefit of OLAP systems, which are specifically tailored for such queries. One significant advantage is that an OLAP system helps avoid overloading OLTP databases (or any actual sources of data) because:

  • A dedicated data warehouse may have been created.
  • It may act as a cache in front of the actual data sources.

An additional benefit of this intermediate layer is that it can help reduce the costs associated with accessing the underlying raw data.
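The caching idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `query_source` stands in for an expensive query against an OLTP database, the totals are invented, and a real OLAP server manages its cache (and its invalidation) in far more sophisticated ways.

```python
from functools import lru_cache

source_hits = 0  # counts how often the (expensive) source is actually queried


def query_source(region: str) -> int:
    """Stand-in for an expensive query against an OLTP database."""
    global source_hits
    source_hits += 1
    data = {"EU": 300, "US": 500}  # made-up totals
    return data.get(region, 0)


@lru_cache(maxsize=128)
def cached_total(region: str) -> int:
    """Analytical layer: answers repeated queries without touching the source."""
    return query_source(region)


first = cached_total("EU")   # goes to the source
second = cached_total("EU")  # served from the cache
third = cached_total("US")   # goes to the source
```

Three analytical queries end up as only two hits on the underlying source, which is the effect described above, both for load and for cost.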

ANALYSIS: means that the system can cope with any business logic and statistical
analysis that is relevant for the application and the user, and keep it
easy enough for the target user.

OLAP systems are designed to perform complex analytical queries and, as such, offer a range of features that are often not available out of the box in other systems. Some of these features include:

  • Slice-and-dice capabilities: allow users to explore data from different perspectives and dimensions.
  • Natural navigation: supports intuitive navigation through parent/child hierarchies in the multidimensional model.
  • Aggregation measures: support various aggregations such as sum, min, max, opening, closing values, and more.
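As an illustration of why aggregation types beyond a plain sum matter, here is a small Python sketch. The month-end stock figures and the `aggregate_quarter` helper are hypothetical; in an OLAP model the aggregation type would be declared on the measure rather than coded by hand.

```python
# Hypothetical month-end stock levels for one product.
STOCK = [("2024-01", 120), ("2024-02", 90), ("2024-03", 140)]

AGGREGATIONS = {
    "sum": lambda values: sum(values),
    "min": lambda values: min(values),
    "max": lambda values: max(values),
    "opening": lambda values: values[0],   # first value of the period
    "closing": lambda values: values[-1],  # last value of the period
}


def aggregate_quarter(rows, how):
    """Aggregate a series, ordered by month, with the named aggregation."""
    values = [value for _, value in sorted(rows)]
    return AGGREGATIONS[how](values)


q1 = {how: aggregate_quarter(STOCK, how) for how in AGGREGATIONS}
```

For a stock level, summing the three months (350) is meaningless as a quarterly figure, whereas the closing value (140) is what a business user usually expects when rolling up over time.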

To support all these capabilities, a specialized query language is needed. MDX (Multi-Dimensional Expressions) is the de facto standard for multidimensional analysis.

Some advanced and possibly non-standard features that we frequently use with our customers are:

  • Time period comparisons: facilitate time-based analyses like year-over-year comparisons.
  • Calculated measures: enable the creation of ad-hoc calculations at design time or runtime.
  • Calculated members: similar to calculated measures but can be applied to any dimension. For example, they can be used to create helper dimensions with members performing statistics based on the current evaluation context.
  • Advanced mathematical operations: provide vectors and other structures for performing complex mathematical calculations elegantly (statistics, regressions…).
  • MDX extensions: functions, Java code embedding, result post-processing, and more.

SHARED: means that the system implements all the security requirements for
confidentiality (possibly down to cell level).

Based on my experience, I believe this is the second most important requirement after the multidimensional model. In every customer model where security is required, defining proper authorization becomes a significant challenge.

I’d suggest improving the FASMI Test by making cell-level granularity mandatory.

Both Microsoft Analysis Services and icCube, and potentially other platforms, allow security to be defined directly within the multidimensional model using the MDX language (introduced above). This approach is quite natural and often aligns with corporate hierarchical security structures.

Defining security at the multidimensional model level is particularly important when the model is built from multiple data sources. For instance, applying corporate security to data from sources like IoT sensors could be very complex without this capability.

Since the FASMI Test was introduced, embedding analytics directly into applications has become a critical requirement. Many OLAP systems, including Microsoft Analysis Services and icCube, now support the dynamic creation of security profiles at runtime, once users are authenticated, based on various user attributes. Once this security template is defined, it will be applied on-the-fly each time a user logs into the system.
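The idea of a security template applied at login can be sketched as follows. This is only a conceptual Python illustration: the `User` attributes, the `CELLS` data, and the `build_profile` function are invented here and do not reflect how Microsoft Analysis Services or icCube actually express security (they do so within the multidimensional model, using MDX).

```python
from dataclasses import dataclass

# Hypothetical cells of a tiny cube: (region, measure) -> value.
CELLS = {
    ("EU", "revenue"): 300,
    ("EU", "margin"): 45,
    ("US", "revenue"): 500,
    ("US", "margin"): 80,
}


@dataclass(frozen=True)
class User:
    name: str
    region: str        # attribute coming from the authentication system
    sees_margin: bool  # attribute coming from the authentication system


def build_profile(user):
    """Turn user attributes into a cell-level predicate (the 'template')."""
    def allowed(region, measure):
        if region != user.region:
            return False
        return measure != "margin" or user.sees_margin
    return allowed


def visible_cells(user):
    """Apply the profile built at login to every cell of the cube."""
    allowed = build_profile(user)
    return {cell: value for cell, value in CELLS.items() if allowed(*cell)}


alice = visible_cells(User("alice", region="EU", sees_margin=False))
bob = visible_cells(User("bob", region="US", sees_margin=True))
```

The template is written once; each authenticated user gets a different view of the same model depending on their attributes, down to individual cells.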

MULTIDIMENSIONAL: is our key requirement. If we had to pick a one-word definition of OLAP,
this is it. The system must provide a multidimensional conceptual view of
the data, including full support for hierarchies and multiple
hierarchies, as this is certainly the most logical way to analyze
businesses and organizations.

I completely agree. A multidimensional model is essential for data analytics because it provides a structured approach to analyzing complex data from multiple perspectives (data doesn’t exist in isolation) and often aligns with corporate hierarchical security frameworks.

Intuitive for Business Users

This model mirrors how businesses naturally think about their data, whether it’s products, customers, or time periods. It’s far more intuitive for non-technical users, allowing them to explore data without needing to understand complex SQL queries. Key features like parent-child hierarchies and many-to-many relationships are seamlessly integrated.

Enhanced Data Aggregation and Summarization

The model is built to handle aggregations (like sum, average, count) across dimensions, which is crucial for summarizing data at various levels. It’s ideal for creating dashboards that present a high-level overview, with the ability to drill down into more detailed insights as needed.

Facilitates Time Series Analysis

Time is a critical dimension in many types of data analysis, such as tracking trends, forecasting, and measuring performance over periods. A multidimensional model easily integrates time as a dimension, enabling temporal analysis, such as year-over-year or month-over-month comparisons.
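A year-over-year comparison is a good example of a calculation that becomes trivial once time is a proper dimension. The Python sketch below uses made-up revenue figures and a hypothetical `year_over_year` helper; in an MDX-based tool this would typically be a calculated measure navigating the time hierarchy.

```python
# Hypothetical yearly revenue per region.
REVENUE = {
    ("EU", 2022): 200,
    ("EU", 2023): 250,
    ("EU", 2024): 300,
    ("US", 2023): 400,
    ("US", 2024): 380,
}


def year_over_year(region, year):
    """Relative growth versus the previous year, or None without a baseline."""
    current = REVENUE.get((region, year))
    previous = REVENUE.get((region, year - 1))
    if current is None or not previous:
        return None
    return (current - previous) / previous


eu_2024 = year_over_year("EU", 2024)  # (300 - 250) / 250
us_2024 = year_over_year("US", 2024)  # (380 - 400) / 400
us_2023 = year_over_year("US", 2023)  # no 2022 value to compare with
```

The same function works for any region because "previous year" is defined on the time dimension itself, not hard-coded per report.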

Data Complexity in the Real World

Despite the rise of no-code data tools, real-world data projects are rarely simple. Data sources tend to be messy, evolving over time with inconsistencies that add complexity. Accessing raw data can be challenging with traditional SQL-based approaches. Given the shortage of skilled talent, it’s wise to first establish a clean semantic layer, ensuring data is used correctly and that future data-driven decisions are well-informed.

Trust and Reliability in Analytics

One major advantage of a well-defined multidimensional model (or semantic layer) is the trust it fosters in the analytics provided to customers. This robust model allows for effective testing, enabling agile responses in today’s fast-paced environment.

Perceived Inflexibility

The semantic layer in OLAP serves as a crucial step before data access, and while it may initially seem to limit flexibility, it ensures that data is modeled correctly from the start, simplifying future reporting. In many cases, this “inflexibility” is more perceived than real. Modern OLAP tools, like icCube, don’t rely on outdated, cumbersome processes for creating OLAP cubes and can even support incremental updates. For example, icCube’s category feature allows even new dimensions to be created at runtime.

In summary, OLAP and dimensional models continue to offer critical advantages in handling complex business logic and security, despite the perceived inflexibility when compared to direct raw data access.

INFORMATION: is all of the data and derived information needed, wherever it is and
however much is relevant for the application.

Pulling data from various sources (whether SQL, NoSQL, IoT, files, or SaaS platforms) is no longer something exclusive to OLAP systems. However, OLAP systems still offer a key advantage: they’re designed specifically to create a secure multidimensional model that serves as the de facto semantic layer for your analytical needs.

The original definition of the FASMI Test aimed to offer a clear and memorable description of an Online Analytical Processing (OLAP) system: Fast Analysis of Shared Multidimensional Information. I believe this definition remains relevant and is more important than ever. In 2024, people should no longer confuse OLAP with one of its past implementations: the outdated OLAP Cube.

As a practical person, I won’t suggest specific tools without understanding your current data analytics challenges. I recommend carefully identifying your current needs and then looking for the right tool. Most importantly, if you’re satisfied with your current analytical platform, don’t change it just for the sake of using the latest trendy tool.

However, if you’re:

  • struggling to query complex multidimensional business models,
  • struggling to apply complex security that must align with corporate hierarchical security models,
  • struggling to write complex calculations for advanced analytics,
  • struggling to manage hundreds or thousands of very disparate queries and dashboards,
  • struggling to open dashboards in under a second,
  • struggling to source and merge data from disparate systems,
  • struggling to trust your analytics insights,

then it’s worth considering modern OLAP systems. Rest assured, they are not obsolete and are here for a while. Modern OLAP tools are actively developed and stay relevant in 2024. Moreover, they benefit from the latest advances in:

  • big-data technologies,
  • self-service features,
  • generative AI,

to implement new features or complete existing ones to improve the productivity of the end users. But this is a topic for a future post. So stay tuned!

The reader can find the available OLAP servers on this Wikipedia page.