Ranet OLAP

1. Detecting the flaws

All large organizations managing big amounts of storage data have to regularly optimize databases performance. In this article we’re going to review several methods to enhance OLAP performance particularly. Those methods are aimed to improve the time of the cube build or the query time. As a result, this can drastically boost workload as the operating system will run more precise and rapid. Despite OLAP data cubes are typically noted for being very fast, a user has to ensure that the data is updated and processed under the service level agreement given set quality, availability, and responsibilities. Fist of all a user has to check that the OLAP cube is generally tuned in a proper way:

  • Make sure that the hardware and all corresponding patches are applied.
  • When the storage data is updated and configured appropriately a user has to define the workload required which involves the choice between parallel and serial processing. Parallel processing is considered to be very suitable to enhance performance but this mode will require partitioning which in its turn is unnecessary for certain users.
  • Dimensions and the logic of the system have to be examined, the cubes and the storage model have to be analyzed. It’s needed to check aggregations, data quality, and density.
  • If the whole multidimensional OLAP cube is up to date one has to check the queries scheme and analyze the database.

Although in general there are three major points which can’t be neglected:

  • Appropriate use of hardware resources.
  • Query Performance.
  • Processing Performance

Before optimizing a certain part of the system it’s necessary to detect the one which takes more time than required. The first point to pay attention to is the storage mode related to the hardware resources - the server respectively.

2. Hardware resources optimization

OLAP Hardware resources optimization

While talking about big data storage it’s a must to remember about setting the amount of memory available, choosing the server disk space, enhancing data input and output, etc. Nevertheless, when it has to do with the OLAP client there are several specific ways to optimize the system.

Type of data structure

OLAP storage mode defines the way data is stored and organized in the database thereby it has a huge clout on the requirements of the storage space and the process of data extraction. There are three main storage modes to choose from according to the needs and demands: MOLAP - multidimensional OLAP client, ROLAP - relational OLAP client, and HOLAP - the hybrid of MOLAP and ROLAP. In a MOLAP cube storage fact data and aggregations are maintained on the server in a compact and indexed multidimensional format. In a ROLAP cube storage relational tables are composed to maintain the aggregations in a data warehouse and when there are user’s queries to reply, it calls the warehouse database on the server to retrieve the requisite data. As a hybrid, HOLAP cube maintains aggregations in a multidimensional format on the OLAP server as well as MOLAP, but it calls the warehouse database when it has to respond the MDX queries with the help of the information from the fact table. MOLAP client is considered to be the fastest while responding the queries and building the cubes although there are operations where MOLAP doesn’t suit. These operations are what-if scenarios and real-time OLAP client. In order to carry out what-if analysis or do forecasting, a user has to change the information in the cube under the write-back capability. Due to a cube partition, it will be possible to store all the write-back values in a relational database on the server and thereby they will impact only the write-back data without compromising other cube partitions. That is why in this case HOLAP or ROLAP would be more convenient. The ROLAP client would also be appropriate for a real-time OLAP operation so that the substantive storage data could be modified simultaneously with an MDX query run by a user. This way a user can steadily see fresh data changes.

Standard scheme for the cube

The architecture which underpins all data in the multidimensional cube storage on the server plays the main role in the process of data retrieval and further analysis. Therefore it’s essential to have the OLAP cube built according to all the requirements of other applications used with it. As the majority of tools and applications are intended to identify fact tables in the form of star or snowflake schemes it’s a must to build an OLAP cube based on this type of table in the center. Otherwise, the performance will drastically decline. The very provision of proper and adequate cube architecture on the server will already ensure better performance of the OLAP cube and efficient MDX query processing.

Multiple dimensions

A user should avoid building an OLAP cube with one dimension as this dimension will have to be divided into several smaller dimensions according to logical business entities. Usually, an OLAP cube involves one dimension if it was created under one denormalized table which couldn’t be broken into several dimensions. Typically the single dimension in a cube includes plenty of logical business entities. For the convenience of the user, business entities are better simulated and navigated while belonging to split dimensions.

Separate dimensions

Dimensions of the OLAP cube mustn’t be linked together particularly in case the cube contains outline calculations or scripts. Nevertheless, a user is able to exploit linked dimensions if they belong to another server or database but it can have a toll on performance and efficiency. Rather than linking dimensions, it’s better to generate a dimension copy which can be stored on the same server.

3. Querying Performance optimization

So that to provide the successful performance of the queries, it’s significant to appropriately design dimensions and measure groups, compose desired aggregations, indexes and streamline the MDX queries.

Data aggregations

While building and tuning the cube, the aggregation level is set up in compliance with the required speedup of the queries processing. An OLAP client system assesses the speedup according to the amount of data inputs and outputs which the system requests in order to reply the queries. The fact is that the total number of aggregations in the cube on the server is the sum of members number from each dimension. Therefore an average cube may contain a plenty of aggregations hence calculating all of them beforehand is not reasonable due to filling the storage space of the server and time spent to pre-calculate the aggregations. That’s why usually OLAP Services chooses key aggregations that are disseminated across the cube which already reduces the time spent on defining any aggregation to respond the query. Furthermore, the server can as well cache the aggregations to increase query performance. There’s also an opportunity to identify which exact aggregations end users need and thereafter use usage-based aggregations. In order to determine those aggregations end users’ queries have to be taken into account and analyzed.

Log query patterns

For usage-based optimizations, it is possible to log queries for the next use. By default, every tenth query logs on the server although a user is able to log every query not to lose any of them, as it’s especially important at the beginning of getting used to the OLAP cube. Additionally, the queries can be filtered by owner, date, frequency, duration, etc. After setting the criteria to choose a specified query a user can see a list of queries suiting the criteria and pick the one needed. It’s as well feasible to add or replace queries.

PivotTable Services: predicting query trips to the server

An OLAP client PivotTable Services is one more way to optimize the OLAP performance. Its main feature lies in operates with memory: the client caches all the information about members and cell values. Recording the results of the queries highly boosts query performance as PivotTable Services predicts the values which the application will inquire thereby reducing the number of trips to the server. It works the following way: when the dimension members are set up by the MDX query to occur on the cube axes, PivotTable Services client goes back to the application. The application has to require a cell values group to let the client define the values. In case there are calculated members in the query, PivotTable Services assesses the calculated members when requires the cell values. Besides, when the connection is set up the client downloads all members in the cube’s dimension levels. And if there are dimension levels which include more members than a user determines, PivotTable Services extracts larger-level members solely when such a query is run.

Cube and measure group design

It’s recommended to get rid of redundant attributes relationships to contribute to the query execution engine while creating the query plan for a certain OLAP cube. The relationships of the attributes have to be either indirect or direct but not both together.

4. Processing Performance optimization

In the process of data mining, the OLAP client updates the storage with the fresh data from the server including data warehouses and source systems and creates new aggregations. It likewise generates an attribute store for the dimension attributes and a hierarchy store for natural hierarchies. This procedure makes sure a user will draw up a report with fresh and updated data under the defined service level agreement so that it would comply with the specified quality, availability, and responsibilities.

Cube partition

OLAP Querying Performance optimization

A cube partition is a unit of stored data which can be individually regulated. In every partition created on the server, there is its aggregation level with a storage mode where the data is consistently refreshed. Due to the fact that partitions are separated a user is able to reduce the time of consistent loading as it’s possible to generate a new partition containing refreshed data. A user also can determine the cube slice in each partition which as well enhances query performance given that the slice informs a user about the information which each partition includes. Typically the OLAP client users utilize sales data from the recent year. The cube performance optimization here is related to an opportunity to separate a cube into two partitions where the first one will contain data aggregations of the recent year whereas the second one will store all the rest data aggregations. Having two partitions in the cl a user can determine to which degree they need to optimize aggregations in each partition.

Partition size

It’s not recommended to create multiple small partitions in one measure group as small partitions will have to be reconciled into several larger partitions groups to increase efficiency. Thereby one should combine partitions if there are already more than five of them in a measure group. Additionally, these five partitions are better to contain not more than two million rows.

Cache frequently used datasets

Thanks to cache datasets it is now feasible to store the results on the server when it comes to well-traversed reports. Many users face a problem of returning a huge dataset while loading. Though when calculated members are exploited, the OLAP cube exploits not the whole cache but a session cache.

Hierarchies on large dimensions

Generated hierarchies make it possible to begin with a high dimension level with a couple members and cross the dimension without necessarily loading all the members. A hierarchy operates on the principal of a many-to-one tree structure which means, for instance, that a user crosses about 60 members as opposed to 3660.

5. Inevitable trade-offs

It should be noted that optimizing OLAP performance always involves trade-offs. Here they turn up between query time and cube build time. For example, one may suppose that in order to maximally optimize query performance it’s obligatory to precalculate as many aggregations as possible, nevertheless, it usually leads to a very long cube build procedure. That is why it is worth summing up all pros and cons regarding server disk space, architecture, storage mode etc. to come up with the best solution regarding the performance optimization.

Write US