Current release: COOL_2_5_0


COOL performance optimization

This document describes the current status of COOL performance optimizations in June 2008, after the COOL 2.5.0 release.

Overview

A COOL performance page was first prepared for COOL 1.2.5 in October 2005 and then remained largely unchanged until COOL 2.4.0a in June 2008. With respect to those earlier versions, this document has been almost completely rewritten to reflect the improved understanding of and control over the performance of SQL queries used in COOL.

In particular, while users and DBAs are still advised to gather Oracle data statistics for their COOL databases, this should not be as critical as in previous years, because Oracle hints have been added to COOL queries to stabilise performance even in the absence of reliable statistics.

For more details about COOL performance optimizations, please refer to the three following presentations:


Oracle: optimizing and stabilizing execution plans

Performance optimization for COOL has consistently targeted Oracle, the most important backend for COOL deployment. All that is described in the following applies to this backend alone.

The relational implementation of COOL for the Oracle backend heavily relies on indexes on the relevant tables to make sure that queries are computed using the best "execution plan". However, the presence of indexes by itself is not sufficient to ensure that the best execution plan is used for the processing of each query. In particular, the following three issues are very important:

The combined effect of execution plan persistency and statistics or bind variable peeking leads to the following results. This is all very common in typical situations where tables are filled, queried and analysed for statistics in random order. In summary, the execution plan used for a given SQL query cannot be predicted a priori because it depends on the presence and quality of statistics and on the values of bind variables used at the time the query was first executed (hard-parsed). As a consequence of this instability of execution plans, very different performances may be observed for the same SQL query. In COOL, this issue is addressed by adding Oracle hints to the queries, to make sure that the same (good) plan is used in all cases, even with unreliable statistics or unfavourable bind variables.

Example

The following two plots are meant to illustrate the concepts described above, in the specific use case of IOV retrieval from a COOL MV standard tag. More details can be found in Savannah task #5820.

The relevant SQL query for this use case was improved in the transition between the COOL 2.3.0 and 2.3.1 releases. In addition, Oracle hints were added in release COOL 2.3.1 to stabilise the query execution plan against the effect of statistics and bind variable peeking. In both plots, the red curve on the left shows the poor performance of the COOL 2.3.0 query strategy: query time increases significantly if data valid at higher timestamps is queried. The other curves in each plot are much more interesting, as they correspond to different execution plans for the same SQL query strategy, the one adoped in COOL 2.3.1.

In the plot on the left, six curves are drawn for COOL 2.3.1: these correspond to two values (low or high) of the timestamp bind variable and to three cases for statistics (missing, unreliable, reliable). Out of the six curves, only three show good performance (flat query time), while the other three cases (the two cases with unreliable statistics computed on an empty table, and the case with low bind variable and reliable statistics) show bad performance. All six curves use the COOL 2.3.1 query strategy without Oracle hints.

In the plot on the right, the same six curves are drawn again, after the addition of the appropriate Oracle hints: all six curves now exhibit good performance (flat query time). For reference, one of the curves without hint from the plot on the left is also shown. Colors are different in the two plots.

    

COOL tools to gather Oracle statistics

For convenience, a couple of useful scripts have been included in the COOL CVS repository. Type
coolGatherTableStats.py "<oracle URL>"
to gather statistics on all tables and indices for a given COOL database, or type
coolGatherSchemaStats.py "<oracle URL>"
to gather statistics on all tables and indices defined within the Oracle schema where the given COOL database is created. Both scripts invalidate the execution plans for all queries defined on those tables, forcing them to be hard-parsed again. These scripts are used internally for all performance tests, but in principle they should not be needed any longer by end users.

MySQL and SQLite

In order to simplify code development, the queries optimized for Oracle have been written using an SQL syntax that makes them functionally usable unchanged also on MySQL and SQLite. By default, the same code is used to process all backends.

Performance for the most important use cases has also been studied and tested for MySQL and SQLite. In some cases, simple tests have shown that the queries optimized for Oracle result in very poor performance for the other backends. One such example is the use of complex subqueries in MySQL: in this case, a different SQL strategy is used for this backend alone.


For any questions please contact Andrea Valassi