Current release: COOL_2_5_0
COOL performance optimization
This document describes the current status
of COOL performance optimizations
in June 2008, after the COOL 2.5.0 release.
A COOL performance page was first prepared for COOL 1.2.5 in October 2005
and then remained largely unchanged until COOL 2.4.0a in June 2008.
With respect to those earlier versions,
this document has been almost completely rewritten
to reflect the improved understanding of and control over
the performance of SQL queries used in COOL.
In particular, while users and DBAs are still advised
to gather Oracle data statistics for their COOL databases,
this should not be as critical as in previous years,
because Oracle hints have been added to COOL queries
to stabilise performance even in the absence of reliable statistics.
For more details about COOL performance optimizations,
please refer to the three following presentations:
Oracle: optimizing and stabilizing execution plans
Performance optimization for COOL has consistently targeted Oracle,
the most important backend for COOL deployment.
All that is described in the following applies to this backend alone.
The relational implementation of COOL for the Oracle backend
heavily relies on indexes on the relevant tables to make sure
that queries are computed using the best "execution plan".
However, the presence of indexes by itself is not sufficient to ensure
that the best execution plan is used for the processing of each query.
In particular, the following three issues are very important:
The combined effect of execution plan persistency and statistics
or bind variable peeking leads to the following results.
This is all very common in typical situations where tables
are filled, queried and analysed for statistics in random order.
- Effect of statistics.
When the Oracle query optimizer chooses the execution plan for a query,
it takes into account the distribution of data in the relevant tables,
as described in the histograms created internally
when "statistics are gathered" on the tables.
If statistics are missing or outdated, the optimizer may choose
an execution plan that is not optimal for the current table contents
(e.g. a full table scan instead of an index range scan).
- Bind variable peeking.
If an SQL query contains bind variables,
the optimal execution plan may depend on the values of those variables.
When the Oracle query optimizer chooses the execution plan for such a query,
it may indeed look at the values of all bind variables involved:
this is known as "bind variable peeking".
- Execution plan persistency.
The execution plan for an SQL query is computed
the first time the query is executed ("hard-parsed").
After execution, the query remains stored in the ("shared pool") memory
of the database server, together with the execution plan that was used for it.
Unless the query is flushed out of memory
(either intentionally or stochastically),
the same execution plan is going to be used again
when the query is next executed.
In summary, the execution plan used for a given SQL query
cannot be predicted a priori because it depends
on the presence and quality of statistics
and on the values of bind variables used at the time
the query was first executed (hard-parsed).
As a consequence of this instability of execution plans,
very different performances may be observed for the same SQL query.
In COOL, this issue is addressed by adding Oracle hints to the queries,
to make sure that the same (good) plan is used in all cases,
even with unreliable statistics or unfavourable bind variables.
- The execution plan may be computed for the first time
when statistics are missing, or when the existing statistics are unreliable
because they were gathered on a table with few or no rows at all.
If the same execution plan is then reused to query a table
with many rows, performance may be very poor.
- COOL tables contain historical data with a timestamp.
A typical COOL query fetches data valid at a given timestamp,
represented in SQL by a bind variable.
The execution plan for the query may be computed for the first time
for values of the timestamp which are close to the minimum in the table.
If the same execution plan is then reused to query the table
for much larger values of the timestamp, performance may be very poor.
The following two plots are meant to illustrate
the concepts described above, in the specific use case
of IOV retrieval from a COOL MV standard tag.
More details can be found in Savannah task #5820.
The relevant SQL query for this use case was improved
in the transition between the COOL 2.3.0 and 2.3.1 releases.
In addition, Oracle hints were added in release COOL 2.3.1
to stabilise the query execution plan
against the effect of statistics and bind variable peeking.
In both plots, the red curve on the left shows the poor performance
of the COOL 2.3.0 query strategy: query time increases significantly
if data valid at higher timestamps is queried.
The other curves in each plot are much more interesting,
as they correspond to different execution plans
for the same SQL query strategy, the one adoped in COOL 2.3.1.
In the plot on the left, six curves are drawn for COOL 2.3.1:
these correspond to two values (low or high) of the timestamp bind variable
and to three cases for statistics (missing, unreliable, reliable).
Out of the six curves, only three show good performance (flat query time),
while the other three cases (the two cases with unreliable statistics computed
on an empty table, and the case with low bind variable and reliable statistics)
show bad performance. All six curves use the COOL 2.3.1 query strategy
without Oracle hints.
In the plot on the right, the same six curves are drawn again,
after the addition of the appropriate Oracle hints:
all six curves now exhibit good performance (flat query time).
For reference, one of the curves without hint from the plot on the left
is also shown. Colors are different in the two plots.
COOL tools to gather Oracle statistics
For convenience, a couple of useful scripts have been included
in the COOL CVS repository.
coolGatherTableStats.py "<oracle URL>"
to gather statistics on all tables and indices for a given COOL database,
coolGatherSchemaStats.py "<oracle URL>"
to gather statistics on all tables and indices
defined within the Oracle schema where the given COOL database is created.
Both scripts invalidate the execution plans for all queries
defined on those tables, forcing them to be hard-parsed again.
These scripts are used internally for all performance tests,
but in principle they should not be needed any longer by end users.
MySQL and SQLite
In order to simplify code development,
the queries optimized for Oracle have been written using an SQL syntax
that makes them functionally usable unchanged also on MySQL and SQLite.
By default, the same code is used to process all backends.
Performance for the most important use cases
has also been studied and tested for MySQL and SQLite.
In some cases, simple tests have shown that the queries optimized for Oracle
result in very poor performance for the other backends.
One such example is the use of complex subqueries in MySQL:
in this case, a different SQL strategy is used for this backend alone.
For any questions please contact