|
Join strategies
The most common join strategy in Derby is called a nested loop. For each qualifying row in the outer table, Derby uses
the appropriate access path (index or table) to find the matching rows in
the inner table.
Another type of join in Derby is called a hash join. For joins of this type, Derby constructs a hash table representing
all the selected columns of the inner table. For each qualifying row in the
outer table, Derby does a quick lookup on the hash table to get the
inner table data. Derby has to scan the inner table or index only once,
to build the hash table.
Nested loop joins are preferable in most situations.
Hash joins are preferable in situations in which the inner table values
are unique and there are many qualifying rows in the outer table. Hash joins
require that the statement's WHERE clause be an optimizable equijoin:
- It must use the = operator to compare column(s) in the outer table to
column(s) in the inner table.
- References to columns in the inner table must be simple column references.
Simple column references are described in Directly optimizable predicates.
The hash table for a hash join is held in memory and if it gets big enough,
it will spill to the disk. The optimizer makes a very rough
estimate of the amount of memory required to make the hash table. If it estimates
that the amount of memory required would exceed the system-wide limit of memory
use for a table, the optimizer chooses a nested loop join instead.
If memory use is not a problem for your environment, set this property
to a high number; allowing the optimizer the maximum flexibility in considering
a join strategy queries involving large queries leads to better performance.
It can also be set to smaller values for more limited environments.
Note: Derby allows multiple columns as hash keys.
|