db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fedd Kraft <f...@sql.ru>
Subject Re: Derby internally converts char for bit data to hex string
Date Mon, 05 Apr 2010 19:39:44 GMT
31.03.2010 13:56, Knut Anders Hatlen пишет:
> On 03/31/10 08:26 AM, Fedd Kraft wrote:
>    
>> Hello,
>>
>> When profiling an application I've noticed that sometimes when performing a
>> simple query to one table without joins, like
>>
>> select kee from res where qid=? and kee=?
>>
>> it looks like Derby searching for a row makes a lot of comparisons by
>> converting binary data to hex strings and then comparing them.
>>
>> And (here I am not sure) chooses to do a full scan. As the result, the whole
>> thing works too slow.
>>
>> The table is defined like this.
>>
>> create table res(
>>   qid char (16) for bit data not null,
>>   kee char (32) for bit data not null,
>>   dnk char (32) for bit data not null
>> );
>> create unique index res_unq on res (qid, kee, dnk);
>> create index res_idx on res (qid, dnk);
>> create index res_ix2 on res (qid, kee);
>>
>> It also looks like it does this hex string comparison when the table is
>> quite big. And when it is quite small, call normal byte array comparison,
>> and there is no 'fullscan' classes in the stack trace, and it works
>> ultrafast.
>>
>> Sorry there no reproducing scripts yet ready; please tell me may be I just
>> missing something. I thought that converting makes sense when we compare
>> values of different types or which may be null, but mine is not that case
>> (?)
>>
>>      
> Hi Fyodor,
>
> Comparisons of binary data are supposed be performed byte-wise, without
> converting the data to strings first, so if you can provide scripts or
> code to reproduce this, please file a bug report here so that it can be
> investigated:https://issues.apache.org/jira/browse/DERBY
>
> You may first want to check that you're not using a debug build of
> Derby, though, since it might be that the debug builds convert byte
> arrays to hex strings to do some sanity checks. (If derby.jar contains a
> class called SanityManager, it's a debug build.)
>
>    
Hi Knut,

unfortunately I cannot yet reproduce this without my bigger project. My 
version in use has no SanityManager, and if I'd want to see which method 
Derby uses for the comparison, I'd have to use the debug version... 
Anyway, looking at the code, I see (db-derby-10.5.3.0-src.zip) that in 
the method org.apache.derby.iapi.types.SQLBinary.compare(int op, 
DataValueDescriptor other, boolean orderedNulls, boolean unknownRV) is 
called with orderedNulls parameter set to false, it always creates hex 
strings, see:

public final boolean compare(int op,
DataValueDescriptor other,
boolean orderedNulls,
boolean unknownRV)
throws StandardException
{
if (!orderedNulls) // nulls are unordered
{
// sanity manager stuff skipped
String otherString = other.getString(); // <==== line 574 HERE IS HEX 
STRING CREATION
if (this.getString() == null || otherString == null) // <=== AND HERE
return unknownRV;
}
return super.compare(op, other, orderedNulls, unknownRV);
}

The hex string is created through StringBuffer tens of thounsands times 
according to profiler; (and the debugger stopped on line 574, when used 
a debug build). Well, I was wrong that hex strings are compared; they 
are prepared only to know whether any of the values is null.


There is also a query plan problem with this table, and also when the 
table grows big:

create table res(
qid char (16) for bit data not null,
kee char (32) for bit data not null,
dnk char (32) for bit data not null
);
create unique index res_unq on res (qid, kee, dnk);
create index res_idx on res (qid, dnk);
create index res_ix2 on res (qid, kee);

The query is

select kee from res where dnk=? and qid=?

And here are two plans, one for big "res" table, and another for small. 
The first plan shows that the wrong index was chosen, and we read too 
many records to find the result. The table is populated right before it 
gets queried; Maybe there is some statistic stuff, but I would like to 
avoid running some Derby-specific calls in my application...

Thank you!


=================================


2010-04-05 18:27:29.062 GMT 
Thread[mety.store-("zipcode",("city",(*VALUE*,(&5022740b-a32a-4d4a-8a38-511811e32dc5)))),5,m
ain] (XID = 1364486), (SESSIONID = 5), select kee from res where dnk=? 
and qid=? ******* Project-Restrict ResultSet (2):
Number of opens = 34
Rows seen = 34
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1488,65
optimizer estimated cost: 12706,02

Source result set:
Index Scan ResultSet for RES using index RES_UNQ at read committed 
isolation level using instantaneous share row
locking chosen by the optimizer
Number of opens = 34
Rows seen = 34
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0

scan information:
Bit set of columns fetched={0, 1, 2}
Number of columns fetched=3
Number of deleted rows visited=0
Number of pages visited=1711
Number of rows qualified=1
Number of rows visited=43192
Scan type=btree
Tree height=4
start position:
 >= on first 1 column(s).
Ordered null semantics on the following columns:

stop position:
 > on first 1 column(s).
Ordered null semantics on the following columns:

qualifiers:
Column[0][0] Id: 2
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false

optimizer estimated row count: 1488,65
optimizer estimated cost: 12706,02




==========================================================================



2010-04-05 18:54:19.171 GMT 
Thread[mety.store-("zipcode",("city",(*VALUE*,(&42d96ab6-4dc3-45d0-87d9-2cdea975db97)))),5,m
ain] (XID = 33823), (SESSIONID = 5), select kee from res where dnk=? and 
qid=? ******* Project-Restrict ResultSet (3):
Number of opens = 214
Rows seen = 274
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 28,53
optimizer estimated cost: 277,09

Source result set:
Index Row to Base Row ResultSet for RES:
Number of opens = 214
Rows seen = 274
Columns accessed from heap = {0, 1, 2}
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
optimizer estimated row count: 28,53
optimizer estimated cost: 277,09

Index Scan ResultSet for RES using index RES_IDX at read committed 
isolation level using instantaneous s
hare row locking chosen by the optimizer
Number of opens = 214
Rows seen = 274
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0

scan information:
Bit set of columns fetched=All
Number of columns fetched=3
Number of deleted rows visited=0
Number of pages visited=4
Number of rows qualified=1
Number of rows visited=2
Scan type=btree
Tree height=3
start position:
 >= on first 2 column(s).
Ordered null semantics on the following columns:

stop position:
 > on first 2 column(s).
Ordered null semantics on the following columns:

qualifiers:
None
optimizer estimated row count: 28,53
optimizer estimated cost: 277,09



Mime
View raw message