directory-dev mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject Re: Optimizing DB access with improving key usage
Date Sun, 23 Jul 2006 23:32:58 GMT
Hi guys,

I have done some tests lately with the latest version of the optimization 
branch and different JVMs.

Here are the results I got :

Sun JVM 5.0_06, with a client sending 10000 search requests through 8 
threads: max is 640 req/s
IBM JVM 5.0, with a client sending 10000 search requests through 5 
threads: max is 853 req/s
JRockit 5.0 JVM, with a client sending 10000 search requests through 5 
threads: max is 895 req/s

Each loop of 10000 searches is run 12 times, and the 2 worst values are 
discarded.
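As a sketch, the averaging described above (12 runs, 2 worst discarded) amounts to something like the following; the class and method names are mine, not from the branch:

```java
import java.util.Arrays;

public class BenchStats {
    // Average req/s over the runs, discarding the 2 worst (lowest) values.
    public static double trimmedAverage(double[] reqPerSec) {
        double[] sorted = reqPerSec.clone();
        Arrays.sort(sorted);            // ascending: the 2 worst runs come first
        double sum = 0;
        for (int i = 2; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        double[] runs = { 640, 610, 635, 642, 638, 590, 641, 639, 636, 640, 637, 634 };
        System.out.println(trimmedAverage(runs));
    }
}
```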

The interesting point is that JRockit does not give good results 
immediately; it seems to optimize the code dynamically (I had to run the 
10K searches 6 to 8 times before obtaining stable results).

The configuration is the one Alex set for indexes, so I started the 
server with 1 GB of memory. That does not seem to be necessary; 256 MB 
should be enough.

There are still a *lot* of readObject() calls, and I guess it is due to 
the indexes that are read (data are serialized internally). As the search 
I'm doing is a presence search (with a full DN selected randomly), the 
ndn index is used and the DN is deserialized, but the StringSerializer is 
not used at all. It would be very interesting to add a StringSerializer 
to the ndn_forward index, because looking up a DN means deserializing 
around 14 DNs from this index (as we have 10K entries, the b-tree height 
is 14).
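A minimal sketch of the core of such a StringSerializer, assuming the usual contract of turning an object into a byte[] and back (class and method names here are illustrative, not the actual branch code):

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: encode the normalized DN directly as UTF-8 bytes,
// avoiding the overhead of default Java serialization (readObject/writeObject).
public class StringSerializerSketch {
    public static byte[] serialize(String ndn) {
        return ndn.getBytes(StandardCharsets.UTF_8);
    }

    public static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dn = "uid=admin,ou=system";
        System.out.println(deserialize(serialize(dn)).equals(dn)); // prints true
    }
}
```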

We also need a TreeSet serializer/deserializer for the other indexes, but 
this is a bit more complicated to implement (my knowledge of the way 
indexes are initialized is a little light right now :)

The good news is that with a decent JVM, we are now closer to OpenLDAP 
and FDS: they perform this very same search at a rate that is 
1.75 times faster (around 1450 req/s). Replacing those readObject calls 
with our own StringSerialization will be a good improvement (I think we 
can reach at least 1000 req/s).

My comments on Ersin's mail:

Alex Karasulu wrote:

> Ersin Er wrote:
>
>> [RESENDING]
>>
>> Hi all,
>>
>> The optimization branch is busy with extensive refactorings and things
>> are going well. We had a few convos recently for further improving the
>> performance and one of the points of improvement was thought as
>> replacing the BigInteger construct which we use as the key of Attributes
>> objects in the database.
>
>
> It will certainly save us memory and processing time as we discussed
> because it would only take one lookup to retrieve the master entry.

Agreed.

>
>> What we do when retrieving an Attributes object is first retrieving a
>> BigInteger stored in the db for the normalized DN and then doing one
>> more lookup for the actual Attributes object using the BigInteger as the
>> key.
>
>
> This is done for two reasons. I chose an integer value as the primary 
> key over the DN itself because
>
> (1) [Big]Integers need no normalization,
> (2) [Big]Integers take mostly 4 bytes to encode for storage as keys 
> and values so matching them up is easy to do without losing sorting 
> order.  So imagine having to store several copies of DN's whose 
> prefixes overlap to a large degree.  That means potentially millions 
> more byte comparisons to do to match a key across indices.

Agreed. This is a good optimization. DN comparisons are costly, even if 
the DNs are already normalized. It costs a lot of CPU to compare two 
Strings when a single instruction is enough to compare two ints.

>
>> The first improvement to be done is to replace BigInteger with int or
>> long which cost much less. 
>
>
> This will not have as much of an impact as you think.  JDBM expects 
> objects at its interfaces so we'll still have to create at least a 
> byte[] from the int or long.

Well, here I do think that the impact will be important, but for a 
different reason. Serializing and deserializing a BigInteger is really 
costly, but only because we use standard Java serialization. If we 
switch to our own serialization/deserialization, the improvement 
will be huge. Also, an Integer is just a wrapper around an int, while a 
BigInteger initializes at least an array of ints. I don't say we will 
end up with a server that is 2 times faster, however :) Let's say a few 
percent, maybe...
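For illustration, hand-rolling the 4-byte encoding of an int key is trivial compared to pushing a BigInteger through standard Java serialization. The names below are mine, and the sketch assumes non-negative entry ids, so big-endian byte order keeps the serialized keys in numeric sort order:

```java
public class IntKeySerializer {
    // Big-endian encoding: for non-negative ids, byte-wise comparison
    // of the serialized form matches numeric order.
    public static byte[] serialize(int id) {
        return new byte[] {
            (byte) (id >>> 24), (byte) (id >>> 16),
            (byte) (id >>> 8),  (byte) id
        };
    }

    public static int deserialize(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(deserialize(serialize(123456))); // prints 123456
    }
}
```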

>
>
> But as Alex just (yes, at the time of writing
>
>> this :) ) warned me that jdbm keys should be objects, not
>> primitives. 
>
>
> :( oops you already got that.
>
> So my second suggestion is again replacing the BigInteger
>
>> with a more lightweight class like Integer and generating that Integer
>> for each LdapDN at the time of construction. 
>
>
> Well BigInteger is not that heavy.  In most cases for us we generate 4 
> or fewer bytes for a BigInteger.  I have to see just how heavy the 
> operations are for generating this byte[] representation but I think 
> it would be the same to do for an int at the low capacities we're 
> dealing with and expecting.

I think we can delay this modification and instead spend some time on 
BigIntegerSerialization usage.

>
> I think we do not need to
>
>> do one more lookup to retrieve the (Big)Integer key of the Attributes
>> object. We can just generate a hash value at LdapDN constructor and
>> provide a method for accessing it when we want to store or retrieve
>> Attributes objects. 
>
>
> This is a problem.  Because of #2 above, indices return integer ids. 
> While searching we may do a lookup on an index and not have the DN 
> available.

Also, storing a hash value does not help much. Two objects may have the 
same hash value (even if they are different), so you would need to fetch 
both objects. If you work with this BigInteger/Integer key, then the 
object you point to is unique. As deserialization of an entry is a 
very costly operation, the extra lookup may still be faster.
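To make the collision point concrete: two distinct Strings can share a hash, so a hash alone cannot serve as a unique key. A classic Java example:

```java
public class HashCollision {
    public static void main(String[] args) {
        // "Aa" and "BB" are different Strings with the same hashCode (2112),
        // so a hash value alone cannot uniquely identify an entry.
        System.out.println("Aa".hashCode());   // prints 2112
        System.out.println("BB".hashCode());   // prints 2112
        System.out.println("Aa".equals("BB")); // prints false
    }
}
```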

>
> ...
>
>> WDYT ?
>
>
> I think this is a good try.  Really I do, so please don't get 
> discouraged.  But most LDAP server backends based on BTree 
> implementations use this scheme for the PK.  We're not inventing 
> anything new by the way we do things in the jdbm backend.
>
> Now recently you had a great idea for making the master table a hash 
> instead of a btree.  This is a great idea that should be done IMO.

Yeah, definitely. For each attribute that is supposed to be unique, like 
the DN or uid, we could use a hash. This would be mandatory for the DN, 
and optional (and parameterized in the configuration) for uid or any 
other attribute. If the distribution of values is wide (let's say 80% of 
the values are different, and for the other 20% we have few collisions), 
we can use a hash. This is up to the LDAP admin.

Emmanuel.
