directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Mavibot vs JDBM results
Date Tue, 12 Nov 2013 11:04:29 GMT
Le 11/11/13 9:49 PM, Alex Karasulu a écrit :
> On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu <hyc@symas.com> wrote:
>
>> Alex Karasulu wrote:
>>
>>>     Now, this is an approach where we used plain Keys (ie, Keys can have
>>>     various sizes, which is not really efficient, as we may have to
>>>     allocate more pages than necessary to store nodes and leaves.
>>>     Open LDAP uses another approach, which is smarter : they use the hash
>>>     value of each key to retrieve the element. Obviously, this leads to
>>>     compare the keys when we reach the leaf, as we may have more than one
>>>     key with the same hash value, and it also destroys the ordering (one
>>>     can't compare two hash values as the ordering will be different) but
>>>     most of the case, it's not really a big deal.
>>>     The main advantage of such an approach is that suddenly, Nodes have a
>>>     fixed size (a hash can be stored as an int, and the references to a
>>>     page are longs), so in a fixed page size, we can store a fixed number
>>>     of elements. Assuming that a node needs at least 28 bytes to store its
>>>     header and PageIO, in a 512 bytes page we can store (512 - 28) /
>>>     ((nbValues+1) x (8+8) + nbKeys x 4 ) elements, so 16 keys (64 bytes)
>>>     and 17 values (272 bytes). We hve 148 bytes remaining in this case.
>>>     Atm, we store 16 element per node, which requires many physical pages,
>>>     ie, many disk access.
>>>
>>>     This is something that worth being investigated in the near future.
>>>
>>>
>>> Sounds like we need a minimal perfect order preserving hash function.
>>>
>> These are a pain to try to use for this purpose. All of the perfect order
>> preserving hashes I've found only work because you generate the hash
>> function based on knowing the entire set of data in advance. When you add
>> new records you need to create a new hash function, and thus the hash
>> values of every key changed.
>>
>> I.e., these are only useful for static data sets.
>>
> That's lame ... no silver bullets I guess.
>
Well, still : we rarely need ordered index. Only when some filters with
<= or >= those ordered index are good to have. One good tradeoff might
be to create special indexes in those special cases (ie, not always).
Most of the time, we will be good with standard index with hashed keys.

We just need to ask for those specific index to be created in the
configuration.

That is at least something to consider.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Mime
View raw message