directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lécharny <elecha...@gmail.com>
Subject Re: Question about JDBM key/value replacement
Date Fri, 27 Apr 2012 16:08:17 GMT
Le 4/27/12 5:46 PM, Selcuk AYA a écrit :
> On Fri, Apr 27, 2012 at 6:06 AM, Emmanuel Lécharny<elecharny@gmail.com>  wrote:
>> Le 4/27/12 2:09 PM, Selcuk AYA a écrit :
>>
>>> Hi,
>>> On Thu, Apr 26, 2012 at 4:52 PM, Emmanuel Lécharny<elecharny@gmail.com>
>>>   wrote:
>>>> Hi guys,
>>>>
>>>> currently, we can replace an existing value when its associated key
>>>> exists
>>>> in the BTree. What I'd like to do is to replace both key and value in
>>>> this
>>>> case.
>>>>
>>>> The rational is that the key mau contain some extra information that are
>>>> not
>>>> used to find the key, like the nbChildren/nbDescendant for a
>>>> ParentIdAndrdn
>>>> key in the RdnIndex.
>>>>
>>>> Is there an easy way to do that with JDBM, or should we extend the API it
>>>> offers ?
>>> as I understand you are trying to store some auxiliary information per
>>> key?
>> Yep.
>>
>>
>>>   With JDBM we can:
>>>    - provide<K, aux>    tuple as the key and pass the key comparator to
>>> be the comparator for the actual keys.
>>>    - change JDBM code to update key inside when replacing an entry.
>>>
>>> however, in general, I do not think we can expect a key/value store to
>>> update the key when replacing an entry. So I am not sure it is a good
>>> idea to do this kind of thing.
>>
>> The pb is that those auxiliary informations are not part of the compare
>> being done. Ie, two keys may be seen as equals, but the auxiliary
>> information might be different. In this very case, I'd like the new version
>> of the key replaced.
>> It may be very specific to my need, and we can also consider that the value
>> might cary those extra informations, too, instead of storing them in the
>> key.
>>
>> So for the<ParentIdAndRdn, ID>  tuple, where the data structure is :
>> ParentIdAndRdn :
>> {
>>     ID parentId,
>>     Rdn[] rdns,
>>     int nbChildren,
>>     int nbDescendant
>> }
>>
>> we could inject the two ints after having read the value if this value
>> structure was :
>>
>> AugmentedID
>> {
>>     ID id
>>     int nbChildren,
>>     int nbDescendant
>> }
> I think this would be better. However, note that in txns branch, we
> removed most of the generics and assumed that indices store UUID as
> their value. I hope we go  with removal/add ( I guess this is what you
> are doing now)
Should not be a big problem here. Generics are an issue anyway...
>
>> and serializing the ParentIdAndRdn ID and Rdn[] only.
> I am kind of confused, I have a couple of questions:
>
> 1) Can you explain why we need to store nbChildren and nbDescedant but
> do not need to serialize it?
We *do* serialize those elements. What I suggest is that we store those 
information in the values, not in the keys, as in any case, we will get 
a tuple <key, value>, so we will always have the extended informations.

> What if the page storing an entry is
> kicked out of cache and we loose nbChildren and nbDescendant?
Then we will read them from disk, if the cache does not contain those 
information, right ?

>
> 2)Connected with the above question, how do we maintain these
> auxiliary data? Are there maintained only in memory or on disk as
> well? Do they have to absolutely correct to get consistent data out of
> the server?
On memory and on disk. They are absolutely critical as they are used to 
browe the tree while doing a One/Subtree scope search.

>
> 3) Lets say I have<rdn1, rdn2,rdn3>  and<rdn1,rdn2, rdn4>  . Thread 1
> adds<rdn1, rdn2,rdn3, rdn5>  and thread 2 add<rdn1,rdn2, rdn4,rdn6>
> concurrently.
> These two threads run concurrently but they should not
> logically conflict. Now they have to read nbDescendants of<rdn1,rdn2>
> and increment it but they have to do it in such a way that the net
> effect of these two threads is serialized.
yep.

> I am not sure how to do
> this.
do you mean that the indexes aren't gloablly protected when we modify 
some data ? I thought that a modification will protect all the index it 
will update until they have all been updated, then the next modification 
can operate.

We don't really care if that serialize the modifications, because the 
server does not have to be fast when we inject data, only when we read 
data. At least, we could think about a serialization over the operations 
on one index like the RdnIndex (as an entry modification may update more 
than one entry in this index).

Is that a problem ?

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com


Mime
View raw message