accumulo-user mailing list archives

From Roshan Punnoose <rosh...@gmail.com>
Subject Re: Reverse Index Timestamp
Date Tue, 27 Nov 2012 22:53:41 GMT
Thanks Jim, do you mean the least significant bits of the timestamp?


On Tue, Nov 27, 2012 at 4:45 PM, Jim Klucar <klucar@gmail.com> wrote:

> Roshan,
>
> Depending on what your cluster setup is and what the resolution of the
> time stamp is you could do something like this to spread the data around:
>
> <timestamp-LSBs>-<string>-<reverse timestamp>
>
> Using the LSBs of the timestamp as a uniform hash, then splitting on all
> possible hashes would spread things around a bit. If you do this, then all
> scans must check all hashes for data.
>
>
>
>
> On Tue, Nov 27, 2012 at 1:25 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> On Tue, Nov 27, 2012 at 1:22 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> The fact that you are using a binary tree behind the scenes makes
>>> perfect sense. Btw, what do you use in the standalone (non native)
>>> implementation? Does it use a TreeMap?
>>>
>>
>> When not using native code, ConcurrentSkipListMap is used.
>>
>>
>>>
>>>
>>> On Tue, Nov 27, 2012 at 12:57 PM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>>>
>>>>> The <string> would most likely be a fixed set of strings that do not
>>>>> change over time.
>>>>>
>>>>> My question is if it is bad to use a reverse index timestamp in the
>>>>> row id? Will it cause problems with the tablet splitting, compaction, and
>>>>> performance if the data is always being sent to the top of the tablet? If I
>>>>> define a split as everything prefixed with <string>, then the ingest will
>>>>> go to one tablet, but then I add a reverse timestamp in the row, and that
>>>>> would mean I am always copying data to the top of the tablet. Will this
>>>>> cause performance issues? Or is it better to append to a tablet?
>>>>>
>>>>
>>>> I do not think it should matter. Inserts go into a C++ STL map on the
>>>> tablet server if using the nativemap. I think the implementation of that
>>>> is a balanced binary tree. So I do not think inserting at the beginning vs
>>>> the end would make a difference. That being said, I do not think I have
>>>> tried this so I do not know if there would be any surprises. I would be
>>>> interested in hearing about your experiences.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <keith@deenlo.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Keith
>>>>>>
>>>>>> On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>>>>>
>>>>>>> I want to have a table where the row will consist of
>>>>>>> "<string>-<reverse index timestamp>". But this means that the data is
>>>>>>> always being prefixed to the beginning of the row (or tablet if the row is
>>>>>>> large). Will this be a problem for compaction or performance?
>>>>>>
>>>>>>
>>>>>> Can you tell me more about what <string> is? For example, is it a
>>>>>> hash or does it come from the set "foo1", "foo2", "foo3"? How does it
>>>>>> change over time? I think the answer to your question depends on what
>>>>>> <string> is.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I don't know if I heard this correctly, but someone once mentioned
>>>>>>> that making the row id the direct timestamp could cause performance issues
>>>>>>> because data is always going to one tablet, but also because there is
>>>>>>> trouble splitting since it always appends to the tablet. Is this true, and
>>>>>>> is it similar to what could happen if I am always prefixing to a tablet?
>>>>>>>
>>>>>>
>>>>>> Yes, using a timestamp for a row could cause data from many clients to
>>>>>> always go to the same tablet, which would be bad for performance on a
>>>>>> cluster.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Roshan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
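
The row layout Jim suggests in the thread, <timestamp-LSBs>-<string>-<reverse timestamp>, could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from anyone in the thread: the class name BucketedRowKey, the choice of 4 low bits (16 buckets), the zero-padded decimal encoding, and the use of Long.MAX_VALUE minus the timestamp as the "reverse timestamp" are all hypothetical choices. No Accumulo API is used; the sketch only builds the row strings and the list of prefixes a reader would have to scan.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketedRowKey {
    // Assumption: use the 4 least significant bits of the timestamp, giving 16 buckets.
    static final int BUCKET_BITS = 4;
    static final int NUM_BUCKETS = 1 << BUCKET_BITS;

    // The low bits of the timestamp act as a cheap, roughly uniform hash.
    static int bucket(long timestampMillis) {
        return (int) (timestampMillis & (NUM_BUCKETS - 1));
    }

    // Newer timestamps produce smaller values, so they sort first.
    // Zero-padding to 19 digits keeps string order equal to numeric order
    // for non-negative timestamps.
    static String reverseTimestamp(long timestampMillis) {
        return String.format("%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Row id in the form <bucket>-<string>-<reverse timestamp>.
    static String rowId(String name, long timestampMillis) {
        return String.format("%02d-%s-%s",
                bucket(timestampMillis), name, reverseTimestamp(timestampMillis));
    }

    // Every bucket must be checked when reading back data for one <string>.
    static List<String> scanPrefixes(String name) {
        List<String> prefixes = new ArrayList<>();
        for (int b = 0; b < NUM_BUCKETS; b++) {
            prefixes.add(String.format("%02d-%s-", b, name));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(rowId("foo1", now));
        System.out.println(scanPrefixes("foo1"));
    }
}
```

With a layout like this, pre-splitting the table on the bucket prefixes ("00", "01", and so on) would spread ingest across tablets, and a reader would turn each entry of scanPrefixes into its own range, for example with a BatchScanner. As Jim notes, this depends on the resolution of the timestamp: if the values are whole seconds stored as milliseconds, the low 4 bits only ever take the values 0 and 8, so a different slice of the timestamp or a real hash would be needed.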

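Keith's point that inserting at the beginning versus the end of a sorted in-memory map should not matter can be illustrated with the Java class he names, ConcurrentSkipListMap. The sketch below is not Accumulo's in-memory map code; the class name SkipListInsertOrder, the "foo-" key prefix, and the values are hypothetical. It fills one map with keys that always land at the front (reverse timestamps) and one with keys that always land at the back (plain timestamps). In both cases the map stays sorted, and the Java documentation gives expected log(n) cost for put regardless of where the key falls.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListInsertOrder {
    // Inserts n entries whose keys either decrease (reverse timestamp)
    // or increase (plain timestamp) as the timestamp grows.
    static ConcurrentSkipListMap<String, String> fill(boolean reverseKeys, int n) {
        ConcurrentSkipListMap<String, String> map = new ConcurrentSkipListMap<>();
        for (long ts = 0; ts < n; ts++) {
            long keyPart = reverseKeys ? Long.MAX_VALUE - ts : ts;
            // Zero-padded so that string order matches numeric order.
            map.put(String.format("foo-%019d", keyPart), "v" + ts);
        }
        return map;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<String, String> reversed = fill(true, 1000);
        ConcurrentSkipListMap<String, String> forward = fill(false, 1000);
        // With reverse timestamps the newest entry is the first key;
        // with plain timestamps it is the last key.
        System.out.println(reversed.firstEntry().getValue());
        System.out.println(forward.lastEntry().getValue());
    }
}
```

This says nothing about tablet splitting or compaction behavior, which Keith says he had not tried; it only shows that the sorted map itself treats front and back inserts alike.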