accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Reverse Index Timestamp
Date Tue, 27 Nov 2012 17:57:25 GMT
On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <roshanp@gmail.com> wrote:

> The <string> would most likely be a fixed set of strings that do not
> change over time.
>
> My question is if it is bad to use a reverse index timestamp in the row
> id? Will it cause problems with the tablet splitting, compaction, and
> performance if the data is always being sent to the top of the tablet? If I
> define a split as everything prefixed with <string>, then the ingest will
> go to one tablet, but then I add a reverse timestamp in the row, and that
> would mean I am always copying data to the top of the tablet. Will this
> cause performance issues? Or is it better to append to a tablet?
>

I do not think it should matter. Inserts go into a C++ STL map on the
tablet server if using the nativemap. I think the implementation of that
is a balanced binary tree, so I do not think inserting at the beginning vs.
the end would make a difference. That being said, I do not think I have
tried this, so I do not know if there would be any surprises. I would be
interested in hearing about your experiences.
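
To make the row layout concrete, here is a minimal sketch of building
"<string>-<reverse index timestamp>" row ids and inserting them into a
sorted map. It is an illustration only: the class and method names are
hypothetical, computing the reverse timestamp as Long.MAX_VALUE minus the
timestamp is an assumed encoding, and java.util.TreeMap (a red-black tree)
merely stands in for the tablet server's in-memory map. It does not use
any Accumulo API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

public class ReverseTimestampRow {

    // Hypothetical helper: builds "<string>-<reverse timestamp>".
    // Assumes a non-negative timestamp. Zero-padding to 19 digits keeps
    // lexicographic order identical to numeric order.
    public static String rowId(String prefix, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return String.format(Locale.ROOT, "%s-%019d", prefix, reversed);
    }

    // Inserts rows in the order given and returns the keys in sorted order.
    // TreeMap is a balanced tree, so each insert costs O(log n) whether the
    // new key lands at the front or the back of the key range.
    public static List<String> sortedRows(String prefix, long... timestamps) {
        TreeMap<String, String> map = new TreeMap<>();
        for (long ts : timestamps) {
            map.put(rowId(prefix, ts), "value@" + ts);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Oldest timestamp is inserted first, yet the newest row sorts first.
        for (String row : sortedRows("foo1", 1000L, 2000L, 3000L)) {
            System.out.println(row);
        }
    }
}
```

With this encoding each newer entry sorts ahead of the older ones under
the same <string>, which is the "always inserting at the top" pattern
being asked about.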


>
>
> On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> Keith
>>
>> On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>
>>> I want to have a table where the row will consist of "<string>-<reverse
>>> index timestamp>". But this means that the data is always being prefixed to
>>> the beginning of the row (or tablet if the row is large). Will this be a
>>> problem for compaction or performance?
>>
>>
>> Can you tell me more about what <string> is?  For example is it a hash or
>> does it come from the set "foo1","foo2","foo3".   How does it change over
>> time?  I think the answer to your question depends on what <string> is.
>>
>>
>>>
>>> I don't know if I heard this correctly, but someone once mentioned that
>>> making the row id the direct timestamp could cause performance issues
>>> because data is always going to one tablet, but also because there is
>>> trouble splitting since it always appends to the tablet. Is this true, is
>>> it similar to what could happen if I am always prefixing to a tablet?
>>>
>>
>> Yes, using a timestamp for a row could cause data from many clients to
>> always go to the same tablet, which would be bad for performance on a
>> cluster.
>>
>>
>>>
>>> Thanks!
>>> Roshan
>>>
>>
>>
>
