Return-Path:
X-Original-To: apmail-accumulo-user-archive@www.apache.org
Delivered-To: apmail-accumulo-user-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 9CFEAD2F6 for ; Tue, 27 Nov 2012 22:54:09 +0000 (UTC)
Received: (qmail 91110 invoked by uid 500); 27 Nov 2012 22:54:09 -0000
Delivered-To: apmail-accumulo-user-archive@accumulo.apache.org
Received: (qmail 91081 invoked by uid 500); 27 Nov 2012 22:54:09 -0000
Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: user@accumulo.apache.org
Delivered-To: mailing list user@accumulo.apache.org
Received: (qmail 91072 invoked by uid 99); 27 Nov 2012 22:54:09 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Nov 2012 22:54:09 +0000
X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (nike.apache.org: domain of roshanp@gmail.com designates 209.85.219.41 as permitted sender)
Received: from [209.85.219.41] (HELO mail-oa0-f41.google.com) (209.85.219.41) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Nov 2012 22:54:02 +0000
Received: by mail-oa0-f41.google.com with SMTP id k14so14901726oag.0 for ; Tue, 27 Nov 2012 14:53:41 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type;
 bh=T6gZhS8snSHAW72iqFSsWfhQ27eQFUSCzNC3znJrNe4=;
 b=myw92GOtlnGc/BzwGxGX4PGtw2RaZ6cqjBvDl3e2tSz0jp1BK5O/Gmwc/JTwfkomp0
 FQkiuQ/Dow8VTX4AgtNDgZ7sX9aAjf/aPnTnPy2EyzPt6EgwKkAQwNYjaczTIIlWi3kS
 DHcHqcnMv4X2PFkRmH1Iw3+d2c2wN2o0PAtI26NOFC7VoKuhDgMd0uHvgotNgkJVIMxJ
 NoWG75NikxwznrJtlGqKp9/o73fzhut9ZvfX0FAECazjLkF3Og2Nc/O7PGXWwnhGwDs1
 EPVkWfj8K2DVZgMYw5bnmV/vneTtJtiMuCEf/EBPTmSfQlDbzTcAVwM9ye7fvSdvkqjW
 e8qg==
MIME-Version: 1.0
Received: by 10.60.171.201 with SMTP id aw9mr14162411oec.126.1354056821653; Tue, 27 Nov 2012 14:53:41 -0800 (PST)
Received: by 10.76.125.7 with HTTP; Tue, 27 Nov 2012 14:53:41 -0800 (PST)
In-Reply-To:
References:
Date: Tue, 27 Nov 2012 17:53:41 -0500
Message-ID:
Subject: Re: Reverse Index Timestamp
From: Roshan Punnoose
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=bcaec550a988dc404f04cf81eb1b
X-Virus-Checked: Checked by ClamAV on apache.org

--bcaec550a988dc404f04cf81eb1b
Content-Type: text/plain; charset=ISO-8859-1

Thanks Jim, do you mean the least significant bits of the timestamp?

On Tue, Nov 27, 2012 at 4:45 PM, Jim Klucar <klucar@gmail.com> wrote:

> Roshan,
>
> Depending on what your cluster setup is and what the resolution of the
> time stamp is you could do something like this to spread the data around:
>
> <timestamp-LSBs>-<string>-<reverse timestamp>
>
> Using the LSBs of the timestamp as a uniform hash, then splitting on all
> possible hashes would spread things around a bit. If you do this, then all
> scans must check all hashes for data.
>
>
> On Tue, Nov 27, 2012 at 1:25 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>> On Tue, Nov 27, 2012 at 1:22 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> The fact that you are using a binary tree behind the scenes makes
>>> perfect sense. Btw, what do you use in the standalone (non native)
>>> implementation? Does it use a TreeMap?
>>>
>>
>> When not using native code, ConcurrentSkipListMap is used.
>>
>>
>>>
>>> On Tue, Nov 27, 2012 at 12:57 PM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>>
>>>> On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>>>
>>>>> The <string> would most likely be a fixed set of strings that do not
>>>>> change over time.
>>>>>
>>>>> My question is if it is bad to use a reverse index timestamp in the
>>>>> row id? Will it cause problems with the tablet splitting, compaction, and
>>>>> performance if the data is always being sent to the top of the tablet?
>>>>> If I define a split as everything prefixed with <string>, then the ingest will
>>>>> go to one tablet, but then I add a reverse timestamp in the row, and that
>>>>> would mean I am always copying data to the top of the tablet. Will this
>>>>> cause performance issues? Or is it better to append to a tablet?
>>>>>
>>>>
>>>> I do not think it should matter. Inserts go into a C++ STL map on the
>>>> tablet server if using the nativemap. I think the implementation of that
>>>> is a balanced binary tree. So I do not think inserting at the beginning vs
>>>> the end would make a difference. That being said, I do not think I have
>>>> tried this so I do not know if there would be any surprises. I would be
>>>> interested in hearing about your experiences.
>>>>
>>>>
>>>>>
>>>>> On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <keith@deenlo.com> wrote:
>>>>>
>>>>>>
>>>>>> Keith
>>>>>>
>>>>>> On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
>>>>>>
>>>>>>> I want to have a table where the row will consist of
>>>>>>> "<string>-<reverse index timestamp>". But this means that the data is
>>>>>>> always being prefixed to the beginning of the row (or tablet if the row is
>>>>>>> large). Will this be a problem for compaction or performance?
>>>>>>
>>>>>>
>>>>>> Can you tell me more about what <string> is? For example is it a
>>>>>> hash or does it come from the set "foo1","foo2","foo3". How does it
>>>>>> change over time? I think the answer to your question depends on what
>>>>>> <string> is.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I don't know if I heard this correctly, but someone once mentioned
>>>>>>> that making the row id the direct timestamp could cause performance issues
>>>>>>> because data is always going to one tablet, but also because there is
>>>>>>> trouble splitting since it always appends to the tablet. Is this true, is
>>>>>>> it similar to what could happen if I am always prefixing to a tablet?
>>>>>>>
>>>>>>
>>>>>> Yes, using a timestamp for a row could cause data from many clients to
>>>>>> always go to the same tablet, which would be bad for performance on a
>>>>>> cluster.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Roshan
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--bcaec550a988dc404f04cf81eb1b
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks Jim, do you mean the least significant bits of the timestamp?

On Tue, Nov 27, 2012 at 4:45 PM, Jim Klucar <klucar@gmail.com> wrote:
Roshan,

Depending on what your cluster setup is and what the resolution of the time stamp is you could do something like this to spread the data around:

<timestamp-LSBs>-<string>-<reverse timestamp>

Using the LSBs of the timestamp as a uniform hash, then splitting on all possible hashes would spread things around a bit. If you do this, then all scans must check all hashes for data.
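
A minimal sketch of the row layout described above, written in plain Python rather than against the Accumulo client API. The bucket count, the zero-padded string encoding, and every function name here are assumptions made for illustration; nothing in the thread specifies them. It shows how the low-order bits pick a bucket, which split points would separate the buckets, and why a reader has to fan out over every bucket prefix.

```python
# Hypothetical sketch: NUM_BUCKETS, MAX_LONG, and all function names are
# illustrative assumptions, not Accumulo APIs.
NUM_BUCKETS = 16          # must be a power of two: 4 low-order bits -> 16 buckets
MAX_LONG = 2**63 - 1      # largest signed 64-bit value, used to reverse timestamps


def row_id(name, ts_millis):
    """Build <timestamp-LSBs>-<string>-<reverse timestamp> as a sortable string."""
    bucket = ts_millis & (NUM_BUCKETS - 1)   # low-order bits act as a cheap hash
    reverse_ts = MAX_LONG - ts_millis        # newer timestamps give smaller values
    return "{:02x}-{}-{:019d}".format(bucket, name, reverse_ts)


def split_points():
    """One split per bucket boundary, so each bucket can live in its own tablet."""
    return ["{:02x}".format(b) for b in range(1, NUM_BUCKETS)]


def scan_prefixes(name):
    """Every bucket prefix a reader must scan to see all rows for one string."""
    return ["{:02x}-{}-".format(b, name) for b in range(NUM_BUCKETS)]


if __name__ == "__main__":
    print(row_id("foo1", 1354056821653))
    print(scan_prefixes("foo1")[:3])
```

The trade-off is the one stated above: writes spread over NUM_BUCKETS tablets, but each logical scan becomes NUM_BUCKETS range scans whose results have to be merged by the client.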




On Tue, Nov 27, 2012 at 1:25 PM, Keith Turner <keith@deenlo.com> wrote:


On Tue, Nov 27, 2012 at 1:22 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
Thanks!

The fact that you are using a binary tree behind the scenes makes perfect sense. Btw, what do you use in the standalone (non native) implementation? Does it use a TreeMap?

When not using native code, ConcurrentSkipListMap is used.



On Tue, Nov 27, 2012 at 12:57 PM, Keith Turner <keith@deenlo.com> wrote:


On Tue, Nov 27, 2012 at 12:21 PM, Roshan Punnoose <roshanp@gmail.com> wrote:
The <string> would most likely be a fixed set of strings that do not change over time.

My question is if it is bad to use a reverse index timestamp in the row id? Will it cause problems with the tablet splitting, compaction, and performance if the data is always being sent to the top of the tablet? If I define a split as everything prefixed with <string>, then the ingest will go to one tablet, but then I add a reverse timestamp in the row, and that would mean I am always copying data to the top of the tablet. Will this cause performance issues? Or is it better to append to a tablet?

I do not think it should matter. Inserts go into a C++ STL map on the tablet server if using the nativemap. I think the implementation of that is a balanced binary tree. So I do not think inserting at the beginning vs the end would make a difference. That being said, I do not think I have tried this so I do not know if there would be any surprises. I would be interested in hearing about your experiences.



On Tue, Nov 27, 2012 at 11:51 AM, Keith Turner <keith@deenlo.com> wrote:


Keith

On Tue, Nov 27, 2012 at 10:41 AM, Roshan Punnoose <roshanp@gmail.com> wrote:
I want to have a table where the row will consist of "<string>-<reverse index timestamp>". But this means that the data is always being prefixed to the beginning of the row (or tablet if the row is large). Will this be a problem for compaction or performance?

Can you tell me more about what <string> is? For example is it a hash or does it come from the set "foo1","foo2","foo3". How does it change over time? I think the answer to your question depends on what <string> is.


I don't know if I heard this correctly, but someone once mentioned that making the row id the direct timestamp could cause performance issues because data is always going to one tablet, but also because there is trouble splitting since it always appends to the tablet. Is this true, is it similar to what could happen if I am always prefixing to a tablet?

Yes, using a timestamp for a row could cause data from many clients to always go to the same tablet, which would be bad for performance on a cluster.

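
The hotspot can be illustrated with a toy model, sketched here under simplifying assumptions: a tablet is modeled as a slot between sorted split points, the split values and the four-bucket prefix are made up for the example, and no Accumulo API is involved. It counts where 1000 consecutive millisecond timestamps land when used directly as the row, and again when prefixed with a small bucket derived from the timestamp.

```python
import bisect
from collections import Counter


def tablet_index(row, splits):
    """Index of the tablet holding `row`; a split point is its tablet's last row."""
    return bisect.bisect_left(splits, row)


# 1000 consecutive millisecond timestamps, as if arriving from many clients.
timestamps = range(1354056821000, 1354056822000)

# Hypothetical splits chosen from older data: every new row sorts after them.
time_splits = ["1354056000000", "1354056400000", "1354056800000"]
plain = Counter(tablet_index(str(ts), time_splits) for ts in timestamps)

# Hypothetical 4-bucket prefix taken from the low-order bits of the timestamp.
bucket_splits = ["1", "2", "3"]
salted = Counter(
    tablet_index("{}-{}".format(ts % 4, ts), bucket_splits) for ts in timestamps
)

if __name__ == "__main__":
    print("timestamp rows:", dict(plain))
    print("bucketed rows: ", dict(salted))
```

In this model every plain timestamp row falls into the last tablet, while the bucketed rows divide evenly across four tablets.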

Thanks!
Roshan







--bcaec550a988dc404f04cf81eb1b--