Date: Fri, 9 Mar 2012 16:55:07 +0000 (GMT+00:00)
From: Billie J Rinaldi
To: accumulo-user@incubator.apache.org
Message-ID: <1231959494.195971.1331312107112.JavaMail.root@linzimmb04o.imo.intelink.gov>
Subject: Re: sorting in Accumulo

On Friday, March 9, 2012 9:52:11 AM, "John R. Frank" wrote:
> On Tue, Mar 6, 2012 at 1:06 PM, Jason Trost wrote:
> We do have records with the same timestamp, so yes, collisions occur
> at that level.
>
> We also have a "stream_id" field which is a unique ID constructed from
> the integer timestamp and the md5 of the abs_url from which the content
> was fetched -- for our corpus that is sufficiently unique that
> collisions occur with essentially zero probability.
>
>
> stream_id = 123456789-AAAABBBBCCCCDDDDEEEEFFFF0000
>             ^^^^^^^^^
>             timestamp
>
> I could convert the stream_id to be zero padded to the left to ensure
> that the integer is always fixed length. If we do that, do we need
> colqual?

Yes, if the unique ID is in the row you could leave the column qualifier empty.

Billie

> Sounds like this schema would be sufficient for sorting in temporal
> order with no meaningful order within a given second -- that would be
> fine for our purposes.
>
>
> row: stream_id
> colfam: "record"
> value: JSON record
>
>
> Thanks for all the responses!
>
> jrf
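
For illustration, the zero-padding idea discussed above can be sketched in Python. This is only a sketch under assumptions, not code from the thread: the function name make_row_id, the padding width of 10 digits, and the lowercase hex digest are all hypothetical choices. The point it shows is that row IDs are compared as strings, so an unpadded 9-digit timestamp sorts before a smaller 8-digit one, while fixed-width padding makes string order match temporal order.

```python
import hashlib

# Assumed width: 10 digits covers integer epoch seconds for the foreseeable future.
TIMESTAMP_WIDTH = 10


def make_row_id(timestamp: int, abs_url: str, width: int = TIMESTAMP_WIDTH) -> str:
    """Build a stream_id-style row ID: zero-padded timestamp, '-', md5 of the URL.

    Hypothetical helper; the exact digest format used in the thread is not known.
    """
    if timestamp < 0 or len(str(timestamp)) > width:
        raise ValueError("timestamp does not fit in the fixed-width field")
    digest = hashlib.md5(abs_url.encode("utf-8")).hexdigest()
    # Left-pad with zeros so every timestamp occupies the same number of characters.
    return f"{timestamp:0{width}d}-{digest}"


# Without padding, string comparison disagrees with numeric order:
# "123456789-..." sorts before "99999999-..." although 99999999 is earlier.
unpadded_sorts_wrong = "123456789-x" < "99999999-x"

# With padding, sorting the row IDs as strings gives temporal order.
earlier = make_row_id(99999999, "http://example.com/a")
later = make_row_id(123456789, "http://example.com/b")
rows_in_order = sorted([later, earlier])
```

With the unique ID in the row like this, each record needs only the row, the "record" column family, and the JSON value, which matches the schema quoted above.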