accumulo-user mailing list archives

From Billie J Rinaldi <billie.j.rina...@ugov.gov>
Subject Re: sorting in Accumulo
Date Fri, 09 Mar 2012 16:55:07 GMT
On Friday, March 9, 2012 9:52:11 AM, "John R. Frank" <jrf@mit.edu> wrote:
> On Tue, Mar 6, 2012 at 1:06 PM, Jason Trost <jason.trost@gmail.com>
> wrote:
> We do have records with the same timestamp, so yes, collisions occur
> at that level.
> 
> We also have a "stream_id" field, which is a unique ID constructed from
> an integer timestamp and the MD5 of the abs_url from which the content
> was fetched. For our corpus that is sufficiently unique that collisions
> occur with essentially zero probability.
> 
> 
> stream_id = 123456789-AAAABBBBCCCCDDDDEEEEFFFF0000
>             ^^^^^^^^^
>             timestamp
> 
> I could left-pad the timestamp in the stream_id with zeros to ensure
> that the integer is always fixed length. If we do that, do we still
> need a colqual?

Yes, if the unique ID is in the row you could leave the column qualifier empty.
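
A minimal sketch of the zero-padding idea discussed above, in Python. Accumulo sorts keys lexicographically by byte, so an unpadded integer prefix sorts out of numeric order once timestamps differ in digit count. The helper name pad_stream_id and the width of 10 digits are assumptions made for illustration only; neither comes from this thread.

```python
def pad_stream_id(stream_id: str, width: int = 10) -> str:
    """Left-pad the integer timestamp prefix of a stream_id with zeros.

    Hypothetical helper: the 10-digit width is an assumption chosen to
    cover epoch-second timestamps, not something stated in the thread.
    """
    # Split only on the first hyphen so the hash part is left untouched.
    timestamp, digest = stream_id.split("-", 1)
    return f"{int(timestamp):0{width}d}-{digest}"


ids = [
    "1000000000-AAAABBBBCCCCDDDDEEEEFFFF0000",
    "999999999-AAAABBBBCCCCDDDDEEEEFFFF0000",
    "123456789-AAAABBBBCCCCDDDDEEEEFFFF0000",
]

# Plain string sort mimics lexicographic row ordering.
unpadded_order = sorted(ids)
padded_order = sorted(pad_stream_id(s) for s in ids)

print(unpadded_order)  # the 10-digit timestamp sorts first: not temporal order
print(padded_order)    # fixed-width prefixes sort in temporal order
```

With fixed-width prefixes, the lexicographic order of the row IDs matches the numeric order of the timestamps, which is what the temporal-sort schema relies on.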

Billie


> Sounds like this schema would be sufficient for sorting in temporal
> order, with no meaningful order within a given second. That would be
> fine for our purposes.
> 
> 
> row: stream_id
> colfam: "record"
> value: JSON record
> 
> 
> Thanks for all the responses!
> 
> jrf
