accumulo-user mailing list archives

From "John R. Frank" <>
Subject Re: sorting in Accumulo
Date Fri, 09 Mar 2012 14:52:11 GMT
On Tue, Mar 6, 2012 at 1:06 PM, Jason Trost <> wrote:
> You could ingest this data into accumulo using the following "schema"
> row:     timestamp
> colfam:  "record"
> colqual: md5(JSON)
> value:   JSON record

We do have records with the same timestamp, so yes, collisions occur at that 

We also have a "stream_id" field, which is a unique ID constructed from 
the integer timestamp and the md5 of the abs_url from which the content 
was fetched -- for our corpus that is sufficiently unique that collisions 
occur with essentially zero probability.

stream_id = 123456789-AAAABBBBCCCCDDDDEEEEFFFF0000

I could convert the stream_id to be zero padded to the left to ensure that 
the integer is always fixed length.  If we do that, do we need colqual?
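
For illustration, a minimal sketch of the zero-padding idea, in Python only 
because it is compact. Accumulo sorts row IDs lexicographically as bytes, so 
integers of different lengths sort out of numeric order unless they are 
padded. The helper name pad_stream_id and the 10-digit width are assumptions 
for this sketch (10 digits is enough if the timestamp is epoch seconds), not 
part of any existing code.

```python
# Hypothetical helper: left-pad the integer timestamp part of a stream_id
# so that lexicographic (byte-wise) order matches numeric order.
TIMESTAMP_WIDTH = 10  # assumption: timestamps are epoch seconds


def pad_stream_id(stream_id: str, width: int = TIMESTAMP_WIDTH) -> str:
    # Split "123456789-AAAA..." into the timestamp and the md5 part.
    timestamp, _, url_hash = stream_id.partition("-")
    if not timestamp.isdigit() or not url_hash:
        raise ValueError(f"unexpected stream_id format: {stream_id!r}")
    if len(timestamp) > width:
        raise ValueError(f"timestamp longer than {width} digits: {stream_id!r}")
    return f"{timestamp.zfill(width)}-{url_hash}"


# Two made-up IDs whose timestamps differ in length.
ids = ["1000000000-AAAA", "999999999-BBBB"]

# Without padding, the later timestamp sorts first ("1" < "9").
unpadded = sorted(ids)

# With padding, string order agrees with numeric timestamp order.
padded = sorted(pad_stream_id(s) for s in ids)
```

Here unpadded puts the later record first, while padded lists the records in 
temporal order.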

Sounds like this schema would be sufficient for sorting in temporal order with 
no meaningful order within a given second -- that would be fine for our 

row:     stream_id
colfam:  "record"
value:   JSON record
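
With a fixed-width timestamp at the front of the row, a scan over a time 
window reduces to a range over row IDs. The sketch below only simulates the 
sorted row order with a Python list and bisect; it does not call Accumulo. 
The names row_range_for and scan, the sample rows, and the 10-digit width are 
all hypothetical.

```python
import bisect

WIDTH = 10  # assumption: timestamps are zero-padded to 10 digits


def row_range_for(start_ts: int, end_ts: int, width: int = WIDTH):
    """Return (start_row, end_row) covering start_ts <= t < end_ts.

    start_row is inclusive and end_row is exclusive. A bare padded
    timestamp sorts before every row that has it as a prefix.
    """
    return str(start_ts).zfill(width), str(end_ts).zfill(width)


def scan(rows, start_ts: int, end_ts: int):
    """Simulate a range scan over lexicographically sorted row IDs."""
    start_row, end_row = row_range_for(start_ts, end_ts)
    lo = bisect.bisect_left(rows, start_row)
    hi = bisect.bisect_left(rows, end_row)
    return rows[lo:hi]


# Made-up rows, kept sorted the way a sorted key-value store would keep them.
rows = sorted([
    "0000000100-AAAA",
    "0000000100-BBBB",
    "0000000101-CCCC",
    "0000000250-DDDD",
])

# All records with 100 <= timestamp < 102.
window = scan(rows, 100, 102)
```

Rows sharing a second come back ordered by the md5 part, which matches the 
"no meaningful order within a given second" behavior described above.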

Thanks for all the responses!

