accumulo-user mailing list archives

From Keith Turner <>
Subject Re: sorting in Accumulo
Date Tue, 06 Mar 2012 18:26:25 GMT
If you want to sort in descending order, you can make the row
(Long.MAX_VALUE - timestamp).  Still make this fixed width.
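A minimal sketch of the descending-order row key in Java. It assumes non-negative epoch-second timestamps and zero-pads to 19 digits, the width of Long.MAX_VALUE, to keep every row the same width. The class and method names are hypothetical, and a plain string sort stands in for Accumulo's lexicographic row ordering.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReverseTimestampKey {
    // Long.MAX_VALUE has 19 decimal digits, so zero-padding to 19 keeps every row the
    // same width and lexicographic order matches numeric order (assumes timestamp >= 0).
    static String descendingRow(long timestamp) {
        return String.format("%019d", Long.MAX_VALUE - timestamp);
    }

    // Recover the original timestamp from a row built by descendingRow.
    static long timestampFromRow(String row) {
        return Long.MAX_VALUE - Long.parseLong(row);
    }

    public static void main(String[] args) {
        long[] timestamps = {1331054125L, 1331060000L, 1331057160L};
        List<String> rows = new ArrayList<>();
        for (long ts : timestamps) {
            rows.add(descendingRow(ts));
        }
        Collections.sort(rows); // lexicographic sort, standing in for Accumulo's row order
        for (String row : rows) {
            System.out.println(row + " -> " + timestampFromRow(row)); // newest first
        }
    }
}
```

Because a larger timestamp yields a smaller (Long.MAX_VALUE - timestamp), the newest record sorts first. The fixed width is what keeps the string comparison equivalent to the numeric one.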

On Tue, Mar 6, 2012 at 1:06 PM, Jason Trost <> wrote:
> You could ingest this data into accumulo using the following "schema"
> row:       timestamp
> colfam:  "record"
> colqual: md5(JSON)
> value:   JSON record
> Accumulo would sort this for you in lexicographical order by timestamp
> (stored as a string). Depending on the range your data comes from, if
> all the epoch timestamps are the same length, then lexicographical
> order will match numeric order.  If this is not the case for you, then
> you could convert your timestamps to a string using the following
> template (with each field zero-padded to its max length):
> ${year}${month}${day}${hour}${minute}${second}
> The md5(JSON) is there because I assume some of your events could have
> the same timestamp.  If you could have events that are exactly the same
> (and you need to track this), you may want to append a one-up counter
> to the md5 just to guarantee that duplicates won't be overwritten.
> Without the md5 (or another similar mechanism), Accumulo would
> overwrite any previously stored values with the exact same [row,
> colfam, colqual, colvis].
> Iterating in temporal order would just be a simple full table scan.
> I hope this helps.
> --Jason
> On Tue, Mar 6, 2012 at 12:15 PM, John R. Frank <> wrote:
>> Accumulo Experts,
>> Is there an example of working with a time-ordered stream in Accumulo?
>> Given:
>>        ~500M JSON records, each about 30 KB
>>        each record has a timestamp field (seconds since the epoch)
>> Goal:
>>        iterate over all records in temporal order
>>        run some function on this simulated stream
>> Thanks for any pointers or advice!
>> John
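Jason's schema can be sketched in Java as follows. This is a minimal, hypothetical illustration, not Accumulo client code. A java.util.TreeMap with string keys stands in for the table's sorted storage. Rows are epoch seconds zero-padded to 19 digits, which is an assumption (any fixed width that fits the data works). The ${year}...${second} alternative is formatted in UTC, also an assumption. The class and helper names (ingest, replay, etc.) are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

public class TemporalSchemaSketch {
    // Row option 1: epoch seconds zero-padded to 19 digits (the width of Long.MAX_VALUE),
    // so lexicographic order equals numeric order for non-negative timestamps.
    static String paddedEpochRow(long epochSeconds) {
        return String.format("%019d", epochSeconds);
    }

    // Row option 2: the ${year}${month}${day}${hour}${minute}${second} template, in UTC.
    private static final DateTimeFormatter TEMPLATE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    static String templateRow(long epochSeconds) {
        return TEMPLATE.format(Instant.ofEpochSecond(epochSeconds));
    }

    // Column qualifier: lowercase hex MD5 of the JSON text, so records that share a
    // timestamp get distinct keys instead of overwriting each other.
    static String md5Hex(String json) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    // Stand-in for the table: row, colfam "record" and colqual joined with NUL
    // separators. Because the row is fixed width, String ordering of the joined key
    // matches sorting by row, then family, then qualifier.
    static TreeMap<String, String> ingest(long[] timestamps, String[] jsonRecords) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < jsonRecords.length; i++) {
            String key = paddedEpochRow(timestamps[i]) + "\0record\0" + md5Hex(jsonRecords[i]);
            table.put(key, jsonRecords[i]);
        }
        return table;
    }

    // "Iterating in temporal order would just be a simple full table scan":
    // visit every entry in key order and run the caller's function on each value.
    static void replay(TreeMap<String, String> table, Consumer<String> fn) {
        for (Map.Entry<String, String> entry : table.entrySet()) {
            fn.accept(entry.getValue());
        }
    }

    public static void main(String[] args) {
        long[] timestamps = {1331057160L, 1331054125L, 1331057160L};
        String[] records = {
            "{\"timestamp\": 1331057160, \"id\": \"b\"}",
            "{\"timestamp\": 1331054125, \"id\": \"a\"}",
            "{\"timestamp\": 1331057160, \"id\": \"c\"}",
        };
        replay(ingest(timestamps, records), System.out::println);
    }
}
```

Records that share a timestamp stay adjacent and are ordered among themselves by their MD5 qualifier, so ties come back in an arbitrary but stable order. Against a real cluster, the same keys would be written with a Mutation and read back with a Scanner over the whole table.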
