accumulo-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: Fetch rows in reversed order and how to model time range for quick fetching
Date Tue, 17 Jun 2014 09:30:58 GMT
Hi Josh,

Thank you for the reply, it was very educational. I now have an idea of how
to model my data :)

I'll probably keep both layouts: timestamp_receiverId_edgeId and
receiverId_timestamp_edgeId.
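
To make the trade-off between the two layouts concrete, here is a minimal
sketch in plain Java. It does not touch the Accumulo API; a TreeSet of
strings stands in for sorted column qualifiers. The class name, the helper
methods, the sample IDs and the fixed-width yyyyMMddHHmmss timestamp format
are all assumptions made for illustration.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper, not part of Accumulo: builds the two qualifier
// layouts and uses plain string ordering to mimic sorted keys.
final class QualifierLayouts {

    // Time-first layout: entries sort chronologically across all receivers.
    // Assumes a fixed-width timestamp so string order matches time order.
    static String timeFirst(String timestamp, String receiverId, String edgeId) {
        return timestamp + "_" + receiverId + "_" + edgeId;
    }

    // Receiver-first layout: entries for one receiver sit next to each other,
    // ordered by time within that receiver.
    static String receiverFirst(String timestamp, String receiverId, String edgeId) {
        return receiverId + "_" + timestamp + "_" + edgeId;
    }

    // Returns every key that starts with the given prefix, in sorted order.
    static NavigableSet<String> withPrefix(TreeSet<String> keys, String prefix) {
        return keys.subSet(prefix, true, prefix + Character.MAX_VALUE, true);
    }

    public static void main(String[] args) {
        // Made-up sample payments: {timestamp, receiverId, edgeId}.
        String[][] payments = {
            {"20140617122400", "456", "1001"},
            {"20140616090000", "789", "1000"},
            {"20140617130000", "456", "1002"},
        };

        TreeSet<String> byTime = new TreeSet<>();
        TreeSet<String> byReceiver = new TreeSet<>();
        for (String[] p : payments) {
            byTime.add(timeFirst(p[0], p[1], p[2]));
            byReceiver.add(receiverFirst(p[0], p[1], p[2]));
        }

        System.out.println("time-first order:     " + byTime);
        System.out.println("receiver-first order: " + byReceiver);
        System.out.println("payments to 456:      " + withPrefix(byReceiver, "456_"));
    }
}
```

The time-first layout suits "what happened in this time range" scans, while
the receiver-first layout suits "all payments to this receiver" prefix scans.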

ReverseLexicoder is exactly what I want.
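
For illustration, the idea behind a reversed encoding can be sketched
without the Accumulo classes. The snippet below is not the ReverseLexicoder
API; it is a stand-in that subtracts the timestamp from Long.MAX_VALUE and
zero-pads the result, so that an ascending scan sees the newest entry first.
The class name, method names and sample timestamps are hypothetical.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a reversed timestamp encoding; it does not use
// or reproduce Accumulo's lexicoder classes.
final class ReverseTimeKey {

    // Larger timestamps map to smaller numbers. Padding to 19 digits (the
    // width of Long.MAX_VALUE) keeps string order equal to numeric order.
    static String encode(long epochMillis) {
        if (epochMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return String.format("%019d", Long.MAX_VALUE - epochMillis);
    }

    // Inverse of encode.
    static long decode(String encoded) {
        return Long.MAX_VALUE - Long.parseLong(encoded);
    }

    public static void main(String[] args) {
        // A TreeSet stands in for the sorted keys of a table.
        TreeSet<String> keys = new TreeSet<>();
        keys.add(encode(1000L)); // made-up timestamps
        keys.add(encode(3000L));
        keys.add(encode(2000L));

        // The first key in ascending order belongs to the latest timestamp.
        System.out.println("most recent timestamp: " + decode(keys.first()));
    }
}
```

With this kind of encoding, "get the last payment" becomes "read the first
entry of a forward scan", which is the access pattern a sorted store handles
well.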

Jianshi



On Tue, Jun 17, 2014 at 1:15 PM, Josh Elser <josh.elser@gmail.com> wrote:

> The "acct:" in the row seems to be unnecessary. It seems like the ID
> should be enough. You'll want to consider the maximum number of transactions that
> you want to support. You don't want a single row to grow indefinitely, but
> you're probably talking about GBs of data (compressed).
>
> The column family is usually best served as a filtering mechanism.
> Limiting it to "payment" alone is a good idea as you can then efficiently
> filter on that column family (or other relevant column families) by
> configuring a locality group.
>
> You could then make the column qualifier: timestamp_receiverId_edgeId.
>
> You might also be able to use the ReverseLexicoder[1] and the
> DateLexicoder[2] to encode the date so you can get the most recent
> transactions first.
>
> Lots of different ways to approach this, but it depends on what exactly
> you want to support.
>
> [1] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/ReverseLexicoder.html
> [2] http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/client/lexicoder/DateLexicoder.html
>
>
> On 6/16/14, 10:02 PM, Jianshi Huang wrote:
>
>> Hi all,
>>
>> I'm thinking about storing payments in the following format:
>>
>> rowId: senderId (e.g. "acct:123")
>> CF: "payment@<timestamp>" (e.g. "payment@201406171224000")
>> CQ: receiverId_edgeId (e.g. "acct:456_payment:1001")
>> Value: properties
>>
>> Is this a good way to model payment events? The most frequent operation is to
>> get the last payment, so can I scan the table using a reversed range?
>>
>> Also I'd like to know if point-in-time status data can be modeled in a
>> similar fashion, or should I take advantage of the timestamp column?
>>
>>
>> Cheers,
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
