hbase-user mailing list archives

From Miguel Costa <miguel-co...@telecom.pt>
Subject RE: Use Timestamp
Date Tue, 05 Apr 2011 17:30:32 GMT
Yes, I will put something in front of the date.


If the date comes in milliseconds, it can be millions of rows, even with a
combined key, but I will only need this data for maybe hourly MapReduce jobs.


My focus here is whether I gain anything by putting the timestamp in the
columns instead of the row, because I will have fewer rows but a lot more
columns with
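A minimal sketch of the "fewer rows, more columns" layout, assuming one row per prefix per UTC day and a column qualifier holding the milliseconds into that day. The function `wide_cell` and the `b"site1|"` prefix are hypothetical, chosen only for illustration. Big-endian integers keep rows and qualifiers in chronological byte order, which matters because HBase sorts both row keys and column qualifiers byte-wise.

```python
import struct

MS_PER_DAY = 86_400_000

def wide_cell(prefix, ts_ms):
    """Map a millisecond timestamp to a (row key, column qualifier) pair.

    Hypothetical layout: row = prefix + day number since the epoch (UTC),
    qualifier = milliseconds elapsed within that day."""
    day, offset = divmod(ts_ms, MS_PER_DAY)
    row = prefix + struct.pack(">i", day)   # one row per prefix per day
    qualifier = struct.pack(">i", offset)   # fits: offset < 86,400,000 < 2**31
    return row, qualifier

# 2011-04-05 12:00 UTC lands in the row for day 15069, at offset 43,200,000 ms.
row, qualifier = wide_cell(b"site1|", 1301961600000 + 43_200_000)
```

The trade-off to weigh: HBase never splits a single row across regions, so at millisecond resolution one day's row could accumulate up to 86.4 million columns on one server, whereas the tall layout (timestamp in the row key) lets those cells spread over regions.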

From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Tuesday, 5 April 2011 17:02
To: user@hbase.apache.org
Cc: Miguel Costa
Subject: Re: Use Timestamp


Using a timestamp as the row key will cause your scan to largely hit one
region. That may not be so good.
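To illustrate why, here is a small simulation, assuming 8-byte big-endian timestamp keys and a table whose (hypothetical) region split points were set from earlier timestamps. HBase keeps rows sorted by key, so consecutive times are neighbours, and a whole day of keys falls into a single region. The helpers `time_key` and `region_for` are illustrative, not HBase API.

```python
import bisect
import struct

def time_key(ts_ms):
    """Encode a millisecond timestamp as an 8-byte big-endian row key.

    For non-negative values, byte-wise order matches numeric order, so keys
    sort chronologically under byte-wise comparison."""
    return struct.pack(">q", ts_ms)

def region_for(key, split_points):
    """Index of the region holding `key`, given sorted region start keys.

    Region 0 holds keys below split_points[0]; region i holds keys in
    [split_points[i-1], split_points[i]); the last region is open-ended."""
    return bisect.bisect_right(split_points, key)

# Hypothetical split points taken from timestamps in 2010.
SPLITS = [time_key(t) for t in (1262304000000, 1272672000000, 1283299200000)]

# One key per minute for a whole day, starting 2011-04-05 00:00 UTC.
DAY_START = 1301961600000
regions = {region_for(time_key(DAY_START + i * 60_000), SPLITS)
           for i in range(24 * 60)}
print(regions)  # {3}: every key lands in the last region
```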


If you add something in front of the date, you may be able to spread your
scan over several machines.
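One common way to put something in front of the date is a small "salt" bucket prefix. A minimal sketch, assuming the bucket is derived from some other field of the record (the `source_id` argument here is hypothetical) and a fixed, hypothetical bucket count of 8. A time-range query then becomes one key range per bucket, and those ranges can be scanned in parallel (for example, as separate map tasks).

```python
import struct
import zlib

NUM_BUCKETS = 8            # hypothetical; pick to match your cluster
MS_PER_DAY = 86_400_000
DAY_START = 1301961600000  # 2011-04-05 00:00 UTC

def salted_key(source_id, ts_ms):
    """Row key = 1-byte bucket + 8-byte big-endian timestamp.

    zlib.crc32 is used instead of Python's hash() because str hashing is
    randomized per process, and the bucket must be stable across writers."""
    bucket = zlib.crc32(source_id.encode()) % NUM_BUCKETS
    return bytes([bucket]) + struct.pack(">q", ts_ms)

def scan_ranges(start_ms, stop_ms):
    """Split [start_ms, stop_ms) into one (start_row, stop_row) per bucket."""
    return [(bytes([b]) + struct.pack(">q", start_ms),
             bytes([b]) + struct.pack(">q", stop_ms))
            for b in range(NUM_BUCKETS)]

# A daily aggregation reads NUM_BUCKETS ranges instead of one.
ranges = scan_ranges(DAY_START, DAY_START + MS_PER_DAY)
```

The price of spreading the load is that every time-range read must issue and merge NUM_BUCKETS scans, which is why a small aggregation may still favour the plain time key.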


On the other hand, your aggregation might be very small. In that case, the
convenience of a time key might be sufficient to make you prefer that
implementation.


How much data are you talking about aggregating each time you aggregate?

On Tue, Apr 5, 2011 at 2:16 AM, Miguel Costa <miguel-costa@telecom.pt> wrote:

I want to have my data aggregated by day, so I would like to know which is
the best option for querying my data: putting the timestamp of the data in my
row key, or using the timestamp of the columns?

