hbase-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Re: Best Practices Adding Rows
Date Tue, 07 Dec 2010 07:21:28 GMT
I've come across a key format like "<date><hour><something>" several
times on the list recently, which I assume is a string format.

Please correct me if I'm wrong, but it makes more sense to me to use
something like Bytes.add(<time>, <something>) as the key instead, where <time>
is the byte[] representation of the time as a long. This still supports all
the needed read patterns (by date, by hour, etc.). The advantage is a smaller
key: since the row key is stored with every cell in HBase, a smaller key
reduces the amount of stored data. I'd also imagine it avoids converting
between string, date, and other representations.
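Here is a minimal sketch of the binary key idea. It assumes <time> is UTC epoch milliseconds and the suffix is a short string. The class and method names are hypothetical. toBytes and add are JDK re-implementations of what HBase's Bytes.toBytes(long) and Bytes.add(byte[], byte[]) do, so the sketch runs without HBase. hourRange shows that hour-level reads still work: a scan from the hour's start time to the next hour's start time covers every key in that hour.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: mimics Bytes.toBytes(long) and Bytes.add(byte[], byte[]) with the JDK
// so it runs without HBase on the classpath. All names here are hypothetical.
public class BinaryRowKey {

    static final long HOUR_MILLIS = 3_600_000L;

    // 8-byte big-endian long, the same layout as HBase's Bytes.toBytes(long).
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Concatenate two arrays, like Bytes.add(a, b).
    static byte[] add(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Binary key: <time as 8 bytes><suffix>.
    static byte[] binaryKey(long epochMillis, byte[] suffix) {
        return add(toBytes(epochMillis), suffix);
    }

    // String key in the format discussed on the list: "<yyyyMMdd>|<HH>|<suffix>".
    static byte[] stringKey(String date, String hour, byte[] suffix) {
        return add((date + "|" + hour + "|").getBytes(StandardCharsets.UTF_8), suffix);
    }

    // Start (inclusive) and stop (exclusive) rows for the UTC hour containing epochMillis.
    // Only valid for non-negative timestamps, whose big-endian bytes sort numerically
    // under HBase's unsigned byte-wise row comparison.
    static byte[][] hourRange(long epochMillis) {
        long start = epochMillis - Math.floorMod(epochMillis, HOUR_MILLIS);
        return new byte[][] {toBytes(start), toBytes(start + HOUR_MILLIS)};
    }

    public static void main(String[] args) {
        long t = 1291212000000L + 30 * 60 * 1000L; // 2010-12-01 14:30 UTC (illustrative)
        byte[] suffix = "42".getBytes(StandardCharsets.UTF_8); // hypothetical counter suffix
        byte[] bin = binaryKey(t, suffix);
        byte[] str = stringKey("20101201", "14", suffix);
        System.out.println("binary key: " + bin.length + " bytes, string key: " + str.length + " bytes");

        // Row keys compare as unsigned bytes, which Arrays.compareUnsigned reproduces.
        byte[][] range = hourRange(t);
        boolean inHour = Arrays.compareUnsigned(range[0], bin) <= 0
                && Arrays.compareUnsigned(bin, range[1]) < 0;
        System.out.println("key falls in its hour's scan range: " + inHour);
    }
}
```

For the sample key, the binary form is 10 bytes and the string form is 14. With the same suffix in both, the time prefix shrinks from 12 bytes ("20101201|14|") to 8.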

Am I missing something?

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Mon, Dec 6, 2010 at 7:27 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Peter,
>
> You can set the start row to '20101201|14' and the end row to '20101201|15'
> using the scanner API:
>
> http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStartRow(byte[])
>
> Thanks
> -Todd
>
> On Mon, Dec 6, 2010 at 9:21 AM, Peter Haidinyak <phaidinyak@local.com>
> wrote:
>
> > Hi,
> > I have to enter log data into HBase. We will need to query the data by
> > Date:Hour. I am using 'Date|Hour|Incrementing Counter' as the row ID. Is
> > there an easy way to request the starting and stopping rows in a scan
> > using something similar to 'like'?
> >
> > Scan 'T1', {STARTROW=>'like 20101201|14'}
> >
> > If not, what would be the best way to retrieve only one hour's worth of
> > data? I am thinking of using another table to hold the incrementing count
> > information for a Date|Hour and use that for Start/Stop.
> >
> > Thanks
> >
> > -Pete
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
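Todd's start/stop suggestion can be sketched as follows. This is a minimal JDK-only illustration, not HBase client code. startRow, stopRow, scan, and the sample keys are hypothetical. A TreeSet stands in for the table because ASCII strings sort in the same order as HBase's byte-wise row comparison. With the real client, the two bounds would be passed as bytes to Scan.setStartRow and Scan.setStopRow. The start row is inclusive and the stop row is exclusive, so no 'like' operator is needed.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch only: a sorted in-memory set simulates a range scan over string row keys.
public class HourRangeScan {

    // Start row (inclusive): "yyyyMMdd|HH", hour zero-padded so "09" sorts before "10".
    static String startRow(String date, int hour) {
        return String.format("%s|%02d", date, hour);
    }

    // Stop row (exclusive): the next hour's prefix, so every "yyyyMMdd|HH|counter"
    // key of the requested hour sorts before it. Hour 23 yields "|24", which still works.
    static String stopRow(String date, int hour) {
        return String.format("%s|%02d", date, hour + 1);
    }

    // Stand-in for a scan: rows with start <= key < stop, in key order.
    static List<String> scan(NavigableSet<String> rows, String start, String stop) {
        return List.copyOf(rows.subSet(start, true, stop, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(List.of(
                "20101201|13|7", "20101201|14|1", "20101201|14|2", "20101201|14|10",
                "20101201|15|1", "20101202|14|1"));
        String start = startRow("20101201", 14);
        String stop = stopRow("20101201", 14);
        System.out.println(scan(rows, start, stop));
    }
}
```

The range is correct regardless of the counter format. However, an unpadded counter sorts as text, so "20101201|14|10" comes before "20101201|14|2" within the hour. Zero-pad the counter if row order inside the hour matters.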
