hbase-user mailing list archives

From Asaf Mesika <asaf.mes...@gmail.com>
Subject Re: Help in designing row key
Date Wed, 03 Jul 2013 21:23:10 GMT
Seems right. You can make it more efficient by creating your result array
in advance and then filling it.
Regarding time filtering: have you seen that in Scan you can set a start time
and an end time?
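
A minimal sketch of the "create the array in advance" idea, assuming source, type and qualifier are the positive enum integers mentioned in the quoted message and the hash is a 16-byte MD5 digest. It uses plain java.nio.ByteBuffer, which writes ints big-endian by default (the same layout Bytes.toBytes(int) produces), rather than the HBase Bytes helper. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

final class RowKeySketch {
    // Assumed hash width: an MD5 digest is 16 bytes.
    static final int HASH_LEN = 16;

    // Builds source|type|qualifier|hash as one fixed-width key in a single allocation.
    static byte[] build(int source, int type, int qualifier, byte[] md5) {
        if (md5.length != HASH_LEN) {
            throw new IllegalArgumentException("expected a " + HASH_LEN + "-byte hash");
        }
        // Allocate the final size once instead of concatenating intermediate arrays.
        ByteBuffer buf = ByteBuffer.allocate(3 * Integer.BYTES + HASH_LEN);
        buf.putInt(source).putInt(type).putInt(qualifier).put(md5);
        return buf.array();
    }
}
```

Because every part has a fixed width, no separator byte is needed and each field can be found at a known offset.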

On Wednesday, July 3, 2013, Flavio Pompermaier wrote:

> All my enums produce positive integers, so I don't have the +ve/-ve integer
> ordering problem.
> Obviously, if I use fixed-length row keys I can drop the separator.
>
> Sorry, but I'm a newbie in this field. I'm trying to understand how to
> compose my key from the bytes.
> Is the following correct?
>
> final byte[] firstToken = Bytes.toBytes(source);
> final byte[] secondToken = Bytes.toBytes(type);
> final byte[] thirdToken = Bytes.toBytes(qualifier);
> final byte[] fourthToken = Bytes.toBytes(md5ofSomeString);
> byte[] rowKey = Bytes.add(firstToken,secondToken,thirdToken);
> rowKey =  Bytes.add(rowKey,fourthToken);
>
> Best,
> Flavio
>
>
> On Wed, Jul 3, 2013 at 11:58 AM, Anoop John <anoop.hbase@gmail.com> wrote:
>
> > When you build the RK and convert the int parts into byte[] (use
> > org.apache.hadoop.hbase.util.Bytes#toBytes(int)), you get 4 bytes
> > for every int. Be careful about the ordering: when you convert a +ve
> > and a -ve integer into byte[] and do a lexicographical compare (as done in
> > HBase), the -ve number sorts greater than the +ve one. If you don't have
> > to deal with -ve numbers, no issues :)
> >
> > Well, when all the parts of the RK are of fixed width, do you need any
> > separator?
> >
> > -Anoop-
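
The ordering caveat above can be reproduced with a short sketch. It assumes big-endian 4-byte encoding and an unsigned byte-by-byte comparison, which is how row keys are ordered. The sign-bit flip in encodeSortable is a common workaround and is not something proposed in this thread. All names are hypothetical.

```java
import java.nio.ByteBuffer;

final class SignOrderSketch {
    // Big-endian 4-byte encoding of an int.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Assumed workaround: flipping the sign bit makes byte order follow numeric order.
    static byte[] encodeSortable(int v) {
        return encode(v ^ Integer.MIN_VALUE);
    }
}
```

With the plain encoding, -1 becomes FF FF FF FF and therefore sorts after 1 (00 00 00 01). Keys built only from non-negative integers are not affected.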
> >
> > On Wed, Jul 3, 2013 at 2:44 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> >
> > > Yeah, I was thinking of using a normalization step to allow the use
> > > of FuzzyRowFilter, but it is not clear to me whether integers must also
> > > be normalized.
> > > Let me explain better. Suppose I follow your advice and
> > > produce keys like:
> > >  - 1|1|somehash|sometimestamp
> > >  - 55|555|somehash|sometimestamp
> > >
> > > Would they match the same pattern, or do I have to normalize them to
> > > the following?
> > >  - 001|001|somehash|sometimestamp
> > >  - 055|555|somehash|sometimestamp
> > >
> > > Moreover, I noticed that you used dots ('.') to separate things instead
> > > of pipes ('|'). Is there a reason for that (performance or whatever),
> > > or is it just your favourite separator?
> > >
> > > Best,
> > > Flavio
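
A small sketch of why the padding question matters for text keys, assuming the integer parts are written as decimal strings. Without padding, fields have different widths and string order disagrees with numeric order. The helper name and the width of 3 are hypothetical.

```java
import java.util.Locale;

final class PaddingSketch {
    // Left-pads a non-negative value with zeros to a fixed number of digits.
    static String pad(int value, int width) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "55" sorts after "100" because '5' > '1'.
        System.out.println("55".compareTo("100") > 0);
        // Padded: "055" sorts before "100", matching numeric order.
        System.out.println(pad(55, 3).compareTo(pad(100, 3)) < 0);
    }
}
```

Encoding the parts as 4-byte integers, as suggested in the message quoted below, gives fixed-width fields without any string padding.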
> > >
> > >
> > > On Wed, Jul 3, 2013 at 10:12 AM, Mike Axiak <mike@axiak.net> wrote:
> > >
> > > > I'm not sure if you're eliding this fact or not, but you'd be much
> > > > better off if you used a fixed-width format for your keys. So in your
> > > > example, you'd have:
> > > >
> > > > PATTERN: source(4-byte-int).type(4-byte-int or smaller).fixed 128-bit
> > > > hash.8-byte timestamp
> > > >
> > > > Example: \x00\x00\x00\x01\x00\x00\x02\x03....
> > > >
> > > > The advantage of this is not only that it's significantly less data
> > > > (remember your key is stored on each KeyValue), but also you can now
> > > > use FuzzyRowFilter and other techniques to quickly perform scans. The
> > > > disadvantage is that you have to normalize the source -> integer, but I
> > > > find I can either store that in an enum or cache it for a long time,
> > > > so it's not a big issue.
> > > >
> > > > -Mike
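
To illustrate why fixed-width keys suit fuzzy matching, here is a plain-Java sketch of the matching idea only. It is not HBase's FuzzyRowFilter implementation. It assumes a mask in which 0 means "this byte must match" and 1 means "any byte is accepted", and all names are hypothetical.

```java
final class FuzzyMatchSketch {
    // Returns true when every fixed position (mask 0) of the pattern equals the row byte.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (pattern.length != mask.length) {
            throw new IllegalArgumentException("pattern and mask must have equal length");
        }
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            // Positions with mask 1 are wildcards and are skipped.
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a 4-byte source followed by a 4-byte type, a mask of four 1s and then four 0s selects one type across all sources. This only works because each field always sits at the same byte offset.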
> > > >
> > > > On Wed, Jul 3, 2013 at 4:05 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> > > > > Thank you very much for the great support!
> > > > > This is how I thought to design my key:
> > > > >
> > > > > PATTERN: source|type|qualifier|hash(name)|timestamp
> > > > > EXAMPLE: google|appliance|oven|be9173589a7471a7179e928adc1a86f7|1372837702753
> > > > >
> > > > > Do you think my key is good for my purposes (my searches will
> > > > > essentially be by source or source|type)?
> > > > > Another point is that initially I will not have many sources, so I
> > > > > will probably have only google|*, but in later phases there could be
> > > > > more sources.
> > > > >
> > > > > Best,
> > > > > Flavio
> > > > >
> > > > > On Tue, Jul 2, 2013 at 7:53 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > >> For #1, yes - the client receives less data after filtering.
> > > > >>
> > > > >> For #2, please take a look at TestMultiVersions
> > > > >> (./src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java in 0.94)
> > > > >> for time range:
> > > >
