hbase-user mailing list archives

From Nasron Cheong <nasron.che...@kontagent.com>
Subject Re: Column qualifiers with hierarchy and filters
Date Thu, 07 Nov 2013 17:47:03 GMT
Why is that? Afaik everything is just a byte sequence, what prevents
non-printable chars from being used in CF/table names?

- Nasron


On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:

> This is fine for the key. Just so you are aware, you cannot use this for
> table names or CF names, since they must contain printable characters only.
>
> JM
>
>
> 2013/11/6 Nasron Cheong <nasron.cheong@kontagent.com>
>
> > Yes, after some digging around, the key is to store integers as byte
> > representation, but more importantly to store them as big-endian so that
> > the lexicographical sequence is maintained.
> >
> > Thanks!
> >
> > - Nasron
> >
> >
> > On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <premal.j.shah@gmail.com>
> > wrote:
> >
> > > You can store the byte representation of the integer (fixed length)
> > > instead of the integer itself (which would be stored as a string of
> > > variable length), and it will also sort correctly.
> > >
> > >
> > > On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong
> > > <nasron.cheong@kontagent.com> wrote:
> > >
> > > > Yes, it's limited in the sense that we have to precalculate the number
> > > > of digits required so we don't run out, and if we overestimate, then
> > > > our row keys end up taking up more space than we'd care to.
> > > >
> > > > We can probably live with this approach for now, but I wonder if
> > > > there's a better way.
> > > >
> > > > - Nasron
> > > >
> > > >
> > > > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <
> > > > jean-marc@spaggiari.org> wrote:
> > > >
> > > > > Hi Nasron,
> > > > >
> > > > > Why are you saying that it's a limited way? Does it meet your needs?
> > > > >
> > > > >
> > > > > 2013/11/4 Nasron Cheong <nasron.cheong@kontagent.com>
> > > > >
> > > > > > > An example query would be the following. Say the column
> > > > > > > qualifier was of the form
> > > > > >
> > > > > > <bucket #>:<msg type>
> > > > > >
> > > > > > > where <bucket #> should be an integer value, and <msg type> is
> > > > > > > a string. E.g.
> > > > > >
> > > > > > 1:abc
> > > > > > 1000:abc
> > > > > > > 2:abc
> > > > > >
> > > > > > > would appear in the above sequence, which is out of order when
> > > > > > > doing prefix filtering. Zero padding could fix this:
> > > > > >
> > > > > > 0001:abc
> > > > > > 0002:abc
> > > > > > > 1000:abc
> > > > > >
> > > > > > > But this is a limited way of ensuring the sequence of CQs (column
> > > > > > > qualifiers) is correct, in order for prefix filtering to work. Are
> > > > > > > there other options?
> > > > > >
> > > > > > - Nasron
> > > > > >
> > > > > >
> > > > > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong
> > > > > > > <nasron.cheong@kontagent.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > I'm trying to determine the best way to serialize a sequence of
> > > > > > > > integers/strings that represent a hierarchy for a column
> > > > > > > > qualifier, which would be compatible with ColumnPrefixFilter
> > > > > > > > and BinaryComparator.
> > > > > > >
> > > > > > > > However, due to the lexicographical sorting, it's awkward to
> > > > > > > > serialize the sequence of values needed to get it to work.
> > > > > > >
> > > > > > > > What are the typical solutions to this? Do people just zero pad
> > > > > > > > integers to make sure they sort correctly? Or do I have to
> > > > > > > > implement my own QualifierFilter - which seems expensive since
> > > > > > > > I'd be deserializing every byte array just to compare.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > - Nasron
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Premal Shah.
> > >
> >
>
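
For readers of the archive, the three encodings discussed in this thread can
be compared with a small sketch. It is illustrative only: Python stands in
for the client code, the helper names are hypothetical, and it assumes
non-negative bucket numbers that fit in 4 bytes. Python compares bytes
lexicographically by unsigned value, which is the ordering the thread relies
on for qualifiers.

```python
import struct

def qualifier_text(bucket: int, msg_type: str) -> bytes:
    # Decimal string form, e.g. b"1000:abc" (variable length).
    return f"{bucket}:{msg_type}".encode("utf-8")

def qualifier_padded(bucket: int, msg_type: str, width: int = 4) -> bytes:
    # Zero-padded decimal form, e.g. b"0002:abc"; width must be chosen up front.
    return f"{bucket:0{width}d}:{msg_type}".encode("utf-8")

def qualifier_binary(bucket: int, msg_type: str) -> bytes:
    # Fixed-length 4-byte unsigned big-endian integer, then ':' and the type.
    # Assumption: bucket is in the range 0 .. 2**32 - 1.
    return struct.pack(">I", bucket) + b":" + msg_type.encode("utf-8")

def bucket_prefix(bucket: int) -> bytes:
    # Byte prefix that matches every qualifier in one bucket.
    return struct.pack(">I", bucket) + b":"

buckets = [1, 1000, 2]

# Sort the bucket numbers by the byte order of each encoding.
text_order = sorted(buckets, key=lambda b: qualifier_text(b, "abc"))
padded_order = sorted(buckets, key=lambda b: qualifier_padded(b, "abc"))
binary_order = sorted(buckets, key=lambda b: qualifier_binary(b, "abc"))

# The decimal string form loses numeric order; the other two keep it.
print(text_order, padded_order, binary_order)

# A prefix built from the binary form selects exactly one bucket.
matches = [b for b in buckets
           if qualifier_binary(b, "abc").startswith(bucket_prefix(1))]
print(matches)
```

The big-endian form keeps numeric order without having to guess a digit count
in advance, at a fixed cost of 4 bytes per integer. Signed or negative values
would need extra handling, which this sketch does not cover.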
