hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Column qualifiers with hierarchy and filters
Date Thu, 07 Nov 2013 17:49:59 GMT
Please take a look at src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java (0.94):

  public static final String VALID_USER_TABLE_REGEX =
      "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";
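A minimal sketch of checking a candidate table name against that pattern with java.util.regex. The class and method names are hypothetical, and this mirrors only the quoted 0.94 constant; HBase's own name validation may apply additional rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks a name against the 0.94 VALID_USER_TABLE_REGEX.
// It mirrors only that one pattern; HBase itself may enforce further rules.
final class TableNameCheck {
    // Pattern copied verbatim from HTableDescriptor (0.94).
    static final String VALID_USER_TABLE_REGEX = "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";
    private static final Pattern VALID = Pattern.compile(VALID_USER_TABLE_REGEX);

    static boolean looksLegal(String name) {
        // matches() requires the entire string to match, not just a prefix.
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLegal("events_2013"));   // true
        System.out.println(looksLegal("my-table.v2"));   // true
        System.out.println(looksLegal("-leading-dash")); // false: first char must be [a-zA-Z_0-9]
        System.out.println(looksLegal("bucket\u0001"));  // false: control character
    }
}
```

So arbitrary (including non-printable) bytes are fine in row keys and qualifiers, but a table name is limited to letters, digits, '_', '.' and '-'.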

Cheers


On Thu, Nov 7, 2013 at 9:47 AM, Nasron Cheong <nasron.cheong@kontagent.com> wrote:

> Why is that? AFAIK everything is just a byte sequence, so what prevents
> non-printable characters from being used in CF/table names?
>
> - Nasron
>
>
> On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>
> > This is fine for the key. Just so you are aware, you cannot use this for
> > table names and column family (CF) names, since they must contain
> > printable characters only.
> >
> > JM
> >
> >
> > 2013/11/6 Nasron Cheong <nasron.cheong@kontagent.com>
> >
> > > Yes, after some digging around, the key is to store integers in their
> > > byte representation, but more importantly to store them big-endian so
> > > that the lexicographical sequence is maintained.
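A small plain-Java sketch (no HBase dependency) of why big-endian matters: HBase orders qualifiers by comparing their bytes lexicographically as unsigned values, so the most significant byte must come first. The helper names are hypothetical. The last case shows a caveat: negative two's-complement values sort after positive ones, so this preserves numeric order only for non-negative integers unless the sign bit is flipped (e.g. value ^ Integer.MIN_VALUE) before encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: how byte order affects the sorting of integers stored as raw bytes.
final class EndianOrder {
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Unsigned, byte-by-byte lexicographic comparison (shorter array first on a tie).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // Big-endian: 1 = 00 00 00 01, 256 = 00 00 01 00, so byte order matches numeric order.
        System.out.println(compareBytes(encode(1, ByteOrder.BIG_ENDIAN),
                                        encode(256, ByteOrder.BIG_ENDIAN)) < 0);    // true
        // Little-endian: 1 = 01 00 00 00, 256 = 00 01 00 00, so 1 sorts after 256.
        System.out.println(compareBytes(encode(1, ByteOrder.LITTLE_ENDIAN),
                                        encode(256, ByteOrder.LITTLE_ENDIAN)) < 0); // false
        // Caveat: -1 is ff ff ff ff in two's complement, so it sorts after 1.
        System.out.println(compareBytes(encode(-1, ByteOrder.BIG_ENDIAN),
                                        encode(1, ByteOrder.BIG_ENDIAN)) < 0);      // false
    }
}
```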
> > >
> > > Thanks!
> > >
> > > - Nasron
> > >
> > >
> > > On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah <premal.j.shah@gmail.com> wrote:
> > >
> > > > You can store the fixed-length byte representation of the integer
> > > > instead of the integer itself (which would be stored as a
> > > > variable-length string), and the byte form will also sort correctly.
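Applied to the <bucket #>:<msg type> qualifiers from this thread, a fixed-length layout might look like the sketch below. The layout (a 4-byte big-endian bucket followed by the UTF-8 message type, with no ':' separator) and all names are assumptions for illustration. The bucket prefix bytes are what one would hand to a prefix filter such as ColumnPrefixFilter; the matching itself is simulated here in plain Java. Buckets are assumed to be non-negative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical qualifier layout: [4-byte big-endian bucket][UTF-8 msg type].
// No separator is needed because the bucket always occupies the first 4 bytes.
final class BucketQualifier {
    static byte[] bucketPrefix(int bucket) {
        return ByteBuffer.allocate(4).putInt(bucket).array(); // ByteBuffer defaults to big-endian
    }

    static byte[] qualifier(int bucket, String msgType) {
        byte[] type = msgType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + type.length).putInt(bucket).put(type).array();
    }

    // Plain-Java stand-in for the prefix match a column prefix filter performs.
    static boolean hasPrefix(byte[] qualifier, byte[] prefix) {
        return qualifier.length >= prefix.length
            && Arrays.equals(Arrays.copyOfRange(qualifier, 0, prefix.length), prefix);
    }

    static int decodeBucket(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier, 0, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] q = qualifier(1000, "abc");
        System.out.println(q.length);                         // 7: 4 bucket bytes + 3 type bytes
        System.out.println(hasPrefix(q, bucketPrefix(1000))); // true
        System.out.println(hasPrefix(q, bucketPrefix(1)));    // false
        System.out.println(decodeBucket(q));                  // 1000
    }
}
```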
> > > >
> > > >
> > > > On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong <nasron.cheong@kontagent.com> wrote:
> > > >
> > > > > Yes, it's limited in the sense that we have to precalculate the
> > > > > number of digits required so we don't run out, and if we
> > > > > overestimate, then our row keys end up taking up more space than
> > > > > we'd care to.
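For comparison, a sketch of the zero-padding approach and the limitation described here: the width is fixed up front from an assumed maximum bucket value, and anything larger no longer fits. Names are hypothetical. By contrast, a 4-byte int encoding is always 4 bytes and covers every non-negative int.

```java
// Sketch of fixed-width zero padding. The width must be chosen up front from
// the largest bucket number expected: too small and ordering breaks, too large
// and every qualifier carries wasted bytes.
final class ZeroPad {
    static String pad(long value, int width) {
        String s = Long.toString(value);
        if (value < 0 || s.length() > width) {
            // Refuse rather than emit a value that would sort out of order.
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        return String.format("%0" + width + "d", value);
    }

    // Number of digits needed for the largest value we expect to store.
    static int widthFor(long maxValue) {
        return Long.toString(maxValue).length();
    }

    public static void main(String[] args) {
        int width = widthFor(9999);           // 4
        System.out.println(pad(2, width));    // 0002
        System.out.println(pad(1000, width)); // 1000
        // pad(10000, width) would throw: the precomputed width has run out.
    }
}
```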
> > > > >
> > > > > We can probably live with this approach for now, but I wonder if
> > > > > there's a better way.
> > > > >
> > > > > - Nasron
> > > > >
> > > > >
> > > > > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > > >
> > > > > > Hi Nasron,
> > > > > >
> > > > > > Why do you say it's limited? Does it meet your needs?
> > > > > >
> > > > > >
> > > > > > 2013/11/4 Nasron Cheong <nasron.cheong@kontagent.com>
> > > > > >
> > > > > > > An example query would be the following. Say the column qualifier
> > > > > > > is of the form
> > > > > > >
> > > > > > > <bucket #>:<msg type>
> > > > > > >
> > > > > > > where <bucket #> is an integer value and <msg type> is a string.
> > > > > > > E.g.
> > > > > > >
> > > > > > > 1:abc
> > > > > > > 1000:abc
> > > > > > > 2:abc
> > > > > > >
> > > > > > > would appear in the above sequence, which is out of order when
> > > > > > > doing prefix filtering. Zero padding could fix this:
> > > > > > >
> > > > > > > 0001:abc
> > > > > > > 0002:abc
> > > > > > > 1000:abc
> > > > > > >
> > > > > > > But it is a limited way of ensuring that the column qualifiers
> > > > > > > (CQs) sort correctly, which prefix filtering depends on. Are there
> > > > > > > other options?
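A sketch that sorts the example qualifiers by their UTF-8 bytes (unsigned, lexicographic), which is how HBase orders qualifiers. Names are hypothetical. Note that with ':' (0x3a) as the separator, the actual byte order puts 1000:abc first, because '0' (0x30) sorts before ':'; either way, bucket 1000 does not land after bucket 2. Zero padding restores numeric order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sorts qualifier strings by their UTF-8 bytes, compared as unsigned values.
final class QualifierOrder {
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    static List<String> sortAsBytes(List<String> qualifiers) {
        List<String> sorted = new ArrayList<>(qualifiers);
        sorted.sort(Comparator.comparing((String s) -> s.getBytes(StandardCharsets.UTF_8),
                                         QualifierOrder::compareBytes));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortAsBytes(Arrays.asList("2:abc", "1000:abc", "1:abc")));
        // [1000:abc, 1:abc, 2:abc]
        System.out.println(sortAsBytes(Arrays.asList("0002:abc", "1000:abc", "0001:abc")));
        // [0001:abc, 0002:abc, 1000:abc]
    }
}
```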
> > > > > > >
> > > > > > > - Nasron
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong <nasron.cheong@kontagent.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm trying to determine the best way to serialize a sequence of
> > > > > > > > integers/strings that represents a hierarchy in a column
> > > > > > > > qualifier, in a form compatible with ColumnPrefixFilter and
> > > > > > > > BinaryComparator.
> > > > > > > >
> > > > > > > > However, because of the lexicographical sorting, it's awkward to
> > > > > > > > serialize the sequence of values in a way that makes this work.
> > > > > > > >
> > > > > > > > What are the typical solutions to this? Do people just zero-pad
> > > > > > > > integers to make sure they sort correctly? Or do I have to
> > > > > > > > implement my own QualifierFilter, which seems expensive since
> > > > > > > > I'd be deserializing every byte array just to compare?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > - Nasron
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Premal Shah.
> > > >
> > >
> >
>
