Subject: Re: Column qualifiers with hierarchy and filters
From: Ted Yu
To: "user@hbase.apache.org"
Date: Thu, 7 Nov 2013 09:49:59 -0800

Please take a look at
src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java (0.94):

public static final String VALID_USER_TABLE_REGEX = "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";

Cheers

On Thu, Nov 7, 2013 at 9:47 AM, Nasron Cheong wrote:

> Why is that? Afaik everything is just a byte sequence, what prevents
> non-printable chars from being used in CF/table names?
>
> - Nasron
>
>
> On Thu, Nov 7, 2013 at 8:39 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>
> > This is fine for the key. Just so you are aware, you can not use this for
> > table name and CF name since they need to be printable characters only.
> >
> > JM
> >
> >
> > 2013/11/6 Nasron Cheong
> >
> > > Yes, after some digging around, the key is to store integers as byte
> > > representation, but more importantly to store them as big-endian so that
> > > the lexicographical sequence is maintained.
> > >
> > > Thanks!
> > >
> > > - Nasron
> > >
> > >
> > > On Tue, Nov 5, 2013 at 8:28 PM, Premal Shah wrote:
> > >
> > > > you can store the byte representation of the integer (fixed length) instead
> > > > of the integer (which will be stored as strings of variable length) and
> > > > will also be sorted.
> > > >
> > > >
> > > > On Tue, Nov 5, 2013 at 1:58 PM, Nasron Cheong wrote:
> > > >
> > > > > Yes, its limited in the sense that we have to precalculate the number of
> > > > > digits required so we don't run out, and if we overestimate, then our row
> > > > > keys end up taking up more space than we'd care to.
> > > > >
> > > > > We can probably live with this approach for now, but I wonder if there's a
> > > > > better way.
> > > > >
> > > > > - Nasron
> > > > >
> > > > >
> > > > > On Tue, Nov 5, 2013 at 12:28 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> > > > >
> > > > > > Hi Nasron,
> > > > > >
> > > > > > Why are you saying that it's a limited way? Does it achieve your needs?
> > > > > >
> > > > > >
> > > > > > 2013/11/4 Nasron Cheong
> > > > > >
> > > > > > > An example query would be the following, say the column qualifier was of
> > > > > > > the form
> > > > > > >
> > > > > > > :
> > > > > > >
> > > > > > > where should be an integer value, and msg type is a string. E.g.
> > > > > > >
> > > > > > > 1:abc
> > > > > > > 1000:abc
> > > > > > > 2: abc
> > > > > > >
> > > > > > > would appear in the above sequence, which is out of order when doing prefix
> > > > > > > filtering. Zero padding could fix this:
> > > > > > >
> > > > > > > 0001:abc
> > > > > > > 0002:abc
> > > > > > > 1000: abc
> > > > > > >
> > > > > > > But is a limited way of ensuring the sequence of CQ (column qualifiers) is
> > > > > > > correct, in order for prefix filtering to work. Are there other options?
> > > > > > >
> > > > > > > - Nasron
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Oct 31, 2013 at 9:19 PM, Nasron Cheong wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm trying to determine the best way to serialize a sequence of
> > > > > > > > integers/strings that represent a hierarchy for a column qualifier, which
> > > > > > > > would be compatible with the ColumnPrefixFilters, and BinaryComparators.
> > > > > > > >
> > > > > > > > However, due to the lexicographical sorting, it's awkward to serialize the
> > > > > > > > sequence of values needed to get it to work.
> > > > > > > >
> > > > > > > > What are the typical solutions to this? Do people just zero pad integers
> > > > > > > > to make sure they sort correctly? Or do I have to implement my own
> > > > > > > > QualifierFilter - which seems expensive since I'd be deserializing every
> > > > > > > > byte array just to compare.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > - Nasron
> > > >
> > > > --
> > > > Regards,
> > > > Premal Shah.
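
For anyone finding this thread later, here is a minimal sketch of the fixed-length big-endian encoding discussed above, in plain Java with no HBase dependency. The class and method names (QualifierEncoding, encode, prefix, compareUnsigned) are made up for illustration, and the layout (4-byte sequence number followed by the UTF-8 message type) is only one possible choice. It assumes non-negative sequence numbers, since a negative int has its high bit set and would sort after all positive values under an unsigned byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class QualifierEncoding {

    // Hypothetical layout: 4-byte big-endian sequence number, then the
    // UTF-8 bytes of the message type. Assumes seq >= 0.
    static byte[] encode(int seq, String msgType) {
        if (seq < 0) {
            throw new IllegalArgumentException("seq must be non-negative");
        }
        byte[] type = msgType.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + type.length)
                .putInt(seq) // ByteBuffer writes big-endian by default
                .put(type)
                .array();
    }

    // Byte prefix shared by every qualifier with the given sequence number.
    static byte[] prefix(int seq) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(seq).array();
    }

    // Unsigned lexicographic comparison of two byte arrays, which is the
    // kind of ordering the thread relies on for qualifiers.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Reads the sequence number back from the first four bytes.
    static int decodeSeq(byte[] qualifier) {
        return ByteBuffer.wrap(qualifier).getInt();
    }

    public static void main(String[] args) {
        List<byte[]> qualifiers = new ArrayList<>();
        for (int seq : new int[] {1000, 2, 1}) {
            qualifiers.add(encode(seq, "abc"));
        }
        qualifiers.sort(QualifierEncoding::compareUnsigned);
        for (byte[] q : qualifiers) {
            System.out.println(decodeSeq(q)); // prints 1, 2, 1000 on separate lines
        }
    }
}
```

In an HBase client, Bytes.toBytes(int) should give the same 4-byte big-endian form, and the result of prefix(seq) is the sort of value one would hand to a ColumnPrefixFilter. Unlike zero padding, the width is fixed at four bytes regardless of how large the numbers get.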