Date: Fri, 28 Jun 2013 11:24:59 +0200
Subject: Re: Schema design for filters
From: Kristoffer Sjögren
To: user@hbase.apache.org
In-Reply-To: <873D5826-8065-4E6B-AF83-D272BD6986C3@salesforce.com>

Interesting. I'm actually building something similar. A full-blown SQL
implementation is a bit overkill for my particular use case, and the query API
is the final piece of the puzzle. But I'll definitely have a look for some
inspiration.

Thanks!

On Fri, Jun 28, 2013 at 3:55 AM, James Taylor wrote:

> Hi Kristoffer,
> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> You could model your schema much like an O/R mapper and issue SQL queries
> through Phoenix for your filtering.
>
> James
> @JamesPlusPlus
> http://phoenix-hbase.blogspot.com
>
> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" wrote:
>
> > Thanks for your help Mike. Much appreciated.
> >
> > I don't store rows/columns in JSON format. The schema is exactly that of a
> > specific Java class, where the rowkey is a unique object identifier with
> > the class type encoded into it. Columns are the field names of the class
> > and the values are those of the object instance.
> >
> > I did think about coprocessors, but the schema is discovered at runtime
> > and I can't hard-code it.
> >
> > However, I still believe that filters might work.
> > Had a look at SingleColumnValueFilter, and this filter is able to target
> > specific column qualifiers with specific WritableByteArrayComparables.
> >
> > But list comparators are still missing... So I guess the only way is to
> > write these comparators?
> >
> > Do you follow my reasoning? Will it work?
> >
> >
> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel wrote:
> >
> >> Ok...
> >>
> >> If you want to do type checking and schema enforcement...
> >>
> >> You will need to do this as a coprocessor.
> >>
> >> The quick and dirty way (not recommended) would be to hard-code the
> >> schema into the coprocessor code.
> >>
> >> A better way... at start up, load up ZK to manage the set of known table
> >> schemas, which would be a map of column qualifier to data type.
> >> (If JSON, then you need to do a separate lookup to get the record's schema.)
> >>
> >> Then a single Java class that does the lookup and then handles the known
> >> data type comparators.
> >>
> >> Does this make sense?
> >> (Sorry, kinda was thinking this out as I typed the response. But it should
> >> work.)
> >>
> >> At least it would be a design approach I would take. YMMV
> >>
> >> Having said that, I expect someone to say it's a bad idea and that they
> >> have a better solution.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren wrote:
> >>
> >>> I see your point. Everything is just bytes.
> >>>
> >>> However, the schema is known and every row is formatted according to this
> >>> schema, although some columns may not exist, that is, no value exists for
> >>> this property on this row.
> >>>
> >>> So if I'm able to apply these "typed comparators" to the right cell values
> >>> it may be possible? But I can't find a filter that targets specific
> >>> columns?
> >>>
> >>> Seems like all filters scan every column/qualifier and there is no way of
> >>> knowing what column is currently being evaluated?
> >>>
> >>>
> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel wrote:
> >>>
> >>>> You have to remember that HBase doesn't enforce any sort of typing.
> >>>> That's why this can be difficult.
> >>>>
> >>>> You'd have to write a coprocessor to enforce a schema on a table.
> >>>> Even then YMMV if you're writing JSON structures to a column because while
> >>>> the contents of the structures could be the same, the actual strings could
> >>>> differ.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren wrote:
> >>>>
> >>>>> I realize standard comparators cannot solve this.
> >>>>>
> >>>>> However, I do know the type of each column, so writing custom list
> >>>>> comparators for boolean, char, byte, short, int, long, float, double seems
> >>>>> quite straightforward.
> >>>>>
> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item
> >>>>> so a comparator might look like this.
> >>>>>
> >>>>> public class LongsComparator extends WritableByteArrayComparable {
> >>>>>     // The value to look for in the stored list.
> >>>>>     private long val;
> >>>>>
> >>>>>     // Returns 0 if any stored long equals val, otherwise 1.
> >>>>>     public int compareTo(byte[] value, int offset, int length) {
> >>>>>         long[] values = BytesUtils.toLongs(value, offset, length);
> >>>>>         for (long longValue : values) {
> >>>>>             if (longValue == val) {
> >>>>>                 return 0;
> >>>>>             }
> >>>>>         }
> >>>>>         return 1;
> >>>>>     }
> >>>>> }
> >>>>>
> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
> >>>>>     // length already counts only the bytes of this cell value.
> >>>>>     int num = length / 8;
> >>>>>     long[] values = new long[num];
> >>>>>     for (int i = 0; i < num; i++) {
> >>>>>         // Item i starts 8 * i bytes after offset.
> >>>>>         values[i] = getLong(value, offset + i * 8);
> >>>>>     }
> >>>>>     return values;
> >>>>> }
> >>>>>
> >>>>>
> >>>>> Strings are similar but would require charset and length for each string.
> >>>>>
> >>>>> public class StringsComparator extends WritableByteArrayComparable {
> >>>>>     // The value to look for in the stored list.
> >>>>>     private String val;
> >>>>>
> >>>>>     // Returns 0 if any stored string equals val, otherwise 1.
> >>>>>     public int compareTo(byte[] value, int offset, int length) {
> >>>>>         String[] values = BytesUtils.toStrings(value, offset, length);
> >>>>>         for (String stringValue : values) {
> >>>>>             if (val.equals(stringValue)) {
> >>>>>                 return 0;
> >>>>>             }
> >>>>>         }
> >>>>>         return 1;
> >>>>>     }
> >>>>> }
> >>>>>
> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
> >>>>>     ArrayList<String> values = new ArrayList<String>();
> >>>>>     int idx = 0;
> >>>>>     ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
> >>>>>     while (idx < length) {
> >>>>>         // Each item is a 4-byte size followed by that many bytes.
> >>>>>         int size = buffer.getInt();
> >>>>>         byte[] bytes = new byte[size];
> >>>>>         buffer.get(bytes);
> >>>>>         // Decodes with the platform default charset.
> >>>>>         values.add(new String(bytes));
> >>>>>         idx += 4 + size;
> >>>>>     }
> >>>>>     return values.toArray(new String[values.size()]);
> >>>>> }
> >>>>>
> >>>>>
> >>>>> Am I on the right track or maybe overlooking some implementation details?
> >>>>> Not really sure how to target each comparator to a specific column value?
> >>>>>
> >>>>>
> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
> >>>>> <michael_segel@hotmail.com> wrote:
> >>>>>
> >>>>>> Not an easy task.
> >>>>>>
> >>>>>> You first need to determine how you want to store the data within a
> >>>>>> column and/or apply a type constraint to a column.
> >>>>>>
> >>>>>> Even if you use JSON records to store your data within a column, does an
> >>>>>> equality comparator exist? If not, you would have to write one.
> >>>>>> (I kinda think that one may already exist...)
> >>>>>>
> >>>>>>
> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren wrote:
> >>>>>>
> >>>>>>> Hi
> >>>>>>>
> >>>>>>> Working with the standard filtering mechanism to scan rows that have
> >>>>>>> columns matching certain criteria.
> >>>>>>>
> >>>>>>> There are columns of numeric (integer and decimal) and string types.
> >>>>>>> These columns are single or multi-valued like "1", "2", "1,2,3", "a",
> >>>>>>> "b" or "a,b,c" - not sure what the separator would be in the case of
> >>>>>>> list types. Maybe none?
> >>>>>>>
> >>>>>>> I would like to compose the following queries to filter out rows that
> >>>>>>> do not match.
> >>>>>>>
> >>>>>>> - contains(String column, String value)
> >>>>>>>   Single-valued column whose value String.contains() the provided value.
> >>>>>>>
> >>>>>>> - equal(String column, Object value)
> >>>>>>>   Single-valued column whose value Object.equals() the provided value.
> >>>>>>>   Value is either string or numeric type.
> >>>>>>>
> >>>>>>> - greaterThan(String column, java.lang.Number value)
> >>>>>>>   Single-valued column whose value is > the provided numeric value.
> >>>>>>>
> >>>>>>> - in(String column, Object... values)
> >>>>>>>   Multi-valued column whose values Object.equals() all provided values.
> >>>>>>>   Values are of string or numeric type.
> >>>>>>>
> >>>>>>> How would I design a schema that can take advantage of the already
> >>>>>>> existing filters and comparators to accomplish this?
> >>>>>>>
> >>>>>>> Already looked at the string and binary comparators but fail to see how
> >>>>>>> to solve this in a clean way for multi-valued column values.
> >>>>>>>
> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> -Kristoffer
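
For reference, a minimal sketch of the multi-valued cell layout discussed in
this thread: longs packed at 8 bytes per item, and strings stored with a
4-byte length prefix each. It uses only the JDK and no HBase classes. The
class name MultiValueCodec, its method names and the choice of UTF-8 are
assumptions made for illustration; none of them come from the thread or from
HBase itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an HBase class): encodes and searches multi-valued cells.
final class MultiValueCodec {

    private MultiValueCodec() {}

    // Longs are stored back to back, 8 big-endian bytes per item.
    static byte[] encodeLongs(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    // Reads length / 8 longs starting at offset.
    static long[] decodeLongs(byte[] cell, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(cell, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // Each string is a 4-byte length followed by its UTF-8 bytes (UTF-8 is an assumption).
    static byte[] encodeStrings(String... values) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }

    // Reads length-prefixed strings until the given byte range is consumed.
    static List<String> decodeStrings(byte[] cell, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(cell, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values;
    }

    // True if any long in the given byte range equals wanted.
    static boolean containsLong(byte[] cell, int offset, int length, long wanted) {
        for (long v : decodeLongs(cell, offset, length)) {
            if (v == wanted) {
                return true;
            }
        }
        return false;
    }

    // True if any string in the given byte range equals wanted.
    static boolean containsString(byte[] cell, int offset, int length, String wanted) {
        return decodeStrings(cell, offset, length).contains(wanted);
    }

    public static void main(String[] args) {
        byte[] longs = encodeLongs(1L, 2L, 3L);
        byte[] strings = encodeStrings("a", "b", "c");
        System.out.println(containsLong(longs, 0, longs.length, 2L));
        System.out.println(containsString(strings, 0, strings.length, "d"));
    }
}
```

A custom comparator along the lines of the ones quoted above could call
containsLong or containsString from its compareTo(byte[], int, int) and
return 0 on a match. Whether that fits a given HBase version's comparator
contract would need to be checked against that version.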