hbase-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Schema design for filters
Date Fri, 28 Jun 2013 09:24:59 GMT
Interesting. I'm actually building something similar.

A full-blown SQL implementation is a bit overkill for my particular use case,
and the query API is the final piece of the puzzle. But I'll definitely have
a look for some inspiration.

Thanks!



On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <jtaylor@salesforce.com> wrote:

> Hi Kristoffer,
> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
> You could model your schema much like an O/R mapper and issue SQL queries
> through Phoenix for your filtering.
>
> James
> @JamesPlusPlus
> http://phoenix-hbase.blogspot.com
>
> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <stoffe@gmail.com>
> wrote:
>
> > Thanks for your help Mike. Much appreciated.
> >
> > I don't store rows/columns in JSON format. The schema is exactly that of a
> > specific Java class, where the rowkey is a unique object identifier with
> > the class type encoded into it. Columns are the field names of the class
> > and the values are those of the object instance.
> >
> > I did think about coprocessors, but the schema is discovered at runtime and
> > I can't hard-code it.
> >
> > However, I still believe that filters might work. I had a look at
> > SingleColumnValueFilter, and this filter is able to target specific
> > column qualifiers with specific WritableByteArrayComparables.
> >
> > But list comparators are still missing... So I guess the only way is to
> > write these comparators?
> >
> > Do you follow my reasoning? Will it work?
> >
> >
> >
> >
> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
> > <michael_segel@hotmail.com> wrote:
> >
> >> Ok...
> >>
> >> If you want to do type checking and schema enforcement...
> >>
> >> You will need to do this as a coprocessor.
> >>
> >> The quick and dirty way (not recommended) would be to hard-code the
> >> schema into the coprocessor code.
> >>
> >> A better way... at start up, load up ZK to manage the set of known table
> >> schemas, which would be a map of column qualifier to data type.
> >> (If JSON then you need to do a separate lookup to get the record's schema.)
> >>
> >> Then a single Java class that does the lookup and then handles the known
> >> data type comparators.
> >>
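The qualifier-to-type lookup suggested above could be sketched roughly as follows. This is a minimal, JDK-only sketch and not HBase or coprocessor code: the class `SchemaComparators`, the enum `ColumnType`, and both methods are hypothetical names. Loading the schema from ZooKeeper is not shown (the map is filled in memory), and the encodings (8-byte big-endian longs, UTF-8 strings) are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one class that looks up a column's type and
// applies the matching typed comparison to the stored bytes.
final class SchemaComparators {
    enum ColumnType { LONG, STRING }

    // Map of column qualifier to data type (assumed to be loaded at start-up).
    private final Map<String, ColumnType> schema = new HashMap<>();

    void register(String qualifier, ColumnType type) {
        schema.put(qualifier, type);
    }

    /** Decodes the stored bytes according to the registered type, then compares. */
    int compare(String qualifier, byte[] stored, Object expected) {
        ColumnType type = schema.get(qualifier);
        if (type == null) {
            throw new IllegalArgumentException("unknown qualifier: " + qualifier);
        }
        switch (type) {
            case LONG:
                // Assumption: a single long stored as 8 big-endian bytes.
                return Long.compare(ByteBuffer.wrap(stored).getLong(), (Long) expected);
            case STRING:
                // Assumption: strings stored as UTF-8.
                return new String(stored, StandardCharsets.UTF_8).compareTo((String) expected);
            default:
                throw new IllegalStateException("unhandled type: " + type);
        }
    }
}
```

Because the map is filled at runtime, a schema that is only discovered at runtime does not need to be hard-coded into the class.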
> >> Does this make sense?
> >> (Sorry, kinda was thinking this out as I typed the response. But it should
> >> work.)
> >>
> >> At least it would be a design approach I would take. YMMV
> >>
> >> Having said that, I expect someone to say it's a bad idea and that they
> >> have a better solution.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <stoffe@gmail.com> wrote:
> >>
> >>> I see your point. Everything is just bytes.
> >>>
> >>> However, the schema is known and every row is formatted according to this
> >>> schema, although some columns may not exist, that is, no value exists for
> >>> this property on this row.
> >>>
> >>> So if I'm able to apply these "typed comparators" to the right cell values
> >>> it may be possible? But I can't find a filter that targets specific columns?
> >>>
> >>> Seems like all filters scan every column/qualifier and there is no way of
> >>> knowing what column is currently being evaluated?
> >>>
> >>>
> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
> >>> <michael_segel@hotmail.com> wrote:
> >>>
> >>>> You have to remember that HBase doesn't enforce any sort of typing.
> >>>> That's why this can be difficult.
> >>>>
> >>>> You'd have to write a coprocessor to enforce a schema on a table.
> >>>> Even then YMMV if you're writing JSON structures to a column because while
> >>>> the contents of the structures could be the same, the actual strings could
> >>>> differ.
> >>>>
> >>>> HTH
> >>>>
> >>>> -Mike
> >>>>
> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <stoffe@gmail.com> wrote:
> >>>>
> >>>>> I realize standard comparators cannot solve this.
> >>>>>
> >>>>> However I do know the type of each column so writing custom list
> >>>>> comparators for boolean, char, byte, short, int, long, float, double seems
> >>>>> quite straightforward.
> >>>>>
> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per
> >>>>> item so a comparator might look like this.
> >>>>>
> >>>>> public class LongsComparator extends WritableByteArrayComparable {
> >>>>>  private long val; // value to match; its serialization is not shown
> >>>>>
> >>>>>  // Returns 0 if any stored long equals val, 1 otherwise.
> >>>>>  public int compareTo(byte[] value, int offset, int length) {
> >>>>>      long[] values = BytesUtils.toLongs(value, offset, length);
> >>>>>      for (long longValue : values) {
> >>>>>          if (longValue == val) {
> >>>>>              return 0;
> >>>>>          }
> >>>>>      }
> >>>>>      return 1;
> >>>>>  }
> >>>>> }
> >>>>>
> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
> >>>>>  // length is the number of bytes in the cell value, starting at offset
> >>>>>  int num = length / 8;
> >>>>>  long[] values = new long[num];
> >>>>>  for (int i = 0; i < num; i++) {
> >>>>>      values[i] = getLong(value, offset + i * 8);
> >>>>>  }
> >>>>>  return values;
> >>>>> }
> >>>>>
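The fixed-width layout described above (8 bytes per item) can be tried out with only the JDK. This is a minimal sketch rather than HBase code: the class `LongArrayCodec` and its methods are hypothetical, and big-endian byte order is an assumption (it is the default for `ByteBuffer`). The membership check reads the items in place, so no intermediate `long[]` is allocated.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: packs longs as 8 big-endian bytes each.
final class LongArrayCodec {
    private LongArrayCodec() {}

    /** Encodes each long as 8 bytes in ByteBuffer's default big-endian order. */
    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /**
     * Returns true if any 8-byte item in value[offset, offset + length) equals target.
     * Trailing bytes that do not form a full item are ignored.
     */
    static boolean contains(byte[] value, int offset, int length, long target) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        while (buffer.remaining() >= Long.BYTES) {
            if (buffer.getLong() == target) {
                return true;
            }
        }
        return false;
    }
}
```

A comparator's `compareTo(byte[] value, int offset, int length)` could then return 0 when `contains(...)` is true and a non-zero value otherwise.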
> >>>>>
> >>>>> Strings are similar but would require charset and length for each string.
> >>>>>
> >>>>> public class StringsComparator extends WritableByteArrayComparable {
> >>>>>  private String val; // value to match; its serialization is not shown
> >>>>>
> >>>>>  // Returns 0 if any stored string equals val, 1 otherwise.
> >>>>>  public int compareTo(byte[] value, int offset, int length) {
> >>>>>      String[] values = BytesUtils.toStrings(value, offset, length);
> >>>>>      for (String stringValue : values) {
> >>>>>          if (val.equals(stringValue)) {
> >>>>>              return 0;
> >>>>>          }
> >>>>>      }
> >>>>>      return 1;
> >>>>>  }
> >>>>> }
> >>>>>
> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
> >>>>>  ArrayList<String> values = new ArrayList<String>();
> >>>>>  int idx = 0;
> >>>>>  ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
> >>>>>  while (idx < length) {
> >>>>>      int size = buffer.getInt(); // 4-byte length prefix
> >>>>>      byte[] bytes = new byte[size];
> >>>>>      buffer.get(bytes);
> >>>>>      // Assumes the strings were written as UTF-8.
> >>>>>      values.add(new String(bytes, StandardCharsets.UTF_8));
> >>>>>      idx += 4 + size;
> >>>>>  }
> >>>>>  return values.toArray(new String[values.size()]);
> >>>>> }
> >>>>>
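For the string case, a round trip makes the "charset and length for each string" requirement concrete. The following is a minimal JDK-only sketch under stated assumptions: the class `StringListCodec` is hypothetical, each string is prefixed with a 4-byte big-endian length, and UTF-8 is an assumed charset. Note that the prefix counts bytes, not characters, which matters for non-ASCII text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: length-prefixed UTF-8 string lists.
final class StringListCodec {
    private StringListCodec() {}

    /** Writes each string as a 4-byte length (in bytes) followed by its UTF-8 bytes. */
    static byte[] encode(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            byte[] prefix = ByteBuffer.allocate(Integer.BYTES).putInt(bytes.length).array();
            out.write(prefix, 0, prefix.length);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    /** Reads length-prefixed strings from value[offset, offset + length). */
    static List<String> decode(byte[] value, int offset, int length) {
        List<String> result = new ArrayList<>();
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        while (buffer.hasRemaining()) {
            int size = buffer.getInt();
            byte[] bytes = new byte[size];
            buffer.get(bytes);
            result.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

Fixing the charset on both the write and read side avoids depending on the platform default, which may differ between the client and the region server.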
> >>>>>
> >>>>> Am I on the right track or maybe overlooking some implementation details?
> >>>>> Not really sure how to target each comparator to a specific column value?
> >>>>>
> >>>>>
> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
> >>>>> <michael_segel@hotmail.com> wrote:
> >>>>>
> >>>>>> Not an easy task.
> >>>>>>
> >>>>>> You first need to determine how you want to store the data within a column
> >>>>>> and/or apply a type constraint to a column.
> >>>>>>
> >>>>>> Even if you use JSON records to store your data within a column, does an
> >>>>>> equality comparator exist? If not, you would have to write one.
> >>>>>> (I kinda think that one may already exist...)
> >>>>>>
> >>>>>>
> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <stoffe@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi
> >>>>>>>
> >>>>>>> I'm working with the standard filtering mechanism to scan rows that have
> >>>>>>> columns matching certain criteria.
> >>>>>>>
> >>>>>>> There are columns of numeric (integer and decimal) and string types. These
> >>>>>>> columns are single or multi-valued like "1", "2", "1,2,3", "a", "b" or
> >>>>>>> "a,b,c" - not sure what the separator would be in the case of list types.
> >>>>>>> Maybe none?
> >>>>>>>
> >>>>>>> I would like to compose the following queries to filter out rows that do
> >>>>>>> not match.
> >>>>>>>
> >>>>>>> - contains(String column, String value)
> >>>>>>> Single-valued column that String.contains() the provided value.
> >>>>>>>
> >>>>>>> - equal(String column, Object value)
> >>>>>>> Single-valued column that Object.equals() the provided value.
> >>>>>>> Value is either string or numeric type.
> >>>>>>>
> >>>>>>> - greaterThan(String column, java.lang.Number value)
> >>>>>>> Single-valued column that is > the provided numeric value.
> >>>>>>>
> >>>>>>> - in(String column, Object... values)
> >>>>>>> Multi-valued column that has values that Object.equals() all provided
> >>>>>>> values. Values are of string or numeric type.
> >>>>>>>
> >>>>>>> How would I design a schema that can take advantage of the already existing
> >>>>>>> filters and comparators to accomplish this?
> >>>>>>>
> >>>>>>> I already looked at the string and binary comparators but fail to see how to
> >>>>>>> solve this in a clean way for multi-valued column values.
> >>>>>>>
> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> -Kristoffer
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
