hbase-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Schema design for filters
Date Fri, 28 Jun 2013 18:34:14 GMT
Kristoffer,

You could also consider using something other than HBase, something
that supports "secondary indices", like anything that is Lucene based
- Solr and ElasticSearch for example.  We recently compared how we
aggregate data in HBase (see my signature) and how we would do it if
we were to use Solr (or ElasticSearch), and so far things look better
in Solr for our use case.  And our use case involves a lot of
filtering, slicing and dicing..... something to consider...

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <stoffe@gmail.com> wrote:
> Interesting. I'm actually building something similar.
>
> A full-blown SQL implementation is a bit overkill for my particular use case
> and the query API is the final piece of the puzzle. But I'll definitely have
> a look for some inspiration.
>
> Thanks!
>
>
>
> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <jtaylor@salesforce.com> wrote:
>
>> Hi Kristoffer,
>> Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)?
>> You could model your schema much like an O/R mapper and issue SQL queries
>> through Phoenix for your filtering.
>>
>> James
>> @JamesPlusPlus
>> http://phoenix-hbase.blogspot.com
>>
>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <stoffe@gmail.com>
>> wrote:
>>
>> > Thanks for your help Mike. Much appreciated.
>> >
>> > I don't store rows/columns in JSON format. The schema is exactly that of a
>> > specific Java class, where the rowkey is a unique object identifier with
>> > the class type encoded into it. Columns are the field names of the class
>> > and the values are those of the object instance.
>> >
>> > Did think about coprocessors but the schema is discovered at runtime and I
>> > can't hard code it.
>> >
>> > However, I still believe that filters might work. Had a look
>> > at SingleColumnValueFilter and this filter is able to target specific
>> > column qualifiers with specific WritableByteArrayComparables.
>> >
>> > But list comparators are still missing... So I guess the only way is to
>> > write these comparators?
>> >
>> > Do you follow my reasoning? Will it work?
>> >
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
>> > <michael_segel@hotmail.com> wrote:
>> >
>> >> Ok...
>> >>
>> >> If you want to do type checking and schema enforcement...
>> >>
>> >> You will need to do this as a coprocessor.
>> >>
>> >> The quick and dirty way (not recommended) would be to hard code the
>> >> schema into the coprocessor code.
>> >>
>> >> A better way... at start up, load up ZK to manage the set of known table
>> >> schemas which would be a map of column qualifier to data type.
>> >> (If JSON then you need to do a separate lookup to get the record's
>> >> schema.)
>> >>
>> >> Then a single Java class that does the lookup and then handles the
>> >> known data type comparators.
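The lookup-then-compare idea above can be sketched in plain Java. This is only an illustration under assumptions: `TypedCellMatcher`, `ColumnType`, `register` and `matches` are hypothetical names, an in-memory map stands in for the schema that would be loaded from ZooKeeper, and cells are assumed to hold either one 8-byte big-endian long or one UTF-8 string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: maps column qualifiers to a data
// type and picks the matching typed comparison for a raw cell value.
final class TypedCellMatcher {

    enum ColumnType { LONG, STRING }

    // Stand-in for the schema map that would be loaded at start-up.
    private final Map<String, ColumnType> schema = new HashMap<>();

    void register(String qualifier, ColumnType type) {
        schema.put(qualifier, type);
    }

    // Returns true when the cell, decoded according to the registered type
    // of the qualifier, equals the expected value.
    boolean matches(String qualifier, byte[] cell, Object expected) {
        ColumnType type = schema.get(qualifier);
        if (type == null || cell == null) {
            return false; // unknown column or missing value never matches
        }
        switch (type) {
            case LONG:
                return cell.length == Long.BYTES
                        && expected instanceof Long
                        && ByteBuffer.wrap(cell).getLong() == (Long) expected;
            case STRING:
                return new String(cell, StandardCharsets.UTF_8).equals(expected);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        TypedCellMatcher matcher = new TypedCellMatcher();
        matcher.register("age", ColumnType.LONG);
        byte[] cell = ByteBuffer.allocate(Long.BYTES).putLong(42L).array();
        System.out.println(matcher.matches("age", cell, 42L)); // true
    }
}
```

Keeping the type dispatch in one class means a new data type only needs one extra enum constant and one extra case, whatever the source of the schema map turns out to be.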
>> >>
>> >> Does this make sense?
>> >> (Sorry, kinda was thinking this out as I typed the response. But it
>> >> should work.)
>> >>
>> >> At least it would be a design approach I would take. YMMV
>> >>
>> >> Having said that, I expect someone to say it's a bad idea and that they
>> >> have a better solution.
>> >>
>> >> HTH
>> >>
>> >> -Mike
>> >>
>> >> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <stoffe@gmail.com>
>> >> wrote:
>> >>
>> >>> I see your point. Everything is just bytes.
>> >>>
>> >>> However, the schema is known and every row is formatted according to
>> >>> this schema, although some columns may not exist, that is, no value
>> >>> exists for this property on this row.
>> >>>
>> >>> So if I'm able to apply these "typed comparators" to the right cell
>> >>> values it may be possible? But I can't find a filter that targets
>> >>> specific columns?
>> >>>
>> >>> Seems like all filters scan every column/qualifier and there is no way
>> >>> of knowing what column is currently being evaluated?
>> >>>
>> >>>
>> >>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel
>> >>> <michael_segel@hotmail.com> wrote:
>> >>>
>> >>>> You have to remember that HBase doesn't enforce any sort of typing.
>> >>>> That's why this can be difficult.
>> >>>>
>> >>>> You'd have to write a coprocessor to enforce a schema on a table.
>> >>>> Even then YMMV if you're writing JSON structures to a column because
>> >>>> while the contents of the structures could be the same, the actual
>> >>>> strings could differ.
>> >>>>
>> >>>> HTH
>> >>>>
>> >>>> -Mike
>> >>>>
>> >>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <stoffe@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> I realize standard comparators cannot solve this.
>> >>>>>
>> >>>>> However I do know the type of each column so writing custom list
>> >>>>> comparators for boolean, char, byte, short, int, long, float, double
>> >>>>> seems quite straightforward.
>> >>>>>
>> >>>>> Long arrays, for example, are stored as a byte array with 8 bytes per
>> >>>>> item so a comparator might look like this.
>> >>>>>
>> >>>>> public class LongsComparator extends WritableByteArrayComparable {
>> >>>>>  // val: the long being searched for (field declaration not shown)
>> >>>>>  public int compareTo(byte[] value, int offset, int length) {
>> >>>>>      long[] values = BytesUtils.toLongs(value, offset, length);
>> >>>>>      for (long longValue : values) {
>> >>>>>          if (longValue == val) {
>> >>>>>              return 0; // match found
>> >>>>>          }
>> >>>>>      }
>> >>>>>      return 1; // no item matched
>> >>>>>  }
>> >>>>> }
>> >>>>>
>> >>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>> >>>>>  int num = length / 8; // length already counts bytes from offset
>> >>>>>  long[] values = new long[num];
>> >>>>>  for (int i = 0; i < num; i++) {
>> >>>>>      values[i] = getLong(value, offset + i * 8);
>> >>>>>  }
>> >>>>>  return values;
>> >>>>> }
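For reference, the 8-bytes-per-item layout described above can be exercised without HBase. The following is a minimal sketch, assuming big-endian longs packed back to back; `LongListCodec` and its methods are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: packs a long[] into 8 bytes per
// item (big-endian, the ByteBuffer default) and tests membership.
final class LongListCodec {

    // Encode each long as 8 big-endian bytes, back to back.
    static byte[] encode(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    // Decode the slice [offset, offset + length) back into a long[].
    static long[] decode(byte[] value, int offset, int length) {
        if (length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        long[] values = new long[length / Long.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    // True when any item in the slice equals the wanted value.
    static boolean contains(byte[] value, int offset, int length, long wanted) {
        for (long v : decode(value, offset, length)) {
            if (v == wanted) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] cell = encode(1L, 2L, 3L);
        System.out.println(contains(cell, 0, cell.length, 2L)); // true
        System.out.println(contains(cell, 8, 16, 1L));          // false
    }
}
```

Wrapping the slice in a `ByteBuffer` keeps the offset arithmetic in one place, which is where hand-written index loops tend to go wrong.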
>> >>>>>
>> >>>>>
>> >>>>> Strings are similar but would require charset and length for each
>> >>>>> string.
>> >>>>>
>> >>>>> public class StringsComparator extends WritableByteArrayComparable {
>> >>>>>  // val: the string being searched for (field declaration not shown)
>> >>>>>  public int compareTo(byte[] value, int offset, int length) {
>> >>>>>      String[] values = BytesUtils.toStrings(value, offset, length);
>> >>>>>      for (String stringValue : values) {
>> >>>>>          if (val.equals(stringValue)) {
>> >>>>>              return 0; // match found
>> >>>>>          }
>> >>>>>      }
>> >>>>>      return 1; // no item matched
>> >>>>>  }
>> >>>>> }
>> >>>>>
>> >>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>> >>>>>  ArrayList<String> values = new ArrayList<String>();
>> >>>>>  int idx = 0; // bytes consumed so far
>> >>>>>  ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
>> >>>>>  while (idx < length) {
>> >>>>>      int size = buffer.getInt(); // 4-byte length prefix
>> >>>>>      byte[] bytes = new byte[size];
>> >>>>>      buffer.get(bytes);
>> >>>>>      values.add(new String(bytes)); // platform default charset
>> >>>>>      idx += 4 + size;
>> >>>>>  }
>> >>>>>  return values.toArray(new String[values.size()]);
>> >>>>> }
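The decoder above implies a writer that emits a 4-byte length followed by the string bytes. A minimal round-trip sketch, assuming UTF-8 as the charset (the thread does not name one) and using the hypothetical class name `StringListCodec`:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: stores a list of strings as
// repeated [4-byte big-endian length][UTF-8 bytes] records.
final class StringListCodec {

    static byte[] encode(String... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            // Length prefix counts bytes, not characters.
            out.write(ByteBuffer.allocate(4).putInt(bytes.length).array(), 0, 4);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    static List<String> decode(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<>();
        while (buffer.hasRemaining()) {
            int size = buffer.getInt();
            byte[] bytes = new byte[size];
            buffer.get(bytes);
            values.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] cell = encode("a", "b", "c");
        System.out.println(decode(cell, 0, cell.length)); // [a, b, c]
    }
}
```

Fixing the charset on both the write and the read side avoids depending on the platform default, which can differ between the client and the region servers.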
>> >>>>>
>> >>>>>
>> >>>>> Am I on the right track or maybe overlooking some implementation
>> >>>>> details? Not really sure how to target each comparator to a specific
>> >>>>> column value?
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel
>> >>>>> <michael_segel@hotmail.com> wrote:
>> >>>>>
>> >>>>>> Not an easy task.
>> >>>>>>
>> >>>>>> You first need to determine how you want to store the data within a
>> >>>>>> column and/or apply a type constraint to a column.
>> >>>>>>
>> >>>>>> Even if you use JSON records to store your data within a column, does
>> >>>>>> an equality comparator exist? If not, you would have to write one.
>> >>>>>> (I kinda think that one may already exist...)
>> >>>>>>
>> >>>>>>
>> >>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <stoffe@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Hi
>> >>>>>>>
>> >>>>>>> Working with the standard filtering mechanism to scan rows that have
>> >>>>>>> columns matching certain criteria.
>> >>>>>>>
>> >>>>>>> There are columns of numeric (integer and decimal) and string types.
>> >>>>>>> These columns are single or multi-valued like "1", "2", "1,2,3", "a",
>> >>>>>>> "b" or "a,b,c" - not sure what the separator would be in the case of
>> >>>>>>> list types. Maybe none?
>> >>>>>>>
>> >>>>>>> I would like to compose the following queries to filter out rows that
>> >>>>>>> do not match.
>> >>>>>>>
>> >>>>>>> - contains(String column, String value)
>> >>>>>>> Single-valued column that String.contains() the provided value.
>> >>>>>>>
>> >>>>>>> - equal(String column, Object value)
>> >>>>>>> Single-valued column that Object.equals() the provided value.
>> >>>>>>> Value is either string or numeric type.
>> >>>>>>>
>> >>>>>>> - greaterThan(String column, java.lang.Number value)
>> >>>>>>> Single-valued column that is > the provided numeric value.
>> >>>>>>>
>> >>>>>>> - in(String column, Object... values)
>> >>>>>>> Multi-valued column whose values Object.equals() all provided values.
>> >>>>>>> Values are of string or numeric type.
>> >>>>>>>
>> >>>>>>> How would I design a schema that can take advantage of the already
>> >>>>>>> existing filters and comparators to accomplish this?
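One common way to reuse byte-wise comparators for a query like greaterThan is to store numbers in an order-preserving encoding, so that lexicographic comparison of the stored bytes agrees with numeric order. The sketch below is not from the thread; it assumes the comparator compares bytes as unsigned values, left to right, and `SortableLong` is a hypothetical name. Plain two's-complement big-endian bytes would sort negative numbers after positive ones, which is why the sign bit is flipped.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of HBase: encodes a long so that unsigned
// lexicographic byte order matches signed numeric order.
final class SortableLong {

    // Flip the sign bit, then write 8 big-endian bytes.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Reverse of encode: read 8 big-endian bytes, then flip the sign bit back.
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Unsigned lexicographic comparison, the ordering a byte-wise
    // comparator is assumed to apply to stored cell values.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // -5 sorts before 3 even though its raw two's-complement bytes would not.
        System.out.println(compareBytes(encode(-5L), encode(3L)) < 0); // true
    }
}
```

With values stored this way, equal and greaterThan on single-valued numeric columns reduce to byte comparisons; the multi-valued in() query still needs list-aware logic such as the custom comparators discussed in the thread.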
>> >>>>>>>
>> >>>>>>> Already looked at the string and binary comparators but fail to see
>> >>>>>>> how to solve this in a clean way for multi-valued column values.
>> >>>>>>>
>> >>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> -Kristoffer
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
