hbase-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: Schema design for filters
Date Sat, 29 Jun 2013 11:29:53 GMT
In terms of scalability, yes, but we use HBase for other stuff as well:
timeseries, counters and a few future ideas around analytics. So it's nice
if we can put everything in the same deployment.

We don't want users to care about the physical storage (keep them productive
in Java land). The point here of being schemaless is to relieve users of
defining and administering the schema, types, sizes, indexes, queries etc.
for every class. Write the class and you're done, no extra implementation
overhead, with a very simplistic query API that works on actual Java types,
nothing else.

Btw, I have already written a schemaless implementation in SQL and it's
kinda painful to implement efficient WHERE queries for less than and greater
than if you don't know the target type. HBase's extensibility and freedom
are actually quite amazing on this point.
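
One way to get less-than/greater-than comparisons without knowing the
target type at query time is to store values in an order-preserving byte
encoding, so that a plain byte-wise comparison gives the right answer. The
following is only a minimal sketch for ints, assuming values are compared
as unsigned bytes in lexicographic order; the class and method names
(SortableInts, encode, decode, compare) are made up for illustration and
are not part of HBase.

```java
import java.nio.ByteBuffer;

public final class SortableInts {
    private SortableInts() {}

    /** Encodes an int so that unsigned byte-wise order matches numeric order. */
    public static byte[] encode(int value) {
        // Flipping the sign bit maps Integer.MIN_VALUE to 0x00000000 and
        // Integer.MAX_VALUE to 0xFFFFFFFF, so negatives sort before positives.
        return ByteBuffer.allocate(Integer.BYTES)
                .putInt(value ^ Integer.MIN_VALUE)
                .array();
    }

    /** Reverses encode(). */
    public static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

With an encoding like this, a generic comparator never has to decode the
value; other types (longs, doubles, strings) would each need their own
order-preserving encoding.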

I have done some prototyping on filters now (after looking at Phoenix) and
I think the implementation is quite straightforward. But I haven't decided
whether to split fields into qualifiers or store the instance as a blob. I
think I'm leaning towards a custom binary format that is able to seek fields
through the blob efficiently.
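
To make the blob idea concrete, here is a rough sketch of one possible
layout, not the actual format: a small header of (field id, offset, length)
entries followed by the raw values, so a filter can jump to one field
without decoding the rest. The class name FieldBlob, the integer field ids
and the layout itself are all assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Map;

public final class FieldBlob {
    // Each header entry holds three ints: field id, value offset, value length.
    private static final int ENTRY_SIZE = Integer.BYTES * 3;

    private FieldBlob() {}

    /** Layout: [field count][id, offset, length]* followed by the value bytes. */
    public static byte[] encode(Map<Integer, byte[]> fields) {
        int headerSize = Integer.BYTES + fields.size() * ENTRY_SIZE;
        int valueBytes = 0;
        for (byte[] value : fields.values()) {
            valueBytes += value.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(headerSize + valueBytes);
        buf.putInt(fields.size());
        int offset = headerSize;
        for (Map.Entry<Integer, byte[]> e : fields.entrySet()) {
            buf.putInt(e.getKey()).putInt(offset).putInt(e.getValue().length);
            offset += e.getValue().length;
        }
        // Values are written in the same iteration order as the header entries.
        for (byte[] value : fields.values()) {
            buf.put(value);
        }
        return buf.array();
    }

    /** Returns the bytes of one field, or null if the field is absent. */
    public static byte[] seek(byte[] blob, int fieldId) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int id = buf.getInt();
            int offset = buf.getInt();
            int length = buf.getInt();
            if (id == fieldId) {
                // Only the header and this one value are touched.
                byte[] out = new byte[length];
                System.arraycopy(blob, offset, out, 0, length);
                return out;
            }
        }
        return null;
    }
}
```

The header scan is linear in the number of fields; keeping the entries
sorted by id would allow a binary search if classes have many fields.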


On Sat, Jun 29, 2013 at 1:45 AM, Michel Segel <michael_segel@hotmail.com> wrote:

> This doesn't make sense in that the OP wants a schemaless structure, yet
> wants filtering on columns. The issue is that you do have a limited schema,
> so schemaless is a misnomer.
>
> In order to do filtering, you need to enforce object type within a column
> which requires a Schema to be enforced.
>
> Again, this can be done in HBase.
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 28, 2013, at 4:30 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:
>
> > Yep. Other DBs like
> > Mongo may have the stuff you need out of the box.
> > Another option is to encode the whole class using Avro, and writing a
> > filter on top of that.
> > You basically use one column and store it there.
> > Yes, you pay the penalty of loading your entire class and extracting the
> > fields you need to compare against, but I'm really not sure the other way
> > is faster, taking into account the hint mechanism in Filter which is
> > pinpointed thus grabs more bytes than it needs to.
> >
> > Back to what was said earlier: 1M rows, why not MySQL?
> >
> > On Friday, June 28, 2013, Otis Gospodnetic wrote:
> >
> >> Hi,
> >>
> >> I see.  Btw. isn't HBase for < 1M rows an overkill?
> >> Note that Lucene is schemaless and both Solr and Elasticsearch can
> >> detect field types, so in a way they are schemaless, too.
> >>
> >> Otis
> >> --
> >> Performance Monitoring -- http://sematext.com/spm
> >>
> >>
> >>
> >> On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren <stoffe@gmail.com>
> >> wrote:
> >>> @Otis
> >>>
> >>> HBase is a natural fit for my use case because it's schemaless. I'm
> >>> building a configuration management system and there is no need for
> >>> advanced filtering/querying capabilities, just basic predicate logic
> >>> and pagination that scales to < 1 million rows with reasonable
> >>> performance.
> >>>
> >>> Thanks for the tip!
> >>>
> >>>
> >>> On Fri, Jun 28, 2013 at 8:34 PM, Otis Gospodnetic <
> >>> otis.gospodnetic@gmail.com> wrote:
> >>>
> >>>> Kristoffer,
> >>>>
> >>>> You could also consider using something other than HBase, something
> >>>> that supports "secondary indices", like anything that is Lucene based
> >>>> - Solr and ElasticSearch for example.  We recently compared how we
> >>>> aggregate data in HBase (see my signature) and how we would do it if
> >>>> we were to use Solr (or ElasticSearch), and so far things look better
> >>>> in Solr for our use case.  And our use case involves a lot of
> >>>> filtering, slicing and dicing..... something to consider...
> >>>>
> >>>> Otis
> >>>> --
> >>>> Solr & ElasticSearch Support -- http://sematext.com/
> >>>> Performance Monitoring -- http://sematext.com/spm
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jun 28, 2013 at 5:24 AM, Kristoffer Sjögren <stoffe@gmail.com>
> >>>> wrote:
> >>>>> Interesting. I'm actually building something similar.
> >>>>>
> >>>>> A full-blown SQL implementation is a bit overkill for my particular
> >>>>> use case, and the query API is the final piece of the puzzle. But I'll
> >>>>> definitely have a look for some inspiration.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Fri, Jun 28, 2013 at 3:55 AM, James Taylor <jtaylor@salesforce.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Kristoffer,
> >>>>>> Have you had a look at Phoenix
> >>>>>> (https://github.com/forcedotcom/phoenix)?
> >>>>>> You could model your schema much like an O/R mapper and issue SQL
> >>>>>> queries through Phoenix for your filtering.
> >>>>>>
> >>>>>> James
> >>>>>> @JamesPlusPlus
> >>>>>> http://phoenix-hbase.blogspot.com
> >>>>>>
> >>>>>> On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <stoffe@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for your help Mike. Much appreciated.
> >>>>>>>
> >>>>>>> I don't store rows/columns in JSON format. The schema is exactly that
> >>>>>>> of a specific Java class, where the rowkey is a unique object
> >>>>>>> identifier with the class type encoded into it. Columns are the field
> >>>>>>> names of the class and the values are those of the object instance.
> >>>>>>>
> >>>>>>> Did think about coprocessors, but the schema is discovered at runtime
> >>>>>>> and I can't hard-code it.
> >>>>>>>
> >>>>>>> However, I still believe that filters might work. Had a look at
> >>>>>>> SingleColumnValueFilter, and this filter is able to target specific
> >>>>>>> column qualifiers with specific WritableByteArrayComparables.
> >>>>>>>
> >>>>>>> But list comparators are still missing... So I guess the only way is
> >>>>>>> to write these comparators?
> >>>>>>>
> >>>>>>> Do you follow my reasoning? Will it work?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel
> >>>>>>> <
>
