metamodel-dev mailing list archives

From: Kasper Sørensen <i.am.kasper.soren...@gmail.com>
Subject: Re: [DISCUSS] State of the work-in-progress HBase branch
Date: Mon, 24 Feb 2014 16:08:16 GMT
Hi Henry,

Yeah, the Phoenix project is definitely an interesting approach to making MM
capable of working with HBase. The only downside, to me, is that they seem
to do a lot of intrusive stuff to HBase, like creating new index tables
etc. I would normally not "allow" that for a simple connector.

Maybe we should simply support both styles. And in the case of Phoenix, I
guess we could simply go through the JDBC module of MetaModel and connect
via their JDBC driver... Is that maybe a route, do you know?
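
The JDBC route suggested above could look roughly like the following minimal sketch. It assumes the Phoenix client jar is on the classpath and that Phoenix JDBC URLs take the form `jdbc:phoenix:<ZooKeeper quorum>:<port>`. `PhoenixConnector` is a hypothetical helper, not part of MetaModel or Phoenix.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: opens a plain JDBC connection to HBase through Phoenix,
// so that MetaModel's existing JDBC module could sit on top of it.
final class PhoenixConnector {

    private PhoenixConnector() {}

    // Assumed URL form: jdbc:phoenix:<zookeeper quorum>:<port>,
    // where the quorum may be a comma-separated host list.
    static String jdbcUrl(String zookeeperQuorum, int zookeeperPort) {
        if (zookeeperQuorum == null || zookeeperQuorum.isEmpty()) {
            throw new IllegalArgumentException("ZooKeeper quorum must not be empty");
        }
        return "jdbc:phoenix:" + zookeeperQuorum + ":" + zookeeperPort;
    }

    // Needs the Phoenix driver on the classpath; without it DriverManager
    // throws an SQLException ("No suitable driver").
    static Connection connect(String zookeeperQuorum, int zookeeperPort) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(zookeeperQuorum, zookeeperPort));
    }
}
```

The returned `Connection` could then be handed to MetaModel's `JdbcDataContext` (which has a constructor taking a `java.sql.Connection`), so that query planning happens in Phoenix rather than in the HBase module. Whether Phoenix's SQL dialect and metadata work well with the JDBC module has not been verified here.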

- Kasper


2014-02-24 6:37 GMT+01:00 Henry Saputra <henry.saputra@gmail.com>:

> We could use the HBase client library from the store, I suppose.
> What I am actually worried about is that adding real query support
> for a column-based datastore is kind of a big task.
> Apache Phoenix tried to do that, so maybe we could leverage its SQL
> planner layer to implement query execution against the HBase
> layer?
>
> - Henry
>
>
> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
> <i.am.kasper.sorensen@gmail.com> wrote:
> > Thanks for the input Henry. With your experience, do you then also happen
> > to know of a good thin client-side library? I imagine that we could maybe
> > use a REST client instead of the full client we currently use. That would
> > save us a ton of dependency overhead, I think. Or is it a non-issue in
> > your mind, since HBase users are used to this overhead?
> >
> >
> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <henry.saputra@gmail.com>:
> >
> >> For 1 > I think read-only support for HBase should be OK, because most
> >> updates to HBase go either through the HBase client, or through REST via
> >> Stargate [1] or Thrift.
> >>
> >> For 2 > In Apache Gora we use Avro to map types to columns, and we
> >> generate Java POJOs via the Avro compiler.
> >>
> >> For 3 > This is the one I am kinda torn on. Apache Phoenix (incubating)
> >> tries to provide SQL on top of HBase [2] via extra indexing and caching.
> >> I think this defeats the purpose of having NoSQL databases, which serve
> >> a different purpose than relational databases.
> >>
> >> I am not sure MetaModel should touch NoSQL databases that are more
> >> column-oriented. These databases are designed for large data sets that
> >> are accessed primarily via key, not via a query mechanism.
> >>
> >> Just my 2 cents
> >>
> >>
> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> >> [2] http://phoenix.incubator.apache.org/
> >>
> >> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> >> <i.am.kasper.sorensen@gmail.com> wrote:
> >> > Hi everyone,
> >> >
> >> > I was looking at our "hbase-module" branch and, as much as I like this
> >> > idea, I think we've been a bit too idle with the branch. Maybe we should
> >> > try to make something final, e.g. for a version 4.1.
> >> >
> >> > So I thought I would give an overview of the status of the module's
> >> > current capabilities and its shortcomings. We should figure out whether
> >> > we think this is good enough for a first version, or whether we want to
> >> > make some improvements to the module before adding it to our portfolio
> >> > of MetaModel modules.
> >> >
> >> > 1) The module only offers read-only/query access to HBase. That is, in
> >> > my opinion, OK for now; we have several such modules, and write support
> >> > is something we can better add later, once we straighten out the
> >> > remaining topics in this mail.
> >> >
> >> > 2) With regards to metadata mapping: HBase is different because it has
> >> > column families, and within column families there are columns. For the
> >> > sake of our view on HBase, I would describe a column family simply as "a
> >> > logical group of columns". Column families are fixed within a table, but
> >> > rows in a table may contain arbitrary numbers of columns within each
> >> > column family. So... you can instantiate the HBaseDataContext in two
> >> > ways:
> >> >
> >> > 2a) You can let MetaModel discover the metadata. This unfortunately has a
> >> > severe limitation. We discover the table names and column families using
> >> > the HBase API, but the actual columns and their contents cannot be
> >> > provided by the API. So instead we simply expose each column family as a
> >> > column with a MAP data type. The trouble with this is that the keys and
> >> > values of the maps will simply be byte arrays... Usually not very useful!
> >> > But it's sort of the only thing (as far as I can see) that's "safe" in
> >> > HBase, since HBase allows anything (byte arrays) in its columns.
> >> >
> >> > 2b) Like in e.g. the MongoDB or CouchDB modules, you can provide an array
> >> > of tables (SimpleTableDef). That way the user defines the metadata
> >> > himself and the implementation assumes that it is correct (or else it
> >> > will break). The good thing about this is that the user can define the
> >> > proper data types etc. for columns. The user specifies the column family
> >> > and column name by setting the MetaModel column name to "family:name"
> >> > (consistent with most HBase tools and API calls).
> >> >
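To make the difference between 2a and 2b concrete, the following JDK-only sketch models one HBase row as "family name -> sorted map of qualifier bytes to value bytes". It is not MetaModel's or HBase's API: `HBaseRowSketch`, `familyAsMap`, `typedValue`, and `DeclaredType` are hypothetical names. Under 2a a caller only gets the raw byte-array map for a family; under 2b a user-declared `family:name` column with a declared type lets the value be decoded. It assumes longs are stored as 8-byte big-endian values, which is what HBase's `Bytes.toBytes(long)` produces. Note the comparator: Java arrays use identity `equals`/`hashCode`, so a plain `HashMap<byte[], byte[]>` would not find a key by content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one HBase row, used only to contrast options 2a and 2b.
final class HBaseRowSketch {

    // Unsigned lexicographic order, so byte[] keys are compared by content, not identity.
    private static final Comparator<byte[]> BYTES_ORDER = Arrays::compareUnsigned;

    enum DeclaredType { STRING, LONG }

    // family name -> (qualifier bytes -> value bytes)
    private final Map<String, NavigableMap<byte[], byte[]>> families = new HashMap<>();

    void put(String family, String qualifier, byte[] value) {
        families.computeIfAbsent(family, f -> new TreeMap<>(BYTES_ORDER))
                .put(qualifier.getBytes(StandardCharsets.UTF_8), value);
    }

    // Option 2a: only the family is known up front, so it surfaces as a raw MAP column.
    NavigableMap<byte[], byte[]> familyAsMap(String family) {
        NavigableMap<byte[], byte[]> map = families.get(family);
        if (map == null) {
            return new TreeMap<>(BYTES_ORDER);
        }
        return map;
    }

    // Option 2b: the user declares "family:name" plus a type, and the sketch trusts both.
    Object typedValue(String columnName, DeclaredType type) {
        int sep = columnName.indexOf(':');
        if (sep <= 0 || sep == columnName.length() - 1) {
            throw new IllegalArgumentException("Expected 'family:name' but got: " + columnName);
        }
        String family = columnName.substring(0, sep);
        byte[] qualifier = columnName.substring(sep + 1).getBytes(StandardCharsets.UTF_8);
        byte[] raw = familyAsMap(family).get(qualifier);
        if (raw == null) {
            return null; // rows are sparse: this row simply lacks the column
        }
        switch (type) {
            case STRING:
                return new String(raw, StandardCharsets.UTF_8);
            case LONG:
                // Assumes an 8-byte big-endian long, as written by HBase's Bytes.toBytes(long).
                return ByteBuffer.wrap(raw).getLong();
            default:
                throw new IllegalStateException("Unhandled type: " + type);
        }
    }
}
```

If a declared type is wrong, e.g. `LONG` for a column that holds a short string, a value shorter than 8 bytes makes `getLong()` throw `BufferUnderflowException`, while a longer one silently decodes to a meaningless number. That is the "or else it will break" trade-off of 2b in miniature.
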
> >> > 3) With regards to querying: We've implemented basic query capabilities
> >> > using the MetaModel query postprocessor, but not all queries are very
> >> > efficient... In addition to, of course, full table scans, we have
> >> > optimized support for COUNT queries and for table scans with maxRows.
> >> >
> >> > We could rather easily add optimized support for a couple of other
> >> > typical queries:
> >> >  * lookup record by ID
> >> >  * paged table scans (both firstRow and maxRows)
> >> >  * queries with simple filters/where items
> >> >
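HBase stores rows sorted by row key, which is what makes the three query types listed above cheap to support natively instead of through the postprocessor. The following sketch uses a `TreeMap` as a stand-in for an HBase table; this is an assumption for illustration only, and no HBase API is used. `SortedTableSketch` and its methods are hypothetical. An ID lookup is a point read, a paged scan stops as soon as `maxRows` rows have been collected after skipping the rows before `firstRow` (assumed 1-based here, as in MetaModel's `Query.setFirstRow`), and a WHERE item on the row key becomes a bounded range scan. In the real HBase client these would roughly correspond to a `Get`, a `Scan` that is closed early, and a `Scan` with start and stop rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for an HBase table: rows kept sorted by row key.
final class SortedTableSketch {

    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Lookup record by ID: one point read, no scan.
    String lookupById(String rowKey) {
        return rows.get(rowKey);
    }

    // Paged table scan: pass over the rows before firstRow (1-based) and stop reading
    // once maxRows rows are collected, instead of scanning everything and trimming later.
    List<String> pagedScan(int firstRow, int maxRows) {
        if (firstRow < 1 || maxRows < 0) {
            throw new IllegalArgumentException("firstRow must be >= 1 and maxRows >= 0");
        }
        List<String> page = new ArrayList<>();
        int position = 0;
        for (String rowKey : rows.keySet()) {
            position++;
            if (position < firstRow) {
                continue; // before the requested page
            }
            if (page.size() == maxRows) {
                break; // page is full, no need to read further
            }
            page.add(rowKey);
        }
        return page;
    }

    // Simple WHERE item on the row key (startKey <= key < stopKey): a bounded range
    // scan over the sorted keys rather than a full scan plus post-filtering.
    List<String> scanRange(String startKey, String stopKey) {
        return new ArrayList<>(rows.subMap(startKey, true, stopKey, false).keySet());
    }
}
```

For five rows `row1` to `row5`, `pagedScan(2, 2)` stops after visiting `row4` and returns `[row2, row3]`.
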
> >> > 4) With regards to dependencies: The module right now depends on the
> >> > artifact called "hbase-client". This artifact has a lot of transitive
> >> > dependencies, so the size of the module is quite extreme. As an example,
> >> > it includes stuff like Jetty, Jersey, Jackson and of course Hadoop... But
> >> > I am wondering if we can have a thinner client side than that! If anyone
> >> > knows whether e.g. we can use the REST interface easily, that would
> >> > maybe be better. I'm not an expert on HBase though, so please enlighten
> >> > me!
> >> >
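As one possible answer to the thin-client question, here is a JDK-only sketch (Java 11+) of reading a row through the HBase REST gateway (Stargate, see [1] in Henry's reply) instead of depending on "hbase-client". It assumes the gateway's row URL scheme `<base>/<table>/<row key>` and its JSON representation, in which row keys, column names, and cell values are Base64-encoded; both should be checked against the REST documentation for the HBase version in use. `HBaseRestSketch` and its methods are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for the HBase REST gateway using only the JDK,
// instead of pulling in the hbase-client dependency tree.
final class HBaseRestSketch {

    private HBaseRestSketch() {}

    // Assumed addressing scheme: <base>/<table>/<row key>, with path segments percent-encoded.
    static URI rowUri(String baseUrl, String table, String rowKey) {
        return URI.create(baseUrl + "/" + encode(table) + "/" + encode(rowKey));
    }

    // A GET asking for JSON; send it with java.net.http.HttpClient.
    static HttpRequest rowRequest(String baseUrl, String table, String rowKey) {
        return HttpRequest.newBuilder(rowUri(baseUrl, table, rowKey))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // In the assumed JSON representation, cell values arrive Base64-encoded.
    static String decodeCell(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    private static String encode(String s) {
        // URLEncoder does form encoding; swap '+' for '%20' so spaces are valid in a path.
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Sending the request would then be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, with a JSON parser of choice for the body. The trade-off is an extra network hop through the gateway in exchange for dropping the heavy client-side dependencies.
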
> >> > Kind regards,
> >> > Kasper
> >>
>
