metamodel-dev mailing list archives

From Kasper Sørensen <i.am.kasper.soren...@gmail.com>
Subject Re: [DISCUSS] State of the work-in-progress HBase branch
Date Tue, 28 Jan 2014 20:58:42 GMT
Regarding point no. 4 ... I was just investigating and tried making a
"thinner" HBase client simply by adding Maven <exclusion>s to the
hbase-client dependency. I eventually came up with this quite long list of
exclusions that, at least, do not affect our (tested) usage of HBase:

<exclusion>
<artifactId>log4j</artifactId>
<groupId>log4j</groupId>
</exclusion>
<exclusion>
<artifactId>commons-logging</artifactId>
<groupId>commons-logging</groupId>
</exclusion>
<exclusion>
<artifactId>netty</artifactId>
<groupId>io.netty</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-json</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-server</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jersey-core</artifactId>
<groupId>com.sun.jersey</groupId>
</exclusion>
<exclusion>
<artifactId>jackson-mapper-asl</artifactId>
<groupId>org.codehaus.jackson</groupId>
</exclusion>
<exclusion>
<artifactId>jsp-2.1</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jsp-api-2.1</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jasper-compiler</artifactId>
<groupId>tomcat</groupId>
</exclusion>
<exclusion>
<artifactId>jasper-runtime</artifactId>
<groupId>tomcat</groupId>
</exclusion>
<exclusion>
<artifactId>jetty-util</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>jetty</artifactId>
<groupId>org.mortbay.jetty</groupId>
</exclusion>
<exclusion>
<artifactId>commons-httpclient</artifactId>
<groupId>commons-httpclient</groupId>
</exclusion>
<exclusion>
<artifactId>findbugs-annotations</artifactId>
<groupId>com.github.stephenc.findbugs</groupId>
</exclusion>
<exclusion>
<artifactId>commons-cli</artifactId>
<groupId>commons-cli</groupId>
</exclusion>
<exclusion>
<artifactId>commons-el</artifactId>
<groupId>commons-el</groupId>
</exclusion>
<exclusion>
<artifactId>commons-net</artifactId>
<groupId>commons-net</groupId>
</exclusion>
<exclusion>
<artifactId>xmlenc</artifactId>
<groupId>xmlenc</groupId>
</exclusion>
<exclusion>
<artifactId>commons-math</artifactId>
<groupId>org.apache.commons</groupId>
</exclusion>
<exclusion>
<artifactId>jsr305</artifactId>
<groupId>com.google.code.findbugs</groupId>
</exclusion>

Quite a long list ... I'm not super happy to commit this, but using the
native HBase client seems to be the best option, and with these
exclusions it is at least trimmed down to just the dependencies that we
actually need.
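
For context, a minimal sketch of where such exclusions would sit in the
module's pom.xml. The ${hbase.version} property is a hypothetical
placeholder (the version is not stated in this thread), and only the first
exclusion is spelled out; the rest would follow the same pattern:

```xml
<!-- Sketch only: fragment of a pom.xml <dependencies> section -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <!-- hypothetical property; the thread does not state the version -->
  <version>${hbase.version}</version>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
    <!-- ... remaining <exclusion> entries from the list above ... -->
  </exclusions>
</dependency>
```

Running "mvn dependency:tree" on the module before and after adding the
exclusions is one way to check which transitive artifacts are still pulled
in.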


2014-01-27 Henry Saputra <henry.saputra@gmail.com>

> Kasper, sorry typo =)
>
> On Mon, Jan 27, 2014 at 1:07 PM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
> > Sorry Kapser, a bit busy and hectic with my schedule so I have to punt my
> > response until later. Apologies for the delay.
> >
> > - Henry
> >
> > On Mon, Jan 27, 2014 at 12:18 PM, Kasper Sørensen
> > <i.am.kasper.sorensen@gmail.com> wrote:
> >> OK to kick things off, let me provide my own input for this discussion.
> >> Please find below my thoughts on the issues and what we need to do. Your
> >> feedback is very very welcome.
> >>
> >>
> >> 2014-01-24 Kasper Sørensen <i.am.kasper.sorensen@gmail.com>
> >>
> >>> Hi everyone,
> >>>
> >>> I was looking at our "hbase-module" branch and as much as I like this
> >>> idea, I think we've been a bit too idle with the branch. Maybe we should
> >>> try to make something final e.g. for a version 4.1.
> >>>
> >>> So I thought to give an overview/status of the module's current
> >>> capabilities and its shortcomings. We should figure out if we think this
> >>> is good enough for a first version, or if we want to do some improvements
> >>> to the module before adding it to our portfolio of MetaModel modules.
> >>>
> >>> 1) The module only offers read-only/query access to HBase. That is in my
> >>> opinion OK for now, we have several such modules, and this is something
> >>> we can better add later if we straighten out the remaining topics in this
> >>> mail.
> >>>
> >>
> >> No problem
> >>
> >>
> >>> 2) With regards to metadata mapping: HBase is different because it has
> >>> both column families and in column families there are columns. For the
> >>> sake of our view on HBase I would describe column families simply as "a
> >>> logical group of columns". Column families are fixed within a table, but
> >>> rows in a table may contain arbitrary numbers of columns within each
> >>> column family. So... You can instantiate the HBaseDataContext in two ways:
> >>>
> >>> 2a) You can let MetaModel discover the metadata. This unfortunately has a
> >>> severe limitation. We discover the table names and column families using
> >>> the HBase API. But the actual columns and their contents cannot be
> >>> provided by the API. So instead we simply expose the column families with
> >>> a MAP data type. The trouble with this is that the keys and values of the
> >>> maps will simply be byte-arrays ... Usually not very useful! But it's sort
> >>> of the only thing (as far as I can see) that's "safe" in HBase, since
> >>> HBase allows anything (byte arrays) in its columns.
> >>>
> >>
> >> I think we could maybe add a flag here to allow MetaModel to assume that
> >> column keys are of String type. That would at least make the discovered
> >> metadata more meaningful since we can expose columns and not just column
> >> families. It's still going to be tough to figure out the value types, but
> >> we could e.g. make the Column implementations mutable and allow setting
> >> ColumnType on a "live" HBaseColumn.
> >>
> >>
> >>> 2b) Like in e.g. MongoDb or CouchDb modules you can provide an array of
> >>> tables (SimpleTableDef). That way the user defines the metadata himself
> >>> and the implementation assumes that it is correct (or else it will break).
> >>> The good thing about this is that the user can define the proper data
> >>> types etc. for columns. The user specifies the column family and column
> >>> name by defining the MetaModel column name like this: "family:name"
> >>> (consistent with most HBase tools and API calls).
> >>>
> >>
> >> This is good, but requires more of the user.
> >>
> >>
> >>> 3) With regards to querying: We've implemented basic query capabilities
> >>> using the MetaModel query postprocessor. But not all queries are very
> >>> efficient... In addition to of course full table scans, we have optimized
> >>> support of COUNT queries and of table scans with maxRows.
> >>>
> >>> We could rather easily add optimized support for a couple of other
> >>> typical queries:
> >>>  * lookup record by ID
> >>>  * paged table scans (both firstRow and maxRows)
> >>>  * queries with simple filters/where items
> >>>
> >>
> >> I think "lookup record by ID" is a MUST, since this is a whole other class
> >> of queries in HBase (Get instead of Scan).
> >>
> >> Other optimizations would be nice too, but for the usage I have I could
> >> live without it in the first release.
> >>
> >>
> >>> 4) With regards to dependencies: The module right now depends on the
> >>> artifact called "hbase-client". This dependency has a lot of transitive
> >>> dependencies so the size of the module is quite extreme. As an example, it
> >>> includes stuff like jetty, jersey, jackson and of course hadoop... But I
> >>> am wondering if we can have a thinner client side than that! If anyone
> >>> knows if e.g. we can use the REST interface easily or so, that would maybe
> >>> be better. I'm not an expert on HBase though, so please enlighten me!
> >>>
> >>
> >> This is a big problem IMO. Anyone with HBase client experience? Would be a
> >> lot better with a thin client somehow.
> >>
> >>
> >>> Kind regards,
> >>> Kasper
> >>>
> >>>
> >>>
>
