incubator-hama-dev mailing list archives

From "Keith Turner (Commented) (JIRA)"
Subject [jira] [Commented] (HAMA-544) Create InputFormats/OutputFormats for HBase
Date Mon, 16 Apr 2012 20:30:24 GMT


Keith Turner commented on HAMA-544:

I am an Accumulo developer and I was looking at ACCUMULO-532. I was thinking about the problem
of where this code should be put. It seems like there are three options.

 # Put Hama input/output formats in Hama
 # Put Hama input/output formats in Cassandra, Accumulo, HBase, etc.
 # Put Hama input/output formats on something like GitHub

I am not sure what the best option is; we are trying to figure that out. Below are some thoughts
about this issue.

Option 3 may not be a blessed Apache way of doing business.

One nice thing about option 3 is that if accumulo-hama has a serious bug, it can release a fix
immediately without waiting. With options 1 and 2, either Hama or Accumulo must cut a release to
fix a serious Hama-Accumulo bug. Option 3 may also make it easier to use a newer version of
Hama with an older version of Accumulo. If Accumulo ships with Hama support, an older version
of Accumulo probably depends on an older version of Hama. This may not be an issue
if the Hama API is really stable.

With option 3, Accumulo could include a Hama jar in contrib with a link to GitHub.

From the perspective of Accumulo, we have to work this same issue out for lots of projects.
For example, Accumulo could ship with connectors for Pig, Hive, Gora, Hama, etc. This increases
the number of dependencies Accumulo has, many of which users may not need. Currently the Accumulo
Pig adapter is on GitHub.

Apache Gora seems to be doing option 1. They have gora-core, gora-accumulo, gora-hbase, etc.
Each one of these is a Maven subproject of Gora. One nice thing about this in the Gora
case is that all of the Gora stores can share test code. For example, gora-accumulo extends
a test class from gora-core for testing.

I suppose option 1 is bad because Hama is a subsystem like MapReduce. For example, Gora and
Pig depend on MapReduce, and it's probably OK to make them depend on HBase or Accumulo. However,
you would not want MapReduce or Hama to depend on a certain version of Accumulo or HBase.
If accumulo-1.3 jars were in the MapReduce system lib dir, I do not think a user's accumulo-1.4
jars could override them. I suspect the same is true for Hama, which is why option 1 is bad?

I wrote the gora-accumulo backend and then I wrote goraci.
To make goraci easy to run I wrote a slightly complex script and pom. Maybe I was being
a bit OCD, but when I ran goraci against Accumulo I only wanted the jars that were needed
on the classpath. For example, I did not want HBase jars when running goraci against Accumulo,
and vice versa. So what steps would the user have to go through to have Hama read from Accumulo
with option 2 vs. option 3? It seems like the main difference is adding one extra jar to the
classpath? Is there any other burden? It's nice to make things as easy as possible for the user.
> Create InputFormats/OutputFormats for HBase
> -------------------------------------------
>                 Key: HAMA-544
>                 URL:
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.5.0
>            Reporter: praveen sripati
>            Assignee: praveen sripati
>            Priority: Minor
>             Fix For: 0.6.0
> Create InputFormats/OutputFormats for HBase

This message is automatically generated by JIRA.

