accumulo-dev mailing list archives

From Edmon Begoli <ebeg...@gmail.com>
Subject Re: Python client lib for Accumulo?
Date Fri, 27 Jul 2012 03:15:28 GMT
Hi folks,

I have just joined the list to volunteer ideas, design and development
(and whatever else the lifecycle needs) for the Python client for
Accumulo.

I have previously developed several RESTful clients and libraries using
web.py, and I am about to write another in Tornado
(http://www.tornadoweb.org/).

I think that we could have a very nice, scalable and fast RESTful API
for Accumulo through Tornado.
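To make the REST idea a bit more concrete, here is a minimal sketch of
what one read endpoint could look like. Tornado is not in the Python
standard library, so this swaps in a plain WSGI callable; the
/tables/<table>/rows/<row> URL layout and the in-memory TABLES dict
(standing in for a real Accumulo connection) are hypothetical, not an
existing Accumulo API.

```python
import json

# Hypothetical in-memory stand-in for Accumulo tables: row -> {"cf:cq": value}.
# A real proxy would hold a connection to an Accumulo instance instead.
TABLES = {
    "users": {
        "row1": {"info:name": "alice", "info:email": "alice@example.org"},
    },
}


def rest_app(environ, start_response):
    """Minimal WSGI app serving GET /tables/<table>/rows/<row> as JSON.

    The URL layout is an illustrative assumption, not an Accumulo API.
    """
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if (environ.get("REQUEST_METHOD") == "GET" and len(parts) == 4
            and parts[0] == "tables" and parts[2] == "rows"):
        table, row = parts[1], parts[3]
        cells = TABLES.get(table, {}).get(row)
        if cells is not None:
            body = json.dumps({"row": row, "cells": cells}).encode("utf-8")
            start_response("200 OK", [("Content-Type", "application/json"),
                                      ("Content-Length", str(len(body)))])
            return [body]
    # Unknown route, table or row.
    body = b'{"error": "not found"}'
    start_response("404 Not Found", [("Content-Type", "application/json"),
                                     ("Content-Length", str(len(body)))])
    return [body]
```

For a quick local try, wsgiref.simple_server.make_server("", 8000,
rest_app).serve_forever() serves it; a Tornado version would move the
same routing into a RequestHandler subclass.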

I would also like to develop a pure Python library for Accumulo, similar
to HappyBase for HBase (https://github.com/wbolster/happybase).

I work at Oak Ridge National Lab as a software engineer and tech lead
on "big data" projects. I can devote time, possibly bring in more team
members, and would be happy to collaborate. Collaborations are welcome.

I could certainly start a small wiki outlining the ideas and open them
for discussion.

Regards and please advise,
Edmon


On Wed, May 2, 2012 at 11:31 AM, Jason Trost <jason.trost@gmail.com> wrote:
> I noticed that there are no JIRAs for a Python client
> interface/lib/API for Accumulo.  How involved would it be to develop
> AND maintain a Python client for Accumulo?
>
> I realize that Jython can be used, but I am interested in a native
> Python lib that can be used more broadly with systems that don't work
> with Jython.
>
> In order to do this, it seems like we would need to:
> 1. generate the Python Thrift bindings code (this is trivial)
> 2. develop and maintain the Python glue code that uses the Thrift code
> and Python ZooKeeper code to interact with the various Accumulo
> components.  The current Java "glue" code looks quite long.  How often
> does this code change (in terms of new features or protocol changes,
> not bug fixes)?
>

I would advise against rewriting the Accumulo client code in Python.
The code that finds tablets, retries in case of failure, parallelizes
reads/writes, etc. is fairly complex.  I think the proxy option is best.
David and Eric mentioned REST and Thrift proxies.
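To give a feel for just two of those pieces, here is a simplified sketch
of tablet lookup and retry. It assumes the client already holds a sorted
list of tablet end rows; a real client reads and caches these from the
metadata table and must refresh them when tablets split or move, which
this omits. It retries only on ConnectionError with exponential backoff.
The function names are hypothetical.

```python
import bisect
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def locate_tablet(end_rows: List[bytes], row: bytes) -> int:
    """Return the index of the tablet that should hold `row`.

    `end_rows` holds the sorted, inclusive end rows of every tablet except
    the last one, whose end row is unbounded. Tablet i covers
    (end_rows[i-1], end_rows[i]], so bisect_left finds the first tablet
    whose end row is >= row.
    """
    return bisect.bisect_left(end_rows, row)


def with_retries(op: Callable[[], T], attempts: int = 3,
                 delay: float = 0.1) -> T:
    """Call `op`, retrying on ConnectionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except ConnectionError:
            time.sleep(delay * (2 ** attempt))
    return op()  # final attempt: any error propagates to the caller
```

Even this toy version hints at the real difficulty: after a failure the
client usually has to re-locate the tablet before retrying, and a batch
of writes has to be grouped by tablet server and sent in parallel.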

If we were to go down the route of writing the client code in
another language, I think C++ with a C API would be the best option,
because many languages can easily bind to a C API.
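As an illustration of how little it takes to bind to a C API, here is
Python's standard ctypes module calling a C function directly. No
Accumulo C API exists to call here, so libc's strlen stands in for a
hypothetical client function. CDLL(None), which exposes symbols already
loaded into the process, works on Linux and typically other POSIX
systems, but not on Windows.

```python
import ctypes

# Load symbols already present in the process (including the C library).
# Assumes a POSIX system such as Linux; this does not work on Windows.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
c_strlen = libc.strlen
c_strlen.argtypes = [ctypes.c_char_p]
c_strlen.restype = ctypes.c_size_t

print(c_strlen(b"accumulo"))  # 8
```

A binding to a real C client library would look much the same: load the
shared library by name, declare argtypes/restype for each exported
function, and wrap the raw calls in a more Pythonic layer.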

> Ideally the Python API would be very similar to the Java interface
> (Connector, Instance, Scanner, BatchScanner, BatchWriter, Key, Value,
> Mutation, etc).
>
> I guess what I am trying to get at is: does the Accumulo dev community
> think it's worth the time and effort to develop and maintain a Python
> API?  I personally think it is, because it would help with adoption and
> integration with other systems (Django is the primary system I want to
> be able to use with it).  I have some time to help this along, but I
> don't think I have enough time to take this on alone.  Is anyone else
> interested in working together on this?
>
> Thanks,
>
> --Jason
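For reference, here is one way part of the Java data model named above
(Key, Value, Mutation) might look in Python. This is a hypothetical
sketch, not an existing library: Value is represented as plain bytes,
Key omits Accumulo's delete flag, and the argument order of put() is an
arbitrary choice. The ordering in sort_key() follows Accumulo's key
order: row, column family, column qualifier and visibility ascending,
then timestamp descending so the newest version of a cell comes first.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Key:
    """Hypothetical Python mirror of Accumulo's Key (delete flag omitted)."""
    row: bytes
    column_family: bytes = b""
    column_qualifier: bytes = b""
    column_visibility: bytes = b""
    timestamp: int = 0

    def sort_key(self) -> Tuple[bytes, bytes, bytes, bytes, int]:
        # Ascending on row/family/qualifier/visibility, then newest first.
        return (self.row, self.column_family, self.column_qualifier,
                self.column_visibility, -self.timestamp)


@dataclass
class Mutation:
    """Hypothetical mirror of Accumulo's Mutation: updates to a single row.

    Values are plain bytes here instead of a separate Value class.
    """
    row: bytes
    updates: List[Tuple[Key, bytes]] = field(default_factory=list)

    def put(self, family: bytes, qualifier: bytes, value: bytes,
            visibility: bytes = b"", timestamp: int = 0) -> None:
        key = Key(self.row, family, qualifier, visibility, timestamp)
        self.updates.append((key, value))

    def sorted_updates(self) -> List[Tuple[Key, bytes]]:
        # The (key, value) pairs in the order a scan would return them.
        return sorted(self.updates, key=lambda kv: kv[0].sort_key())
```

Scanner, BatchScanner and BatchWriter would then be thin layers that
send Mutations to, and read sorted (Key, value) pairs back from, whatever
proxy or client library sits underneath.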
