accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Cosmos - Accumulo-backed sorting, filtering and grouping of columnar data sets
Date Tue, 03 Sep 2013 02:30:34 GMT
Since this is the community that's likely to be interested, I wanted to 
spread some word about a project I've been working on in my spare time: 
Cosmos.

https://github.com/joshelser/cosmos

The point of Cosmos is to provide an efficient, easy-to-use interface 
around Accumulo for the general purpose of counting and filtering a 
data set. At a glance, it accepts Multimaps of data and provides 
mechanisms to fetch records by column, fetch records by column with value 
filtering, and count unique values across records in a column (groupBy). 
It also contains a very simple internal timing/tracing API (much less 
granular than Accumulo's tracing library), and a (very) rough web 
interface for viewing said traces. Additionally, Cosmos contains a 
simple example of its API using a public dataset of ~350K records 
provided by the city of Chicago (https://data.cityofchicago.org/).
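
To make those three operations concrete, here is a minimal in-memory sketch in Python. It is not Cosmos's actual API and does not touch Accumulo: the `RecordStore` class, its method names, and the sample records are all hypothetical and exist only to illustrate fetch-by-column, fetch-by-column-with-value, and groupBy over multimap-shaped records.

```python
from collections import Counter


class RecordStore:
    """Hypothetical in-memory stand-in; none of these names come from Cosmos."""

    def __init__(self):
        # record id -> multimap (column name -> list of values)
        self.records = {}

    def add(self, record_id, multimap):
        # Copy the value lists so later changes by the caller don't leak in.
        self.records[record_id] = {col: list(vals) for col, vals in multimap.items()}

    def fetch(self, column, value=None):
        """Ids of records that have `column`; if `value` is given, only those holding it."""
        return sorted(
            rid
            for rid, cols in self.records.items()
            if column in cols and (value is None or value in cols[column])
        )

    def group_by(self, column):
        """Count how many records carry each distinct value in `column`."""
        counts = Counter()
        for cols in self.records.values():
            # set() so a value repeated within one record is counted once.
            counts.update(set(cols.get(column, ())))
        return dict(counts)


# Made-up sample records, purely for illustration.
store = RecordStore()
store.add("r1", {"color": ["red"], "size": ["L"]})
store.add("r2", {"color": ["red", "blue"]})
store.add("r3", {"size": ["L"]})

by_column = store.fetch("color")             # records that have a "color" column
by_value = store.fetch("color", "blue")      # ... whose "color" includes "blue"
color_counts = store.group_by("color")       # distinct values and record counts
```

In this sketch everything is scanned in memory on each call; the real project presumably delegates that work to Accumulo, which is the point of the abstraction described above.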

Cosmos' design lends itself well to multiple users accessing the same 
Accumulo instance, deferring to Accumulo or ZooKeeper to do 
synchronization/persistence when necessary. It aims to abstract away 
some of the difficulty of using Accumulo, making the application 
developer's life a bit easier.

And, as you'd expect, it's Apache licensed and compatible with Apache 
Accumulo 1.4.4 and 1.5.0.

I'd love to hear what people think. Any feedback is welcome.

- Josh
