accumulo-user mailing list archives

From Clint Green <>
Subject Re: Accumulo Data Storage Efficiency
Date Thu, 12 Jul 2012 15:08:34 GMT
Could you use Culvert to control the indexing across platforms?

On Thu, Jul 12, 2012 at 8:57 AM, William Slacum <> wrote:

> It'd be nice to see some numbers, but I also think it's important to
> account for use cases. Doing secondary indexing on records/files,
> metadata extraction and document storage will increase the raw storage
> required by some factor. Then, it's all compressed in various ways
> (i.e., at the RFile level and at the HDFS block level)!
> Could we try to define some rudimentary structure that we'd put the
> data in? Like just create a term index on it, since I know HBase and
> Cassandra should be able to handle that.
> On Thu, Jul 12, 2012 at 6:42 AM, David Medinets
> <> wrote:
> > Are there any published numbers for the amount of disk space used by
> > Accumulo versus other products? I'm thinking some dataset like dbpedia
> > or something from If there is
> > not such a comparison, what comparisons would you like to see? What
> > about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> > WordNet is just a large set of CSV files so it would be a good
> > candidate for this concept, I think.
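To make the "rudimentary structure" idea above concrete, here is a minimal sketch of a term index over a few CSV rows, plus a before/after size report. Everything in it is an assumption for illustration: the sample rows are made up (not real WordNet data), the (row, column family, qualifier, value) tuple layout is a hypothetical term-index design rather than Accumulo's API, and zlib is only a stand-in for whatever compression the RFile and HDFS layers actually apply.

```python
import csv
import io
import zlib

# Hypothetical sample rows standing in for a WordNet-style CSV (not real WordNet data).
SAMPLE_CSV = """id,word,gloss
1,bank,sloping land beside a body of water
2,bank,a financial institution that accepts deposits
3,river,a large natural stream of water
"""


def build_term_index(csv_text):
    """Return sorted (row, family, qualifier, value) tuples, one per (term, field, record).

    Hypothetical layout: row = term, column family = source field,
    column qualifier = record id, value = empty. Sorting mimics the
    lexicographic key order of a sorted key-value store.
    """
    entries = set()
    for record in csv.DictReader(io.StringIO(csv_text)):
        record_id = record["id"]
        for field in ("word", "gloss"):
            for term in record[field].lower().split():
                entries.add((term, field, record_id, ""))
    return sorted(entries)


def serialize(entries):
    """Naive serialization: tab-separated fields, one entry per line."""
    return "\n".join("\t".join(entry) for entry in entries).encode("utf-8")


def storage_report(csv_text):
    """Compare raw vs. indexed byte counts, uncompressed and zlib-compressed."""
    raw = csv_text.encode("utf-8")
    index = serialize(build_term_index(csv_text))
    return {
        "raw_bytes": len(raw),
        "raw_compressed": len(zlib.compress(raw)),
        "index_bytes": len(index),
        "index_compressed": len(zlib.compress(index)),
    }
```

Running `print(storage_report(SAMPLE_CSV))` shows the index alone already taking more raw bytes than the source CSV, which is the "increase the raw storage by some factor" effect; the compressed columns hint at how much of that the compression layers might win back. Real numbers would need the actual dataset and each product's own storage format.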
