accumulo-user mailing list archives

From William Slacum <wsla...@gmail.com>
Subject Re: Accumulo Data Storage Efficiency
Date Thu, 12 Jul 2012 12:57:04 GMT
It'd be nice to see some numbers, but I also think it's important to
account for use cases. Doing secondary indexing on records/files,
metadata extraction and document storage will increase the raw storage
required by some factor. Then it's all compressed in various ways
(i.e., at the RFile level and at the HDFS block level).

Could we try to define some rudimentary structure that we'd put the
data in? Like just create a term index on it, since I know HBase and
Cassandra should be able to handle that.
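
To make the term-index idea concrete, here is a minimal Python sketch, not tied to any particular store. It assumes the input is CSV with a header row and an ID in the first column; the function name `build_term_index`, the sample rows, and the (term, field, record ID) entry layout are illustrative assumptions, not an agreed-upon schema. The byte count is a crude sum of key lengths, ignoring per-entry overhead and compression, so it only hints at the raw blow-up factor.

```python
import csv
import io

def build_term_index(csv_text, id_column=0):
    """Return sorted (term, field, record_id) entries for CSV text.

    Each entry could map to one key in a sorted key-value store,
    e.g. row=term, column family=field, column qualifier=record_id
    (a hypothetical layout, chosen only for illustration).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    entries = set()
    for row in reader:
        record_id = row[id_column]
        for position, (field, value) in enumerate(zip(header, row)):
            if position == id_column:
                continue  # do not index the ID column itself
            for term in value.lower().split():
                entries.add((term, field, record_id))
    return sorted(entries)

# Made-up sample rows, standing in for a real dataset such as WordNet.
SAMPLE = "id,word,gloss\n1,dog,a domestic animal\n2,cat,a small domestic animal\n"

index = build_term_index(SAMPLE)
raw_bytes = len(SAMPLE.encode("utf-8"))
# Crude estimate: sum of key component lengths, before any compression.
index_bytes = sum(len(t) + len(f) + len(r) for t, f, r in index)

if __name__ == "__main__":
    for entry in index:
        print(entry)
    print("raw bytes:", raw_bytes, "index key bytes:", index_bytes)
```

Running the same function over the same input for each store would at least keep the logical data identical, so any difference in disk usage comes from the storage layer rather than the schema.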

On Thu, Jul 12, 2012 at 6:42 AM, David Medinets
<david.medinets@gmail.com> wrote:
> Are there any published numbers for the amount of disk space used by
> Accumulo versus other products? I'm thinking some dataset like dbpedia
> or something from http://books.google.com/ngrams/datasets. If there is
> not such a comparison, what comparisons would you like to see? What
> about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> WordNet is just a large set of CSV files so it would be a good
> candidate for this concept, I think.
