accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-1417) data storage efficiency
Date Fri, 18 Jul 2014 04:00:08 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-1417.
-----------------------------------

    Resolution: Fixed

Code to ingest the Google Books ngrams was added.  I posted some numbers on the efficiency
of the ingest and storage [here|http://tinyurl.com/nrvj7xv].

Other key-value stores can compare their numbers, if they like.  Beating compressed CSVs
was an unexpected result.


> data storage efficiency
> -----------------------
>
>                 Key: ACCUMULO-1417
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1417
>             Project: Accumulo
>          Issue Type: Task
>            Reporter: Eric Newton
>
> David Medinets wrote to the user list:
> {quote}
> Are there any published numbers for the amount of disk space used by
> Accumulo versus other products? I'm thinking some dataset like dbpedia
> or something from http://books.google.com/ngrams/datasets. If there is
> not such a comparison, what comparisons would you like to see? What
> about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> WordNet is just a large set of CSV files so it would be a good
> candidate for this concept, I think.
> {quote}
> Good idea.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
