accumulo-user mailing list archives

From Tom D
Subject Document Partitioned Indexing
Date Wed, 30 Sep 2015 07:48:44 GMT

Have been doing a little reading about different distributed (text)
indexing techniques and picked up on the Document Partitioned Index
approach on Accumulo.
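
For anyone unfamiliar with it, my understanding is that a document-partitioned
index assigns each document to a partition, stores that partition's term
entries under a single row (row = partition, column family = term, column
qualifier = document ID), and answers a query by intersecting terms inside
every partition. Below is a rough pure-Python sketch of that layout. The
partition count, the shard_N row names and all the helper functions are made
up for illustration, and none of this is Accumulo API code.

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 4  # assumption: a small, fixed number of partitions (shards)


def partition_for(doc_id):
    """Assign a document to a partition by hashing its ID (hypothetical scheme)."""
    return f"shard_{zlib.crc32(doc_id.encode()) % NUM_PARTITIONS}"


def index_document(table, doc_id, text):
    """Store index entries as table[row=partition][family=term] -> {doc IDs}."""
    row = partition_for(doc_id)
    for term in set(text.lower().split()):
        table[row][term].add(doc_id)


def search(table, terms):
    """AND query: intersect posting lists within each partition, then union."""
    hits = set()
    for postings in table.values():
        per_term = [postings.get(t.lower(), set()) for t in terms]
        if per_term:
            hits |= set.intersection(*per_term)
    return hits


def build_demo_table():
    """Build a tiny in-memory 'table' holding three example documents."""
    table = defaultdict(lambda: defaultdict(set))
    index_document(table, "doc1", "accumulo scales to many nodes")
    index_document(table, "doc2", "solr and elastic are search services")
    index_document(table, "doc3", "accumulo iterators run on many nodes")
    return table
```

Because every term of a document lands in the same partition, each partition
can evaluate the intersection locally, and a query only has to union the
per-partition results.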

I am interested in the use-cases people would have for indexing data in
this way over using a distributed search service (Elastic or SolrCloud).

I can think of a few reasons, but wondered if there's something more
obvious that I'm missing?

- cell (field level) access controls

- scale - I understand Accumulo will scale to thousands of nodes. I believe
there are some limitations in Elastic / Solr at about 100 nodes.

- integration with an existing schema or index in Accumulo (not sure about
this one and what benefits it would have over calling out to a search
service)

- you want to take advantage of other features in Accumulo, e.g. Combining
iterators to perform some aggregation alongside your document partitioned
index (again, can't imagine use cases here, but maybe there are some)

- more control over 'messy data', e.g. partial duplicates that need merging
at ingest

Are there others? It would be interesting to hear if people use this indexing
approach.

Many thanks.
