accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Document Partitioned Indexing
Date Wed, 30 Sep 2015 15:27:10 GMT
Hi Tom,

Sqrrl uses a document-distributed indexing strategy extensively. On top of
the reasons you mentioned, we also like the ability to explicitly structure
our index entries in both information content and sort order. This gives us
the ability to do interesting things like build custom indexes and do joins
between graph indexes and term indexes.
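
To illustrate what "structuring index entries in both information content and
sort order" can look like, here is a minimal sketch of a document-partitioned
index over a sorted key-value model. It is an assumption-laden illustration,
not Sqrrl's implementation and not the Accumulo API: the key layout
(row = shard, column family = term, column qualifier = document ID), the
shard count, and all function names are hypothetical, and a sorted Python
list stands in for a table.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(doc_id):
    """Assign a document to a shard with a stable hash of its ID."""
    return "shard_%02d" % (zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS)


def index_entries(doc_id, text):
    """Build (row, family, qualifier) keys: (shard, term, doc_id).

    Every term of a document lands in that document's shard, which is what
    makes the index document-partitioned rather than term-partitioned.
    """
    shard = shard_for(doc_id)
    return sorted({(shard, term, doc_id) for term in text.lower().split()})


def build_table(docs):
    """Simulate a sorted table as one sorted list of keys."""
    table = []
    for doc_id, text in docs.items():
        table.extend(index_entries(doc_id, text))
    table.sort()
    return table


def scan(table, shard, term):
    """Range-scan one (shard, term) prefix, relying on the sort order."""
    start = bisect.bisect_left(table, (shard, term, ""))
    hits = []
    for row, family, qualifier in table[start:]:
        if (row, family) != (shard, term):
            break
        hits.append(qualifier)
    return hits


def query_and(table, terms):
    """Intersect term postings shard by shard, then merge the results.

    Each shard can answer independently because all entries for a given
    document live in the same shard.
    """
    results = []
    for shard in sorted({row for row, _, _ in table}):
        per_term = [set(scan(table, shard, term)) for term in terms]
        if per_term:
            results.extend(set.intersection(*per_term))
    return sorted(results)


docs = {
    "doc1": "accumulo scales well",
    "doc2": "accumulo index design",
    "doc3": "solr index",
}
table = build_table(docs)
matches = query_and(table, ["accumulo", "index"])
```

Because the sort order groups all postings for a term within a shard, the
per-shard intersection needs only short range scans, and the shards can be
queried in parallel.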

Eventually, I'd like to see Accumulo build out explicit support for this
type of indexing in the core as an embedded secondary indexing capability.
That would solve several of the challenges around compatibility with other
Accumulo features and usage patterns.

Cheers,
Adam


On Wed, Sep 30, 2015 at 3:48 AM, Tom D <tomdata8@gmail.com> wrote:

> Hi,
>
> I have been doing a little reading about different distributed (text)
> indexing techniques and picked up on the Document Partitioned Index
> approach on Accumulo.
>
> I am interested in the use-cases people would have for indexing data in
> this way over using a distributed search service (Elastic or SolrCloud).
>
> I can think of a few reasons, but wondered if there's something more
> obvious that I'm missing?
>
> - cell (field level) access controls
>
> - scale - I understand Accumulo will scale to thousands of nodes. I
> believe there are some limitations in Elastic / Solr at about 100 nodes.
>
> - integration with an existing schema or index in Accumulo (not sure about
> this one and what benefits it would have over calling out to a search
> service)
>
> - you want to take advantage of other features in Accumulo, e.g. Combining
> iterators to perform some aggregation alongside your document partitioned
> index (again, can't imagine use cases here, but maybe there are some)
>
> - more control over 'messy data', e.g. partial duplicates that need merging
> at ingest
>
> Are there others? It would be interesting to hear whether people use this
> indexing strategy.
>
> Many thanks.
>
>
>
