incubator-blur-dev mailing list archives

From "Tim Williams (JIRA)" <>
Subject [jira] [Commented] (BLUR-445) Remove online mutates from the Blur thrift api
Date Wed, 11 Nov 2015 14:41:10 GMT


Tim Williams commented on BLUR-445:

Can you clarify "move all index mutations to the bulk indexing approach"? There are at least
four different bulk-ish approaches (m/r, batch, hive, enqueue), so what does *the* bulk indexing
approach mean in this context? I reckon you'll want to share more about the new daemon
too. Is that similar to what supports hive-style indexing?

> Remove online mutates from the Blur thrift api
> ----------------------------------------------
>                 Key: BLUR-445
>                 URL:
>             Project: Apache Blur
>          Issue Type: Improvement
>          Components: Blur
>    Affects Versions: 0.3.0
>            Reporter: Aaron McCurry
>             Fix For: 0.3.0
> The primary use case for Blur is massive ingestion of information to be indexed and
> searched.  Currently I believe the system has been made overly complex due to the atomic operations
> in the online index mutation system.  It forces the shard servers to have writers open to
> each of the indexes in the given table, which requires a lot of memory, CPU, and file resources
> per shard.
> Currently the system only allows for mutates to be atomic when mutating a single row.
> Batch mutates are not atomic.
> I propose that we move all index mutations to the bulk indexing approach and utilize
> HDFS snapshots for committing index information within a given table.  This will allow the
> controller and shard servers to become read-only with respect to the indexes.
> Assuming we move forward with this approach, a new daemon will need to be created: an index
> manager.  This daemon will coordinate indexing (MR, Spark, Tez, Flink, etc.) and merging globally
> for the cluster.

This message was sent by Atlassian JIRA