incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BLUR-74) Make the disabling and enabling of tables blocking calls.
Date Sun, 21 Apr 2013 19:03:15 GMT

    [ https://issues.apache.org/jira/browse/BLUR-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637620#comment-13637620
] 

Aaron McCurry commented on BLUR-74:
-----------------------------------

Let me explain how we got here.  In earlier versions of Blur, the index locks (Lucene API
LockFactory) were controlled by ZooKeeper.  This made a lot of sense when I wrote
it.  Basically there was one ephemeral node per shard per table.  When a failure was detected
and shards were relocated, it was assumed that the ephemeral nodes held by the node that went
offline would have been removed by ZK.  The locks would thus have been released, and the
server opening the shard would be able to obtain the lock immediately and let the writer
start the opening process.  In that implementation, waiting for a table to enable or disable
was a matter of waiting for the ephemeral nodes (the locks) to be present or absent.

However, in practice it did not work that well.  The problem was that in a large cluster
with thousands of shards, ZK would not react that fast to individual ephemeral nodes.
As a result, during a failure the server trying to open the down shard would wait anywhere
from seconds to minutes to obtain the lock before it could start opening the index.  So the
ZK LockFactory was replaced with an HDFS version that allows any writer to obtain the lock,
but validates that the writer holds the lock before committing any new data to the index.
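
To make the "anyone can obtain, validate before commit" idea concrete, here is a rough
sketch in Java.  It is not the Blur or Lucene code: LockStore is an in-memory stand-in for
lock files on HDFS, ShardWriter is a hypothetical class (not Lucene's IndexWriter), and
recording the owner by overwriting a writer id is my assumption about how such a check could
work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LockValidationSketch {

    /** In-memory stand-in for lock files on HDFS: lock path -> id of the last writer to take it. */
    static final class LockStore {
        private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

        void write(String lockPath, String writerId) {
            owners.put(lockPath, writerId);
        }

        String read(String lockPath) {
            return owners.get(lockPath);
        }
    }

    /** Hypothetical shard writer used only for illustration. */
    static final class ShardWriter {
        private final LockStore store;
        private final String lockPath;
        private final String writerId;
        private final List<String> committed = new ArrayList<>();

        ShardWriter(LockStore store, String lockPath, String writerId) {
            this.store = store;
            this.lockPath = lockPath;
            this.writerId = writerId;
        }

        /** Obtaining never blocks: the newest writer simply takes the lock over. */
        void obtainLock() {
            store.write(lockPath, writerId);
        }

        /** Commits only if this writer is still the recorded lock owner. */
        boolean commit(String data) {
            if (!writerId.equals(store.read(lockPath))) {
                return false; // another writer took over; refuse to touch the index
            }
            committed.add(data);
            return true;
        }

        List<String> committed() {
            return committed;
        }
    }

    public static void main(String[] args) {
        LockStore store = new LockStore();
        String lockPath = "/table1/shard-0/write.lock"; // made-up path
        ShardWriter oldWriter = new ShardWriter(store, lockPath, "server-a");
        ShardWriter newWriter = new ShardWriter(store, lockPath, "server-b");

        oldWriter.obtainLock();
        System.out.println("first commit accepted: " + oldWriter.commit("doc-1"));

        newWriter.obtainLock(); // shard relocated to server-b without waiting
        System.out.println("stale commit accepted: " + oldWriter.commit("doc-2"));
        System.out.println("new commit accepted: " + newWriter.commit("doc-3"));
    }
}
```

Note that the check and the commit in this sketch are two separate steps, so it only shows
the shape of the idea and says nothing about how a real implementation would close that gap.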

So the problem is that currently we really don't have any idea which shards are actually open on
any given server.  We only know which shards "should" be open, and that may be the answer.
Perhaps we should add another call to the Blur service in Thrift that extends the "shardServerLayout"
method behavior.  We should leave the existing call and its behavior in place and add another
"shardServerLayout" method that takes a parameter, maybe an enum of ACTUAL and CALCULATED,
where CALCULATED is the current result and ACTUAL is what is really open.  Then we can have
the enable and disable calls key off the results of that call and block appropriately.
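
Roughly what I have in mind, sketched in plain Java rather than Thrift.  Everything here is
hypothetical: LayoutType, LayoutSource, the two wait methods and the polling approach are
placeholders for illustration, not the existing Blur interface, and FakeCluster is an
in-memory fake that pretends one more shard opens each time the ACTUAL layout is queried.

```java
import java.util.HashMap;
import java.util.Map;

class ShardLayoutSketch {

    /** Hypothetical enum from the proposal above. */
    enum LayoutType { CALCULATED, ACTUAL }

    /** Hypothetical stand-in for the service call; maps shard name to server name. */
    interface LayoutSource {
        Map<String, String> shardServerLayout(String table, LayoutType type);
    }

    /** Polls until the ACTUAL layout equals the CALCULATED one, or the timeout passes. */
    static boolean waitUntilEnabled(LayoutSource source, String table, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            Map<String, String> calculated = source.shardServerLayout(table, LayoutType.CALCULATED);
            Map<String, String> actual = source.shardServerLayout(table, LayoutType.ACTUAL);
            if (actual.equals(calculated)) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline || !sleep(pollMs)) {
                return false;
            }
        }
    }

    /** Polls until no shard of the table is reported open, or the timeout passes. */
    static boolean waitUntilDisabled(LayoutSource source, String table, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (source.shardServerLayout(table, LayoutType.ACTUAL).isEmpty()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline || !sleep(pollMs)) {
                return false;
            }
        }
    }

    /** Sleeps for the given time; returns false if the thread was interrupted. */
    private static boolean sleep(long ms) {
        try {
            Thread.sleep(ms);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** In-memory fake in which one more shard opens on every ACTUAL query. */
    static final class FakeCluster implements LayoutSource {
        private final Map<String, String> calculated = new HashMap<>();
        private final Map<String, String> open = new HashMap<>();

        FakeCluster(int shardCount) {
            for (int i = 0; i < shardCount; i++) {
                calculated.put("shard-" + i, "server-" + (i % 2));
            }
        }

        @Override
        public synchronized Map<String, String> shardServerLayout(String table, LayoutType type) {
            if (type == LayoutType.CALCULATED) {
                return new HashMap<>(calculated);
            }
            // Simulate asynchronous opening: open the first shard that is not open yet.
            for (Map.Entry<String, String> e : calculated.entrySet()) {
                if (open.putIfAbsent(e.getKey(), e.getValue()) == null) {
                    break;
                }
            }
            return new HashMap<>(open);
        }
    }

    public static void main(String[] args) {
        LayoutSource cluster = new FakeCluster(3);
        System.out.println("enabled: " + waitUntilEnabled(cluster, "table1", 2000, 10));
    }
}
```

The existing call would keep returning what is CALCULATED today, so current callers would
not change; only enable and disable would use the ACTUAL variant to decide when to return.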

Aaron
                
> Make the disabling and enabling of tables blocking calls.
> ---------------------------------------------------------
>
>                 Key: BLUR-74
>                 URL: https://issues.apache.org/jira/browse/BLUR-74
>             Project: Apache Blur
>          Issue Type: Bug
>    Affects Versions: 0.1.5
>            Reporter: Aaron McCurry
>             Fix For: 0.1.5
>
>
> Currently the calls return, and then the action is carried out asynchronously.  This
is an issue with the writers when someone calls disable and remove very quickly and the indexes
are to be removed, because the indexes are deleted out from underneath the writers.  This
causes the shard servers to throw errors.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
