hbase-issues mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14777) Replication fails with IndexOutOfBoundsException
Date Wed, 18 Nov 2015 06:25:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010336#comment-15010336 ]

Ashu Pachauri commented on HBASE-14777:
---------------------------------------

[~appy] 
{quote}
Why are we returning the index and iterating in reverse and all? Can't we simply remove the
first element every time, given that we are going in order and blocking on futures? Such
simple logic would have avoided the bug in the first place.
{quote}
This is a perfectly valid question, and I had the same doubt; we can certainly make it work
without ordinals. But always removing the first element won't work. If you look at the code
more closely:
{code}
for (Future<Integer> f : futures) {
  try {
    // wait for all futures, remove successful parts
    // (only the remaining parts will be retried)
    entryLists.remove(f.get());
  } catch (InterruptedException ie) {
    iox = new IOException(ie);
  } catch (ExecutionException ee) {
    // cause must be an IOException
    iox = (IOException)ee.getCause();
  }
}
{code}
Suppose we always remove the first element. If an InterruptedException or ExecutionException
is thrown for the future at index 0, we do not remove that element. We don't want to retry
yet; we want to check the other futures first. So we move on to index 1, check that future,
and remove an entry if it succeeded. Now, which element would you remove: index 0 or index 1?
To make this work, at every point in the loop you have to know how many futures before the
current one failed (the index to remove is offset from index 0 by that number). That is
perfectly simple to do: keep a running count of successful futures and subtract it from the
current position. But what is already being done here is just as simple. Having ordinals also
guarantees that you are referring to the correct entry in the list, should the code become
more complicated in the future.
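A minimal sketch contrasting the two approaches discussed above, assuming each shipping task's future completes with its batch ordinal on success and fails with an IOException otherwise. The class and method names (`OrdinalRemovalSketch`, `shipped`, `removeReverse`, `removeWithCounter`) are hypothetical, and plain strings stand in for the real per-batch entry lists; this is not the patch itself.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class OrdinalRemovalSketch {
  // Hypothetical stand-in for one shipping task: completes with its batch
  // ordinal on success, or fails with an IOException.
  static Future<Integer> shipped(int ordinal, boolean ok) {
    CompletableFuture<Integer> f = new CompletableFuture<>();
    if (ok) {
      f.complete(ordinal);
    } else {
      f.completeExceptionally(new IOException("batch " + ordinal + " failed"));
    }
    return f;
  }

  // Ordinal approach: check futures from last to first, so removing batch k
  // never shifts the positions of batches 0..k-1 that are still unchecked.
  static IOException removeReverse(List<String> batches, List<Future<Integer>> futures) {
    IOException iox = null;
    for (int i = futures.size() - 1; i >= 0; i--) {
      try {
        int ordinal = futures.get(i).get(); // unboxed, so remove(int) is called, not remove(Object)
        batches.remove(ordinal);
      } catch (InterruptedException ie) {
        iox = new IOException(ie);
      } catch (ExecutionException ee) {
        // cause is an IOException for futures built by shipped(); the batch stays for retry
        iox = (IOException) ee.getCause();
      }
    }
    return iox;
  }

  // Counter approach (no ordinals): walk forward; the current batch now sits at
  // its original position minus the number of batches already removed.
  static IOException removeWithCounter(List<String> batches, List<Future<Integer>> futures) {
    IOException iox = null;
    int removed = 0;
    for (int i = 0; i < futures.size(); i++) {
      try {
        futures.get(i).get(); // only success matters here, not the returned value
        batches.remove(i - removed); // i - removed == number of earlier failures
        removed++;
      } catch (InterruptedException ie) {
        iox = new IOException(ie);
      } catch (ExecutionException ee) {
        iox = (IOException) ee.getCause();
      }
    }
    return iox;
  }

  public static void main(String[] args) {
    boolean[] ok = {false, true, false, true};
    List<Future<Integer>> futures = new ArrayList<>();
    for (int i = 0; i < ok.length; i++) {
      futures.add(shipped(i, ok[i]));
    }
    List<String> a = new ArrayList<>(List.of("b0", "b1", "b2", "b3"));
    List<String> b = new ArrayList<>(a);
    removeReverse(a, futures);
    removeWithCounter(b, futures);
    System.out.println(a + " " + b); // [b0, b2] [b0, b2]
  }
}
```

Both variants leave exactly the failed batches for the retry; the ordinal version additionally ties each removal to a specific batch rather than to the loop position.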

{quote}
Why change the utility cluster from 2 to 4? If there is a non-trivial reason, please add a
comment explaining it.
{quote}
Yes, because HBaseInterClusterReplicationEndpoint uses the number of sinks as one factor in
deciding how to split entries into batches for shipping. With two sinks there would be at most
two batches, which I thought was not much of a test. I will add a comment explaining this.

> Replication fails with IndexOutOfBoundsException
> ------------------------------------------------
>
>                 Key: HBASE-14777
>                 URL: https://issues.apache.org/jira/browse/HBASE-14777
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Bhupendra Kumar Jain
>            Priority: Critical
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-14777-1.patch, HBASE-14777-2.patch, HBASE-14777.patch
>
>
> Replication fails with IndexOutOfBoundsException 
> {code}
> regionserver.ReplicationSource$ReplicationSourceWorkerThread(939): org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint threw unknown exception:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> 	at java.util.ArrayList.rangeCheck(Unknown Source)
> 	at java.util.ArrayList.remove(Unknown Source)
> 	at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:222)
> {code}
> It's happening due to incorrect removal of entries from the replication entries list.
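One way to hit this exact exception, assuming both batches of a two-batch shipment succeed and their ordinals are removed from the list in ascending order: after `remove(0)` the list has size 1, so `remove(1)` is out of range. Removing the same ordinals in descending order is safe. `ShiftingIndexRepro` and `removeInOrder` are hypothetical names, and the exact exception message text depends on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftingIndexRepro {
  // Removes the given positions one by one, in the order given, from a copy
  // of the list. ArrayList.remove(int) shifts every later element left by one.
  static List<String> removeInOrder(List<String> items, int... positions) {
    List<String> copy = new ArrayList<>(items);
    for (int p : positions) {
      copy.remove(p);
    }
    return copy;
  }

  public static void main(String[] args) {
    List<String> batches = List.of("batch0", "batch1");
    try {
      removeInOrder(batches, 0, 1); // ascending: the second call sees a list of size 1
    } catch (IndexOutOfBoundsException e) {
      System.out.println("ascending: " + e.getMessage());
    }
    System.out.println("descending: " + removeInOrder(batches, 1, 0)); // descending: []
  }
}
```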




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
