hbase-issues mailing list archives

From "Vasu Mariyala (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
Date Fri, 16 Aug 2013 16:34:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742367#comment-13742367 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

I ran all the test cases on my local machine with the trunk patch and they all pass.
But every time the patch is run on Jenkins, the build fails with:

FATAL: Unable to delete script file /tmp/hudson5964600500647866956.sh
hudson.util.IOException2: remote file operation failed: /tmp/hudson5964600500647866956.sh at hudson.remoting.Channel@5ce45886:hadoop1
	at hudson.FilePath.act(FilePath.java:902)
	at hudson.FilePath.act(FilePath.java:879)
	at hudson.FilePath.delete(FilePath.java:1288)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
	at hudson.model.Run.execute(Run.java:1597)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:247)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:516)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:714)
	at hudson.FilePath.act(FilePath.java:895)
	... 13 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at hudson.remoting.Command.readFrom(Command.java:92)
	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:714)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
	at com.sun.proxy.$Proxy40.join(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
	at hudson.model.Run.execute(Run.java:1597)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:247)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:774)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at hudson.remoting.Command.readFrom(Command.java:92)
	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

https://builds.apache.org/job/PreCommit-HBASE-Build/6784/console
https://builds.apache.org/job/PreCommit-HBASE-Build/6781/console

Can anyone please let me know how I can resolve this issue?

                
> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6, 0.95.1
>            Reporter: Lars Hofhansl
>             Fix For: 0.98.0, 0.94.12, 0.96.0
>
>         Attachments: HBASE-7709-095-trunk.patch, HBASE-7709.patch, HBASE-7709-rev1.patch, HBASE-7709-rev2.patch
>
>
>  We just discovered the following scenario:
> # Clusters A and B are set up in master/master replication.
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C the cluster ID is already set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that now cycles > 2 will have the data cycle forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster id per edit, maintain an (unordered) set of cluster ids that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.
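
For readers following the quoted description, here is a minimal Java sketch of how the second option (a set of cluster ids per edit) and the third option (a hop count) could each stop the C -> A, A <-> B loop. This is only an illustration under stated assumptions: the `Edit` class, `shouldShipBySeenSet`, `shouldShipByHopCount`, and the simulation helpers are hypothetical names invented for this example. They are not HBase's ReplicationSource or HLog API, and this is not necessarily how the attached patches do it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative only: Edit and both shouldShip* helpers are hypothetical
// names, not HBase's ReplicationSource or HLog API.
final class ReplicationLoopSketch {

    /** Hypothetical stand-in for one replicated WAL edit. */
    static final class Edit {
        final Set<UUID> seenBy = new HashSet<>(); // option 2: clusters that have seen the edit
        int hops = 0;                             // option 3: hop count carried with the edit
    }

    /** Option 2: record the local cluster, then ship only if the sink has not seen the edit. */
    static boolean shouldShipBySeenSet(Edit edit, UUID localCluster, UUID sinkCluster) {
        edit.seenBy.add(localCluster);
        return !edit.seenBy.contains(sinkCluster);
    }

    /** Option 3: bump the hop count and ship only while it is within maxHops. */
    static boolean shouldShipByHopCount(Edit edit, int maxHops) {
        edit.hops++;
        return edit.hops <= maxHops;
    }

    /** Replays the C -> A, A <-> B scenario and counts shipments under option 2. */
    static int shipmentsWithSeenSet() {
        UUID a = UUID.randomUUID(), b = UUID.randomUUID(), c = UUID.randomUUID();
        UUID[][] route = { {c, a}, {a, b}, {b, a}, {a, b} }; // {source, sink} per hop
        Edit edit = new Edit();
        int shipped = 0;
        for (UUID[] hop : route) {
            if (!shouldShipBySeenSet(edit, hop[0], hop[1])) break;
            shipped++;
        }
        return shipped;
    }

    /** Counts shipments under option 3 for an edit that would otherwise bounce forever. */
    static int shipmentsWithHopCount(int maxHops) {
        Edit edit = new Edit();
        int shipped = 0;
        while (shouldShipByHopCount(edit, maxHops)) shipped++;
        return shipped;
    }

    public static void main(String[] args) {
        System.out.println("seen-set shipments: " + shipmentsWithSeenSet());
        System.out.println("hop-count shipments (max 3): " + shipmentsWithHopCount(3));
    }
}
```

In this sketch the seen-set variant ships the edit from C to A and from A to B, then drops it at B because A is already in the set. The hop-count variant lets the edit travel up to the configured maximum before dropping it, so the loop still ends, but the edit may be applied more than once along the way.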

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
