hbase-dev mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Trunk hangs after a stop/start of RegionServer
Date Thu, 12 Mar 2015 05:35:55 GMT
ZooKeeper was up and running.  I suspect this is a new problem, possibly
caused by some recent commits, and I am pretty sure it was not happening in
older versions of the trunk code. As time permits, I will look into this
flush issue.

Similarly, after stopping and starting the region server, region assignment
cannot complete (there is only 1 RS), so the table does not get assigned
either.

BTW thanks Jerry for your help in this.

Regards
Ram

On Thu, Mar 12, 2015 at 10:54 AM, Jerry He <jerryjch@gmail.com> wrote:

> The master (coordinator) would clean up unfinished/aborted procedures on the
> ZK nodes.
> I wonder how the case you saw could happen. Was ZooKeeper down at the same
> time?
> In the meantime, manually cleaning up the ZK nodes (under
> /hbase/flush-table-proc/) would let you move forward for now.
>
> Thanks,
>
> Jerry
>
> On Wed, Mar 11, 2015 at 9:35 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > Yes.  Before restarting the server I tried running a flush on a table.
> > Was that the reason for this?
> >
> > On Thu, Mar 12, 2015 at 5:03 AM, Jerry He <jerryjch@gmail.com> wrote:
> >
> > > Hi, Ram
> > >
> > > Could you tell us a little more about the context of what happened?
> > > Were you running a table flush prior to the restart of the region
> > > server?
> > >
> > > Thanks,
> > >
> > > Jerry
> > >
> > > On Wed, Mar 11, 2015 at 4:07 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Hi All
> > > >
> > > > The latest trunk hangs after we do a stop and start of the Region
> > > > Server with the following error
> > > >
> > > > org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via stobdtserver3,16040,1426090566331:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426090566331
> > > >         at org.apache.hadoop.hbase.errorhandling.ForeignException.deserialize(ForeignException.java:171)
> > > >         at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.abort(ZKProcedureMemberRpcs.java:329)
> > > >         at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.watchForAbortedProcedures(ZKProcedureMemberRpcs.java:142)
> > > >         at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.start(ZKProcedureMemberRpcs.java:352)
> > > >         at org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager.start(RegionServerFlushTableProcedureManager.java:102)
> > > >         at org.apache.hadoop.hbase.procedure.RegionServerProcedureManagerHost.start(RegionServerProcedureManagerHost.java:53)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:882)
> > > >         at java.lang.Thread.run(Thread.java:745)
> > > > Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426090566331
> > > >         at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:273)
> > > >         at org.apache.hadoop.hbase.procedure.ProcedureMember.controllerConnectionFailure(ProcedureMember.java:225)
> > > >         at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.sendMemberAcquired(ZKProcedureMemberRpcs.java:254)
> > > >         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:166)
> > > >         at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
> > > >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >
> > > >
> > > > Even when we try to flush we get the above error. Because of this the
> > > > system hangs and we are not able to proceed with performing operations
> > > > particularly after we restart the region server.
> > > >
> > > > I have a single RS and single master installation for internal testing.
> > > > Any hints on why this happens? It was not happening till the update that
> > > > I had taken 3 days back.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > >
> >
>
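
For readers trying the workaround quoted above (manually cleaning up the znodes under /hbase/flush-table-proc/), note that ZooKeeper will not delete a znode that still has children, so leftover nodes have to be removed deepest first. The sketch below only computes that order from a list of paths; it does not talk to ZooKeeper. The class name, the helper methods, and the "/hbase/unrelated" sample path are hypothetical and not part of HBase. Which nodes are actually stale is for the operator to decide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of HBase or ZooKeeper: it only plans a
// children-first deletion order for znodes found under a given parent.
class FlushProcCleanupPlan {

    // Parent znode named in the thread.
    static final String ROOT = "/hbase/flush-table-proc";

    // Counts '/' characters, i.e. the number of path components.
    static int depth(String path) {
        int d = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                d++;
            }
        }
        return d;
    }

    // Keeps only paths strictly below root and sorts them deepest first,
    // so every child appears before its parent. The root itself is kept.
    static List<String> deletionOrder(List<String> znodes, String root) {
        List<String> under = new ArrayList<>();
        for (String p : znodes) {
            if (p.startsWith(root + "/")) {
                under.add(p);
            }
        }
        Comparator<String> deepestFirst =
            Comparator.comparingInt((String p) -> depth(p)).reversed();
        under.sort(deepestFirst.thenComparing(Comparator.<String>naturalOrder()));
        return under;
    }

    public static void main(String[] args) {
        // Sample listing: the acquired/TestTable path comes from the stack
        // trace in the thread; "/hbase/unrelated" is made up for illustration.
        List<String> listed = Arrays.asList(
            ROOT + "/acquired",
            ROOT + "/acquired/TestTable",
            ROOT + "/acquired/TestTable/stobdtserver3,16040,1426090566331",
            "/hbase/unrelated");
        for (String p : deletionOrder(listed, ROOT)) {
            // A real cleanup would issue a delete through a ZooKeeper client
            // here; this sketch only prints the planned order.
            System.out.println("delete " + p);
        }
    }
}
```

The paths are deleted in the printed order; anything outside /hbase/flush-table-proc is left alone, and the parent znode itself is not touched.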
