hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Yu <yuzhih...@gmail.com>
Subject Re: HTable.coprocessorExec call times out
Date Fri, 20 Jul 2012 13:35:33 GMT
Can you check the following config param to see if its value is high enough
?

    <name>hbase.zookeeper.property.maxClientCnxns</name>

Cheers

On Fri, Jul 20, 2012 at 6:23 AM, Kevin <kevin.macksamie@gmail.com> wrote:

> In zookeeper I see that regionserver connections are timing out. I open an
> HTable, call coprocessorExec, then I close the HTable. This is done in the
> for-loop. I'm not sure why the regionservers are timing out. I think don't
> think anymore it's a client-side issue but maybe a server-side issue with
> the regionservers. I don't know what else to do to debug it further.
>
> 2012-07-19 07:01:26,738 [myid:] - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for
> client /10.10.23.124:46863 which had sessionid 0x138763251670a3d
> 2012-07-19 07:01:56,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a37, timeout of 40000ms exceeded
> 2012-07-19 07:01:56,000 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a37
> 2012-07-19 07:02:00,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a3d, timeout of 40000ms exceeded
> 2012-07-19 07:02:00,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a39, timeout of 40000ms exceeded
> 2012-07-19 07:02:00,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a3f, timeout of 40000ms exceeded
> 2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a3d
> 2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a39
> 2012-07-19 07:02:00,001 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a3f
> 2012-07-19 07:02:04,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a38, timeout of 40000ms exceeded
> 2012-07-19 07:02:04,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a36, timeout of 40000ms exceeded
> 2012-07-19 07:02:04,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a3c, timeout of 40000ms exceeded
> 2012-07-19 07:02:04,001 [myid:] - INFO  [SessionTracker:ZooKeeperServer@325
> ]
> - Expiring session 0x138763251670a3e, timeout of 40000ms exceeded
> 2012-07-19 07:02:04,001 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a38
> 2012-07-19 07:02:04,001 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a36
> 2012-07-19 07:02:04,002 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a3c
> 2012-07-19 07:02:04,002 [myid:] - INFO  [ProcessThread(sid:0
> cport:-1)::PrepRequestProcessor@476] - Processed session termination for
> sessionid: 0x138763251670a3e
>
>
> On Thu, Jul 19, 2012 at 4:55 PM, Kevin <kevin.macksamie@gmail.com> wrote:
>
> > That's it. That's the end of the regionserver log. In the master's web UI
> > the regionserver is in the table labeled "Dead Region Servers."
> >
> > In the master's log there is:
> > 2012-07-19 07:02:04,016 INFO
> > org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
> > ephemeral node deleted, processing expiration
> [slave2,60020,1342694622535]
> > 2012-07-19 07:02:04,025 DEBUG
> > org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current
> > region=-ROOT-,,0.70236052 is on server=slave2,60020,1342694622535 server
> > being checked: slave2,60020,1342694622535
> > 2012-07-19 07:02:04,025 DEBUG
> > org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current
> > region=.META.,,1.1028785192 is on server=slave2,60020,1342694622535
> server
> > being checked: slave2,60020,1342694622535
> > 2012-07-19 07:02:04,027 DEBUG
> > org.apache.hadoop.hbase.master.ServerManager:
> > Added=slave2,60020,1342694622535 to dead servers, submitted shutdown
> > handler to be executed, root=true, meta=true
> > 2012-07-19 07:02:04,027 INFO
> > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
> > logs for slave2,60020,1342694622535
> >
> > The timestamps are different than above but it's from the same
> application
> > just at a different time than I sent before. The message would look the
> > same only with the timestamps being 2012-07-19 09:49ish
> >
> >
> > On Thu, Jul 19, 2012 at 4:50 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Can you paste more of the region server log after 09:49:18,551 (till the
> >> region server died) ?
> >>
> >> Thanks
> >>
> >> On Thu, Jul 19, 2012 at 1:46 PM, Kevin <kevin.macksamie@gmail.com>
> wrote:
> >>
> >> > The log snippet just before the regionservers die look like this:
> >> >
> >> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: putting new
> >> > rowkey
> >> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: new rowkey
> >> put
> >> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: coproc
> time:
> >> > 1227 ms
> >> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: closing
> >> scanner
> >> > 2012-07-19 09:49:18,551 INFO project.coproc.IndexEndpoint: scanner
> >> closed
> >> > <after this log statement in the endpoint code is the return
> statement>
> >> >
> >> > A coprocessorExec call may be from 3-20 seconds after the previous (it
> >> > depends how long the last call took). But I know the endpoints are
> >> > finishing their code fast because throughout the log each "coproc
> time:"
> >> > statement is under 5 seconds.
> >> >
> >> > I am using CDH4b2, which uses HBase 0.92.1.
> >> >
> >> > On Thu, Jul 19, 2012 at 4:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> >
> >> > > Kevin:
> >> > > Can you pastebin the log snippet from region server just before it
> >> died ?
> >> > >
> >> > > How frequent were your coprocessorExec() calls ?
> >> > > What HBase version were you using ?
> >> > >
> >> > > Thanks
> >> > >
> >> > > On Thu, Jul 19, 2012 at 12:44 PM, Kevin <kevin.macksamie@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I'm using endpoint coprocessors to do intense scans in parallel
on
> >> some
> >> > > > tables. I log the time it takes for each coprocessor to finish
its
> >> job
> >> > on
> >> > > > the region. Each coprocessor rarely takes longer than a few
> seconds,
> >> > > > maximum of 5 seconds (there are only 5 regions on the tables
right
> >> > now).
> >> > > As
> >> > > > my cluster grows with data the call HTable.coprocessorExec takes
> >> longer
> >> > > and
> >> > > > longer but the coprocessors themselves finish quickly (under
5
> >> > seconds).
> >> > > > Eventually I see all my regionservers die because the
> >> coprocessorExec
> >> > > call
> >> > > > timed out and zookeeper kills the connection, which makes the
> >> > > regionserver
> >> > > > die.
> >> > > >
> >> > > > In terms of code structure, the coprocessorExec call is done
> inside
> >> a
> >> > > > for-loop. The for-loop iterates over a List of objects to help
> form
> >> > > filters
> >> > > > for the endpoint and then calls the coprocessorExec once per
> object
> >> > > > processed.
> >> > > >
> >> > > > What would be the bottleneck? Is calling the coprocessor like
this
> >> in a
> >> > > > for-loop loading the regions down and not allowing them time
to do
> >> GC?
> >> > Is
> >> > > > there a way to ping a table and judge if it'll be ready for the
> >> > endpoint
> >> > > > call?
> >> > > >
> >> > > > Thanks,
> >> > > > -Kevin
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message