hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase 0.92/Hadoop 0.22 test results
Date Tue, 08 Nov 2011 17:20:25 GMT
Roman:
> 11/11/08 00:44:31 WARN util.Sleeper: We slept 38891ms instead of
> 3000ms, this is likely due to a long garbage collecting pause and it's
> usually bad, see

3000ms is the default value for hbase.regionserver.msginterval
Obviously it is too short for the validation scenario.

Can you increase its value and perform another round of tests?
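For reference, the interval is set in hbase-site.xml; a minimal sketch (the 10000ms value below is purely illustrative, not a recommendation for this scenario) would look like:

```xml
<!-- hbase-site.xml: raise the regionserver report interval
     above the default 3000ms (illustrative value only) -->
<property>
  <name>hbase.regionserver.msginterval</name>
  <value>10000</value>
</property>
```

The regionserver needs a restart to pick up the change.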

Thanks

On Mon, Nov 7, 2011 at 10:37 PM, Roman Shaposhnik <rvs@apache.org> wrote:

> Forgot to add that from a master UI perspective here's where it is
> stuck at:
>
> $ curl http://master:60010/master-status?format=json
> [{"statustimems":-1,"status":"Waiting for distributed tasks to finish.
> scheduled=5 done=0
> error=0","starttimems":1320731070095,"description":"Doing distributed
> log split in
> [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]","state":"RUNNING","statetimems":-1}]
>
> The regionserver finally dies, and if I restart it manually the split seems
> to finish as intended.
>
> Hope this helps.
>
> Thanks,
> Roman.
>
> On Mon, Nov 7, 2011 at 10:16 PM, Roman Shaposhnik <rvs@apache.org> wrote:
> > With HBASE-4754 fix in place I can get further in my testing,
> > but it still fails :-(
> >
> > Here's how it does it this time. It loads OK, but then when it
> > needs to split here's what happens:
> >
> > 11/11/08 00:44:30 INFO handler.ServerShutdownHandler: Splitting logs
> > for ip-10-114-225-185.ec2.internal,60020,1320726988138
> > 11/11/08 00:44:30 INFO master.SplitLogManager: dead splitlog worker
> > ip-10-114-225-185.ec2.internal,60020,1320726988138
> > 11/11/08 00:44:30 INFO master.SplitLogManager: started splitting logs
> > in
> [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]
> > 11/11/08 00:44:31 ERROR master.HMaster: Region server
> > ^@^@ip-10-114-225-185.ec2.internal,60020,1320726988138 reported a
> > fatal error:
> > ABORTING region server
> > ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> > exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> > rejected; currently processing
> > ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
> >        at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
> >        at
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
> >        at
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
> >        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> >        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> >        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
> >
> > That's on the master side; on the regionserver side, it looks really
> > weird. It basically hums along
> > doing the split and then at some point, there's this:
> >
> > 11/11/08 00:43:40 INFO regionserver.Store: Added
> >
> hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/TestLoadAndVerify_1320729464658/8bd8387431feec2b09983693dfac950b/f1/4fc67a93e580402190b5c8a72820f665,
> > entries=82049, sequenceid=142942, memsize=18.1m, filesize=4.4m
> > 11/11/08 00:43:40 INFO regionserver.HRegion: Finished memstore flush
> > of ~18.4m for region
> >
> TestLoadAndVerify_1320729464658,<\xA1\xAF(k\xCA\x1A\xEA,1320729465485.8bd8387431feec2b09983693dfac950b.
> > in 829ms, sequenceid=142942, compaction requested=false
> > 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> > data from server sessionid 0x133817270190001, likely server has closed
> > socket, closing socket connection and attempting reconnect
> > 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> > data from server sessionid 0x133817270190004, likely server has closed
> > socket, closing socket connection and attempting reconnect
> > 11/11/08 00:44:31 WARN util.Sleeper: We slept 38891ms instead of
> > 3000ms, this is likely due to a long garbage collecting pause and it's
> > usually bad, see
> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
> > 11/11/08 00:44:31 FATAL regionserver.HRegionServer: ABORTING region
> > server ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> > exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> > rejected; currently processing
> > ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
> >        at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
> >        at
> org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
> >        at
> org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
> >        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> >        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> >        at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
> >
> >
> > Thanks,
> > Roman.
> >
>
