hbase-user mailing list archives

From Zheng Lv <lvzheng19800...@gmail.com>
Subject Re: is there any problem with our environment?
Date Fri, 20 Nov 2009 08:28:42 GMT
Hello Stack,
Remember the "no route to host" exceptions from last time? They no longer
occur, and the test program can now run for several days. Thank you.
Recently we started running our crawling program, which crawls web pages and
then inserts them into HBase.
But we got many "org.apache.hadoop.hbase.NotServingRegionException" errors
like this:

2009-11-20 12:36:41,898 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.NotServingRegionException: webpage,http://bbs.city.tianya.cn/tianyacity/Content/178/1/536629.shtml,1258691377544
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2261)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1767)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

And in the master log we can find the corresponding entry:

2009-11-20 12:36:25,259 INFO org.apache.hadoop.hbase.master.ServerManager:
Processing MSG_REPORT_SPLIT:
webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691377544:
Daughters; webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691779496,
webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F329\x2F1\x2F164370.shtml,1258691779496
from ubuntu12,60020,1258687326554;
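
A side note on reading the two log excerpts together. Region names in these
logs have the form table,start-key,region-id. The sketch below assumes (this
is not stated in the thread) that the trailing region id is the region's
creation time in milliseconds since the epoch, and that the log timestamps are
in UTC+8. split_region_name and region_id_to_local are hypothetical helper
names for illustration, not HBase APIs.

```python
from datetime import datetime, timedelta, timezone


def split_region_name(name):
    """Split 'table,start-key,region-id' into its three parts.

    The start key may itself contain commas, so the table name is taken
    up to the first comma and the region id after the last one.
    """
    table, rest = name.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return table, start_key, int(region_id)


def region_id_to_local(region_id_ms, utc_offset_hours=8):
    """Interpret a region id as epoch milliseconds (an assumption) and
    convert it to a datetime in the given fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(region_id_ms // 1000, tz)


parent = ("webpage,http://bbs.city.tianya.cn/tianyacity/Content/"
          "178/1/536629.shtml,1258691377544")
daughter_id = 1258691779496  # region id shared by both daughters in the master log

table, start_key, parent_id = split_region_name(parent)
print(table, parent_id, region_id_to_local(parent_id).strftime("%H:%M:%S"))
print("daughters", region_id_to_local(daughter_id).strftime("%H:%M:%S"))
```

If those assumptions hold, the parent id decodes to 12:29:37 and the
daughters' id to 12:36:19, a few seconds before the master logs
MSG_REPORT_SPLIT at 12:36:25. The get at 12:36:41 would then be a request for
the parent region after it had already been split.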

And a few hours later, some region servers shut down.

I read the mail
http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200907.mbox/%3C9b27a8a60907272122y1bfa6254n95948942d5ca7f88@mail.gmail.com%3E,
which was sent by my colleague Angus. In that mail you told us it was a case
of "HBASE-1671", whose Fix Version is 0.20.0, but the HBase version we are
using is already 0.20.0.
Any idea?
Best Regards,
LvZheng





2009/10/13 stack <stack@duboce.net>

> Thanks for posting.  It's much easier reading the logs from there.
>
> Looking in nohup.out I see it can't find region
> 'webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F080\x2F0\x2F5FOO155J0001124J.html1255072992000_751685,1255316061169'.
> It never finds it.   It looks like it was assigned successfully to
> 192.168.33.5 going by the master log.  Once you've figured out the
> hardware/networking issues, let's work at getting that region back on line.
>
> The master timed out its session against zk because of 'no route to host'.
>
> St.Ack
>
> On Mon, Oct 12, 2009 at 12:23 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
>
> > Hello Stack,
> >    I have enabled DEBUG and restarted the test program. This time the
> > master shut down, and I have put the logs on skydrive.
> >
> >    http://cid-a331bb289a14fbef.skydrive.live.com/browse.aspx/.Public?uc=2&isFromRichUpload=1
> >    "nohup.out" is our test program log, "hbase-cyd-master-ubuntu6.log" is
> > master log.
> >
> >    On the other hand, today we found that when we run "dmesg", there were
> > many logs like "[3641697.122769] r8169: eth0: link down". And I think this
> > might be the reason of so many "no route to host" and "Time Out". Now our
> > system manager is checking, if we have a result we will let you know.:)
> >    Thanks,
> >    LvZheng.
> >
> > 2009/10/11 stack <stack@duboce.net>
> >
> > > On Fri, Oct 9, 2009 at 3:18 AM, Zheng Lv <lvzheng19800619@gmail.com>
> > > wrote:
> > >
> > > > ...
> > > > so,
> > > >    > please remove the delay so hbase fails faster so it doesn't take so
> > > >    > long to figure the issue.
> > > >    > Are you inserting every 10ms because hbase is falling over on you?  If
> > > >    Yes I inserted every 10ms because I'm afraid hbase would fall over. Now
> > > > I have removed the delay.
> > > >
> > > >    After doing these, We have run the test program again, and one region
> > > > server shut down after about 2 hours, another one 3.
> > > >    I will post the logs on these two servers in following reply mails.
> > > >
> > > >
> > > Thanks for doing the above.
> > >
> > > For the future, debugging, please enable DEBUG and put your logs somewhere
> > > where I can pull them or put them up in pastebin.  Logs in email messages
> > > are hard to follow.  Thanks.
> > >
> > >
> > > >    > Ok.  So this is hbase 0.20.0?  Tell us about your hardware.  What kind is
> > > >    > it?  CPU/RAM/Disks.
> > > >     Yes we are using  hbase 0.20.0. And the following is our hardware:
> > > >
> > > >    CPU:amd x3 710
> > > >    RAM:8g ddr2 800
> > > >    Disk:270g(raid0)
> > > >
> > > >
> > > That's an interesting chip -- 3 cores!  The above should be fine as long as
> > > you corral your mapreduce jobs running on same cluster.
> > >
> > >
> > >
> > >
> > > >    We have 7 servers with above hardware, one for master, three for
> > > > namenodes / regionservers, and the other 3 for zks.
> > > >    By the way, what kind of hardware and environment do you suggest we
> > > > have?
> > > >
> > >
> > >
> > > This configuration seems fine to start with.  Later we might experiment
> > > running zk on same machines as regionservers and then up number of
> > > regionservers to 6 and up the quorum members to 5.
> > >
> > > St.Ack
> > >
> > >
> > > >
> > > >    Thank you, very much.
> > > >    LvZheng.
> > > >
> > >
> >
>
