hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: is there any problem with our environment?
Date Fri, 20 Nov 2009 18:22:03 GMT
On Fri, Nov 20, 2009 at 12:28 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:

> Hello Stack,
> Remember the "no route to host" exceptions last time? Now there isn't any
> more, and the test program can be running for several days.


How did you fix it?



> Thank you.
> Recently we started running our crawling program, which crawls webpages and
> then insert them to hbase.
> But we got so many "org.apache.hadoop.hbase.NotServingRegionException" like
> that:
>
> 2009-11-20 12:36:41,898 ERROR
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.NotServingRegionException: webpage,
>
> http://bbs.city.tianya.cn/tianyacity/Content/178/1/536629.shtml,1258691377544
>

So figure out what's happening to that region by grepping its name in the
master log.  Why is it offline so long?  Are the machines loaded?  Swapping?
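
A minimal sketch of that grep in Python, for anyone who wants to script it. The helper names `escape_region_name` and `lines_mentioning` are made up for illustration. The one assumption baked in is that the master log prints "/" in the row key as the literal text \x2F and leaves everything else alone, which is what the log excerpts in this thread suggest; other escapes may exist.

```python
def escape_region_name(name: str) -> str:
    """Rewrite '/' as the literal text \\x2F.

    Assumption: only '/' is escaped, as in the master log excerpt in this thread.
    """
    return name.replace("/", "\\x2F")


def lines_mentioning(region_name: str, log_lines):
    """Return the log lines that contain the region name, raw or escaped."""
    escaped = escape_region_name(region_name)
    return [line for line in log_lines
            if region_name in line or escaped in line]
```

To use it, pass the region name exactly as it appears in the NotServingRegionException together with an open master log file, for example `lines_mentioning(name, open(path))`, where `path` is wherever your master log lives.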

Are the crawlers running on same machines as hbase?

What crawler are you using?

Andrew Purtell has written up some notes, in private correspondence, on
getting a nice balance between the crawl process and hbase so that everything
runs smoothly.  Let me ask him if it's OK to forward them to the list.


....

> 2009-11-20 12:36:25,259 INFO org.apache.hadoop.hbase.master.ServerManager:
> Processing MSG_REPORT_SPLIT:
> webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691377544:
> Daughters; webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691779496,
> webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F329\x2F1\x2F164370.shtml,1258691779496
> from ubuntu12,60020,1258687326554;
>
Yeah, it's split.  That's normal.  What's not normal is the client not finding
the daughter split in its new location.  Did the daughters get deployed
promptly?
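
To check on the daughters you first need their names. Here is a rough Python sketch that pulls the parent, the two daughters, and the reporting server out of a MSG_REPORT_SPLIT line. The function name `parse_split_report` and the regular expression are hypothetical. The layout is reconstructed from the single wrapped log line quoted above, so it assumes region names contain no whitespace and end in a numeric timestamp; treat it as a starting point, not as the log format's definition.

```python
import re

# Assumed layout, reconstructed from the one quoted line:
#   Processing MSG_REPORT_SPLIT: <parent>: Daughters; <a>, <b> from <server>;
SPLIT_RE = re.compile(
    r"Processing MSG_REPORT_SPLIT:\s*(?P<parent>\S+?,\d+):\s*"
    r"Daughters;\s*(?P<a>\S+?,\d+),\s*(?P<b>\S+?,\d+)\s+"
    r"from\s+(?P<server>[^;\s]+);"
)


def parse_split_report(line: str):
    """Return a dict with parent, daughters and server, or None if no match."""
    m = SPLIT_RE.search(line)
    if m is None:
        return None
    return {
        "parent": m.group("parent"),
        "daughters": [m.group("a"), m.group("b")],
        "server": m.group("server"),
    }
```

With the daughter names in hand, grep the master log for each one and compare the timestamp of its assignment with the timestamp of the split report.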



> And a few hours later, some rs shutdown.
>
> I read the mail
>
> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200907.mbox/%3C9b27a8a60907272122y1bfa6254n95948942d5ca7f88@mail.gmail.com%3E
> ,
> which was sent by my partner Angus. In the mail you told us it was a case of
> "HBASE-1671", Fix Version of which is 0.20.0, but the hbase version we are
> using is just 0.20.0.
>

Can you update to hbase 0.20.2?  It has a bunch of fixes that could be
related to the above.
Yours,
St.Ack



> Any idea?
> Best Regards,
> LvZheng
>
>
>
>
>
> 2009/10/13 stack <stack@duboce.net>
>
> > Thanks for posting.  Its much easier reading the logs from there.
> >
> > Looking in nohup.out I see it can't find region
> > 'webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F080\x2F0\x2F5FOO155J0001124J.html1255072992000_751685,1255316061169'.
> > It never finds it.   It looks like it was assigned successfully to
> > 192.168.33.5 going by the master log.  Once you've figured out the
> > hardware/networking issues, lets work at getting that region back on line.
> >
> > The master timed out its session against zk because of 'no route to host'.
> >
> > St.Ack
> >
> > On Mon, Oct 12, 2009 at 12:23 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
> >
> > > Hello Stack,
> > >    I have enabled DEBUG and restarted the test program. This time the
> > > master shut down, and I have put the logs on skydrive.
> > >
> > >
> > > http://cid-a331bb289a14fbef.skydrive.live.com/browse.aspx/.Public?uc=2&isFromRichUpload=1
> > > .
> > >    "nohup.out" is our test program log, "hbase-cyd-master-ubuntu6.log" is
> > > master log.
> > >
> > >    On the other hand, today we found that when we run "dmesg", there were
> > > many logs like "[3641697.122769] r8169: eth0: link down". And I think this
> > > might be the reason of so many "no route to host" and "Time Out". Now our
> > > system manager is checking, if we have a result we will let you know.:)
> > >    Thanks,
> > >    LvZheng.
> > >
> > > 2009/10/11 stack <stack@duboce.net>
> > >
> > > > On Fri, Oct 9, 2009 at 3:18 AM, Zheng Lv <lvzheng19800619@gmail.com>
> > > > wrote:
> > > >
> > > > > ...
> > > > > so,
> > > > >    > please remove the delay so hbase fails faster so it doesn't take so
> > > > >    > long to figure the issue.
> > > > >    > Are you inserting every 10ms because hbase is falling over on you?  If
> > > > >    Yes I inserted every 10ms because I'm afraid hbase would fall over. Now
> > > > > I have removed the delay.
> > > > >
> > > > >    After doing these, We have run the test program again, and one region
> > > > > server shut down after about 2 hours, another one 3.
> > > > >    I will post the logs on these two servers in following reply mails.
> > > > >
> > > > >
> > > > Thanks for doing the above.
> > > >
> > > > For the future, debugging, please enable DEBUG and put your logs somewhere
> > > > where I can pull them or put them up in pastebin.  Logs in email messages
> > > > are hard to follow.  Thanks.
> > > >
> > > >
> > > > >    > Ok.  So this is hbase 0.20.0?  Tell us about your hardware.  What kind is
> > > > >    > it?  CPU/RAM/Disks.
> > > > >     Yes we are using  hbase 0.20.0. And the following is our hardware:
> > > > >
> > > > >    CPU:amd x3 710
> > > > >    RAM:8g ddr2 800
> > > > >    Disk:270g(raid0)
> > > > >
> > > > >
> > > > That's an interesting chip -- 3 cores!  The above should be fine as long as
> > > > you corral your mapreduce jobs running on the same cluster.
> > > >
> > > >
> > > >
> > > >
> > > > >    We have 7 servers with above hardware, one for master, three for
> > > > > namenodes / regionservers, and the other 3 for zks.
> > > > >    By the way, what kind of hardware and environment do you suggest we
> > > > > have?
> > > > >
> > > >
> > > >
> > > > This configuration seems fine to start with.  Later we might experiment
> > > > running zk on same machines as regionservers and then up number of
> > > > regionservers to 6 and up the quorum members to 5.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > > >
> > > > >    Thank you, very much.
> > > > >    LvZheng.
> > > > >
> > > >
> > >
> >
>
