hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@apache.org>
Subject Re: Regionserver fails to serve region
Date Tue, 28 Oct 2008 20:13:55 GMT
Slava,

http://wiki.apache.org/hadoop/Hbase/FAQ#5

J-D

On Tue, Oct 28, 2008 at 3:31 PM, Slava Gorelik <slava.gorelik@gmail.com> wrote:

> Hi. First of all, I want to say thank you for your assistance!
>
> DEBUG on Hadoop or HBase? And how can I enable it?
> fsck said that HDFS is healthy.
>
> Best Regards and Thank You
>
>
> On Tue, Oct 28, 2008 at 8:45 PM, stack <stack@duboce.net> wrote:
>
> > Slava Gorelik wrote:
> >
> >> Hi. HDFS capacity is about 800GB (8 datanodes) and the current usage is
> >> about 30GB. This is after a total re-format of the HDFS that was done an
> >> hour before.
> >>
> >> BTW, the logs I sent are from the first exception that I found in them.
> >> Best Regards.
> >>
> >>
> > Please enable DEBUG and retry.  Send me all logs.  What does the fsck on
> > HDFS say?  There is something seriously wrong with your cluster that you
> > are having so much trouble getting it running.  Let's try and figure it
> > out.
> >
> > St.Ack
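What enabling DEBUG looked like in an HBase/Hadoop setup of that era: edit conf/log4j.properties and restart the daemons, then re-run the upload so the logs capture the good-to-bad transition. A sketch only; exact logger names and file layout vary by version:

```properties
# conf/log4j.properties -- raise HBase logging to DEBUG
log4j.logger.org.apache.hadoop.hbase=DEBUG
# Optionally also the DFS client, since the errors below point at HDFS
log4j.logger.org.apache.hadoop.dfs=DEBUG
```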
> >
> >
> >
> >
> >
> >> On Tue, Oct 28, 2008 at 7:12 PM, stack <stack@duboce.net> wrote:
> >>
> >>
> >>
> >>> I took a quick look Slava (thanks for sending the files).  Here are a
> >>> few notes:
> >>>
> >>> + The logs are from after the damage is done; the transition from good
> >>> to bad is missing.  If I could see that, that would help.
> >>> + But what seems plain is that your HDFS is very sick.  See this from
> >>> the head of one of the regionserver logs:
> >>>
> >>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer
> >>> Exception: java.io.IOException: Unable to create new block.
> >>>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2349)
> >>>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1735)
> >>>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1912)
> >>>
> >>> 2008-10-27 23:41:12,682 WARN org.apache.hadoop.dfs.DFSClient: Error
> >>> Recovery for block blk_-5188192041705782716_60000 bad datanode[0]
> >>> 2008-10-27 23:41:12,685 ERROR
> >>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction/Split
> >>> failed for region
> >>> BizDB,1.1.PerfBO1.f2188a42-5eb7-4a6a-82ef-2da0d0ea4ce0,1225136351518
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>
> >>> If HDFS is ailing, hbase is too.  In fact, the regionservers will shut
> >>> themselves down to protect against damaging or losing data:
> >>>
> >>> 2008-10-27 23:41:12,688 FATAL
> >>> org.apache.hadoop.hbase.regionserver.Flusher:
> >>> Replay of hlog required. Forcing server restart
> >>>
> >>> So, what's up with your HDFS?  Not enough space allotted?  What happens
> >>> if you run "./bin/hadoop fsck /"?  Does that give you a clue as to what
> >>> happened?  Dig in the datanode and namenode logs.  Look for where the
> >>> exceptions start.  It might give you a clue.
> >>>
> >>> + The suse regionserver log had garbage in it.
> >>>
> >>> St.Ack
> >>>
> >>>
> >>> Slava Gorelik wrote:
> >>>
> >>>
> >>>
> >>>> Hi.
> >>>> My happiness was very short :-( After I successfully added 1M rows
> >>>> (50KB each row) I tried to add 10M rows.
> >>>> And after 3-4 working hours it started dying. First one region server
> >>>> died, then another one, and eventually the whole cluster was dead.
> >>>>
> >>>> I attached log files (relevant part, archived) from region servers and
> >>>> from the master.
> >>>>
> >>>> Best Regards.
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Oct 27, 2008 at 11:19 AM, Slava Gorelik
> >>>> <slava.gorelik@gmail.com> wrote:
> >>>>
> >>>>   Hi.
> >>>>   So far so good; after changing the file descriptors
> >>>>   and dfs.datanode.socket.write.timeout, dfs.datanode.max.xcievers
> >>>>   my cluster works stably.
> >>>>   Thank You and Best Regards.
> >>>>
> >>>>   P.S. Regarding the missing delete-multiple-columns functionality I
> >>>>   filed a JIRA: https://issues.apache.org/jira/browse/HBASE-961
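The file-descriptor change mentioned here is the per-process open-file limit. A minimal shell sketch of checking and raising it before starting the daemons (the 32768 figure is an assumption, not from this thread):

```shell
# Show the current per-process open-file limit for this shell
ulimit -n

# Raise it for the current shell before launching the daemons;
# persistent changes belong in /etc/security/limits.conf
ulimit -n 32768 2>/dev/null || echo "raising the limit may require root"

# Confirm the value in effect
ulimit -n
```

The limit applies per process, so it must be in effect in the shell (or init script) that actually starts the datanode and regionserver.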
> >>>>
> >>>>
> >>>>
> >>>>   On Sun, Oct 26, 2008 at 12:58 AM, Michael Stack
> >>>>   <stack@duboce.net> wrote:
> >>>>
> >>>>       Slava Gorelik wrote:
> >>>>
> >>>>           Hi. Haven't tried them yet; I'll try tomorrow morning. In
> >>>>           general the cluster is working well; the problems begin if
> >>>>           I'm trying to add 10M rows. After 1.2M it happened.
> >>>>
> >>>>       Anything else running besides the regionserver or datanodes
> >>>>       that would suck resources?  When datanodes begin to slow, we
> >>>>       begin to see the issue Jean-Adrien's configurations address.
> >>>>       Are you uploading using MapReduce?  Are TTs running on the same
> >>>>       nodes as the datanode and regionserver?  How are you doing the
> >>>>       upload?  Describe what your uploader looks like (sorry if
> >>>>       you've already done this).
> >>>>
> >>>>
> >>>>            I already changed the limit of file descriptors,
> >>>>
> >>>>       Good.
> >>>>
> >>>>
> >>>>            I'll try
> >>>>           to change the properties:
> >>>>           <property>
> >>>>            <name>dfs.datanode.socket.write.timeout</name>
> >>>>            <value>0</value>
> >>>>           </property>
> >>>>
> >>>>           <property>
> >>>>            <name>dfs.datanode.max.xcievers</name>
> >>>>            <value>1023</value>
> >>>>           </property>
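For reference, both properties quoted above went into hadoop-site.xml on the datanodes (split into hdfs-site.xml in later Hadoop releases); note that the misspelling "xcievers" is the actual property name. A sketch, assuming an 0.18-era Hadoop:

```xml
<!-- hadoop-site.xml (datanodes): the two settings discussed in this thread -->
<property>
  <!-- 0 disables the datanode socket write timeout entirely -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
<property>
  <!-- raise the cap on concurrent xceiver threads per datanode -->
  <name>dfs.datanode.max.xcievers</name>
  <value>1023</value>
</property>
```

The datanodes must be restarted for either setting to take effect.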
> >>>>
> >>>>
> >>>>       Yeah, try it.
> >>>>
> >>>>
> >>>>           And let you know. Are there any other prescriptions? Did I
> >>>>           miss something?
> >>>>
> >>>>           BTW, off topic, but I sent an e-mail to the list recently
> >>>>           and I can't see it:
> >>>>           Is it possible to delete multiple columns in any way by
> >>>>           regex, for example
> >>>>           column_name_* ?
> >>>>
> >>>>       Not that I know of.  If it's not in the API, it should be.
> >>>>       Mind filing a JIRA?
> >>>>
> >>>>       Thanks Slava.
> >>>>       St.Ack
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
>
