hbase-user mailing list archives

From: Jamie Cockrill <jamie.cockr...@gmail.com>
Subject: Re: HBase on same boxes as HDFS Data nodes
Date: Thu, 08 Jul 2010 17:07:23 GMT
Thanks all for your help with this; everything seems much more stable
for the moment. I have a backlog loading job to run over a great
deal of data, so I might separate my region servers from my task
trackers in the meantime.

Thanks again,

Jamie



On 8 July 2010 17:46, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> OS cache is good, glad you figured out your memory problem.
>
> J-D
>
> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>> Morning all. Day 2 begins...
>>
>> I discussed this with someone else earlier and they pointed out that
>> we also have task trackers running on all of those nodes, which will
>> affect the amount of memory being used when jobs are being run. Each
>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>> with a JVM Xmx of 512mb each.  Clearly this implies a fully utilised
>> node will use 8*512mb + 8*512mb = 8GB of memory on tasks alone. That's
>> before the datanode does anything, or HBase for that matter.
>>
>> As such, I've dropped it to 4 maps, 4 reduces per node and reduced the
>> Xmx to 256mb, giving a potential maximum task overhead of 2GB per
>> node. Running 'vmstat 20' now, under load from mapreduce jobs,
>> suggests that the actual free memory is about the same, but the memory
>> cache is much, much bigger, which is presumably healthier as, in
>> theory, the cache ought to relinquish memory to processes that request it.
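For reference, a minimal sketch of the lowered task settings described above, as they
would appear in mapred-site.xml on each tasktracker node. The property names are the
stock Hadoop 0.20 ones; the values simply mirror the 4 maps / 4 reduces / -Xmx256m
figures in this message, and a tasktracker restart is needed for them to take effect.

    <!-- mapred-site.xml (per tasktracker node) -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx256m</value>
    </property>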
>>
>> Let's see if that does the trick!
>>
>> ta
>>
>> Jamie
>>
>>
>> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> YouAreDead means that the region server's session expired, so GC
>>> seems like your major problem. (File problems can happen after a GC
>>> sleep because files were moved around while the process was sleeping;
>>> you also get the same kind of messages with the xcievers issue... sorry
>>> for the confusion.)
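One common mitigation for GC-induced session expiry is to run the region servers with
the CMS collector and, if needed, a longer ZooKeeper session timeout. A sketch below,
assuming the stock conf/hbase-env.sh and hbase-site.xml; the 120000 ms value is only an
illustration, not a recommendation from this thread.

    # conf/hbase-env.sh -- use the concurrent collector to shorten GC pauses
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"

    <!-- conf/hbase-site.xml -- give the RS more slack before its session expires -->
    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>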
>>>
>>> By overcommitting the memory I meant trying to fit too much stuff into
>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>> that eat all the free space? Why not lower their number?
>>>
>>> J-D
>>>
>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill
>>> <jamie.cockrill@gmail.com> wrote:
>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>>> I raised it to). It caused me to run into a delightful
>>>> 'YouAreDeadException', which looks closely related to the garbage
>>>> collection issues on the Troubleshooting page, as my ZooKeeper session
>>>> expired.
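For anyone following along, changing a table's MAX_FILESIZE can be done from the HBase
shell. A sketch, assuming a table named 'mytable' (a placeholder) and the 256 MB default
of that era (268435456 bytes); on these versions the table generally has to be disabled
around the alter.

    # in the HBase shell
    disable 'mytable'
    alter 'mytable', METHOD => 'table_att', MAX_FILESIZE => '268435456'
    enable 'mytable'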
>>>>
>>>> Thanks
>>>>
>>>> Jamie
>>>>
>>>>
>>>>
>>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>>> (it's at the default 50 at the moment)? What I'm noticing is that
>>>>> Hadoop is taking up the vast majority of the memory on the boxes.
>>>>>
>>>>> I found this article:
>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>> which, Todd, it looks like you replied to. Does this sound like a
>>>>> similar problem? No worries if you can't remember, it was back in
>>>>> January! This article suggests reducing the amount of memory allocated
>>>>> to Hadoop at startup; how would I go about doing this?
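One common way to reduce the memory Hadoop grabs at startup is to lower the daemon heap
in conf/hadoop-env.sh; a sketch with illustrative values only (not necessarily what the
article had in mind). Per-task heap is a separate knob, mapred.child.java.opts.

    # conf/hadoop-env.sh -- heap given to each Hadoop daemon (datanode, tasktracker, ...)
    export HADOOP_HEAPSIZE=1000   # in MB; 1000 is the stock default, lower it if RAM is tight
    # per-task heap is set via mapred.child.java.opts in mapred-site.xml, e.g. -Xmx256m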
>>>>>
>>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>>> up a lot of your time.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jamie
>>>>>
>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill
>>>>>> <jamie.cockrill@gmail.com> wrote:
>>>>>>> I think you're right.
>>>>>>>
>>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>>> so I'm having to type everything across, apologies if it doesn't
>>>>>>> translate well...
>>>>>>>
>>>>>>> free -m gave:
>>>>>>>
>>>>>>>                    total     used    free
>>>>>>> Mem:                7992     7939      53
>>>>>>> -/+ buffers/cache:           7877     114
>>>>>>> Swap:              23415      895   22519
>>>>>>>
>>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>>> the numbers came out similar, but the buffers/cache free was higher.
>>>>>>>
>>>>>>> 'vmstat 20' is giving non-zero si and so values ranging between 3 and
>>>>>>> just short of 5000.
>>>>>>>
>>>>>>> That seems to be it, I guess. The Hadoop troubleshooting advice suggests
>>>>>>> setting swappiness to 0; is that just a case of changing the value in
>>>>>>> /proc/sys/vm/swappiness?
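A sketch of the usual way to set it on most Linux distributions: the sysctl takes effect
immediately, and the sysctl.conf line makes it persist across reboots (run as root on
each node).

    # immediate effect
    sysctl -w vm.swappiness=0        # equivalent to: echo 0 > /proc/sys/vm/swappiness

    # persist across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf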
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Jamie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 7 July 2010 18:40, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>>> in any of the logs that I should look at?
>>>>>>>>>
>>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>>> in to the DN/RS to restart the RS once it's fallen over; in one
>>>>>>>>> instance it took about 10 minutes. These are 8GB, 4-core amd64 boxes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>
>>>>>>>> Also let "vmstat 20" run while running your job and observe
the "si" and
>>>>>>>> "so" columns. If those are nonzero, it indicates you're swapping,
and you've
>>>>>>>> oversubscribed your RAM (very easy on 8G machines)
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> ta
>>>>>>>>>
>>>>>>>>> Jamie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>> > Bad news: it looks like my xcievers setting is already what it
>>>>>>>>> > should be. It's in hdfs-site.xml, and looking at the job.xml of
>>>>>>>>> > one of my jobs in the job-tracker, it's showing that property as
>>>>>>>>> > set to 2047. I've grepped one of the datanode logs and although
>>>>>>>>> > there were a few in there, they were from a few months ago. I've
>>>>>>>>> > upped the MAX_FILESIZE on my table to 1GB to see if that helps
>>>>>>>>> > (not sure if it will!).
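For reference, the setting being discussed lives in the datanodes' hdfs-site.xml, and the
misspelling really is the property name; changing it requires a datanode restart. The
sketch below just echoes the 2047 value quoted above; 4096 is a commonly recommended
figure, but that is not from this thread.

    <!-- hdfs-site.xml on every datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2047</value>
    </property>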
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > Jamie
>>>>>>>>> >
>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem
>>>>>>>>> >> totally looks like it. 0.20.5 will have the same issue (since it's on
>>>>>>>>> >> the HDFS side).
>>>>>>>>> >>
>>>>>>>>> >> J-D
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill
>>>>>>>>> >> <jamie.cockrill@gmail.com> wrote:
>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>> >>>
>>>>>>>>> >>> Environment:
>>>>>>>>> >>> All (hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>> >>>
>>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now crashed, so I
>>>>>>>>> >>> can't really say if it was swapping too much. There is a DEBUG
>>>>>>>>> >>> statement just before it crashes saying:
>>>>>>>>> >>>
>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
>>>>>>>>> >>> hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>> >>>
>>>>>>>>> >>> What follows is:
>>>>>>>>> >>>
>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>>>>>>>>> >>> on <file location as above> File does not exist. Holder
>>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>>> >>>
>>>>>>>>> >>> It then seems to try and do some error recovery (Error Recovery for
>>>>>>>>> >>> block null bad datanode[0] nodes == null), fails (Could not get block
>>>>>>>>> >>> locations. Source file "<hbase file as before>" - Aborting). There is
>>>>>>>>> >>> then an ERROR org.apache...HRegionServer: Close and delete failed.
>>>>>>>>> >>> There is then a similar LeaseExpiredException as above.
>>>>>>>>> >>>
>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>>> >>> it's notifying the master of its shutdown and stopping itself. The
>>>>>>>>> >>> shutdown hook then fires, and the RemoteException and
>>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>>> >>>
>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>>> >>> restarted the regionserver); however, I haven't got the xceivers set
>>>>>>>>> >>> anywhere. I'll give that a go. It does seem very odd, as I did have a
>>>>>>>>> >>> few of them fall over one at a time with a few early loads, but that
>>>>>>>>> >>> seemed to be because the regions weren't splitting properly, so all
>>>>>>>>> >>> the traffic was going to one node and it was being overwhelmed. Once I
>>>>>>>>> >>> throttled it, after one load a region split seemed to get
>>>>>>>>> >>> triggered, which flung regions all over and made subsequent loads
>>>>>>>>> >>> much more distributed. However, perhaps the time-bomb was ticking...
>>>>>>>>> >>> I'll have a go at specifying the xcievers property. I'm pretty
>>>>>>>>> >>> certain I've got everything else covered, except the patches
>>>>>>>>> >>> referenced in the JIRA.
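For completeness, the ulimit requirement is usually made permanent via
/etc/security/limits.conf; a sketch, assuming the HDFS and HBase daemons run as a user
called 'hadoop' (a placeholder). The ulimit line printed in the regionserver log, as
mentioned above, is a handy way to confirm the effective value.

    # /etc/security/limits.conf
    hadoop  -  nofile  32768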
>>>>>>>>> >>>
>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>>> >>>
>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5; however, because everything
>>>>>>>>> >>> is installed as per karmic-cdh3, I'm a bit reluctant to do so, as
>>>>>>>>> >>> presumably Cloudera has tested each of these versions against each
>>>>>>>>> >>> other? And I don't really want to introduce further versioning issues.
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>>
>>>>>>>>> >>> Jamie
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>> >>>> Jamie,
>>>>>>>>> >>>>
>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>> >>>>
>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>> >>>>
>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow off
>>>>>>>>> >>>> when the cluster is under load.
>>>>>>>>> >>>>
>>>>>>>>> >>>> J-D
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>>> Dear all,
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the
>>>>>>>>> >>>>> same physical boxes as the HDFS data-nodes. I'm getting an awful lot
>>>>>>>>> >>>>> of region server crashes. The last thing that happens appears to be a
>>>>>>>>> >>>>> DroppedSnapshotException, caused by an IOException: could not
>>>>>>>>> >>>>> complete write to file <file on HDFS>. I am running it under load;
>>>>>>>>> >>>>> how heavy that load is I'm not sure how to quantify, but I'm guessing
>>>>>>>>> >>>>> it is a load issue.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is it
>>>>>>>>> >>>>> common to see region server crashes when either the HDFS or region
>>>>>>>>> >>>>> server (or both) is under heavy load? I'm guessing that is the case,
>>>>>>>>> >>>>> as I've seen a few similar posts. I've not got a great deal of capacity
>>>>>>>>> >>>>> to separate region servers from HDFS data nodes, but it might be
>>>>>>>>> >>>>> an argument I could make.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Thanks
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Jamie
>>>>>>>>> >>>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
