hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase on same boxes as HDFS Data nodes
Date Thu, 08 Jul 2010 17:11:33 GMT
More info in this blog post:
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

J-D

On Thu, Jul 8, 2010 at 10:11 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> This would be done at the expense of network IO, since you will lose
> locality for jobs that read from/write to HBase. Also, I guess the
> datanodes are on those same boxes, so HBase will lose locality with
> HDFS as well.
>
> J-D
>
> On Thu, Jul 8, 2010 at 10:07 AM, Jamie Cockrill
> <jamie.cockrill@gmail.com> wrote:
>> Thanks all for your help with this, everything seems much more stable
>> for the time being. I have a backlog loading job to run over a great
>> deal of data, so I might separate out my region servers from my task
>> trackers for now.
>>
>> Thanks again,
>>
>> Jamie
>>
>>
>>
>> On 8 July 2010 17:46, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> OS cache is good, glad you figured out your memory problem.
>>>
>>> J-D
>>>
>>> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>> Morning all. Day 2 begins...
>>>>
>>>> I discussed this with someone else earlier and they pointed out that
>>>> we also have task trackers running on all of those nodes, which will
>>>> affect the amount of memory being used when jobs are being run. Each
>>>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>>>> with a JVM Xmx of 512MB each. Clearly this implies a fully utilised
>>>> node will use 8*512MB + 8*512MB = 8GB of memory on tasks alone. That's
>>>> before the datanode does anything, or HBase for that matter.
>>>>
>>>> As such, I've dropped it to 4 maps and 4 reduces per node and reduced
>>>> the Xmx to 256MB, giving a potential maximum task overhead of 2GB per
>>>> node. Running 'vmstat 20' now, under load from mapreduce jobs, suggests
>>>> that the actual free memory is about the same, but the memory cache is
>>>> much bigger, which is presumably healthier since, in theory, the cache
>>>> will give memory back to processes that request it.
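>>>>
>>>> For reference, the changes were roughly the following in
>>>> mapred-site.xml (typed across by hand and from memory, so treat the
>>>> property names and values as a sketch rather than a copy of my file):
>>>>
>>>>   <property>
>>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>     <value>4</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>>     <value>4</value>
>>>>   </property>
>>>>   <property>
>>>>     <name>mapred.child.java.opts</name>
>>>>     <value>-Xmx256m</value>
>>>>   </property>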
>>>>
>>>> Let's see if that does the trick!
>>>>
>>>> ta
>>>>
>>>> Jamie
>>>>
>>>>
>>>> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>> YouAreDead means that the region server's session was expired; GC
>>>>> seems like your major problem. (File problems can happen after a GC
>>>>> sleep because the files were moved around while the process was
>>>>> sleeping, and you also get the same kind of messages with the xcievers
>>>>> issue... sorry for the confusion.)
>>>>>
>>>>> By overcommitting the memory I meant trying to fit too much stuff into
>>>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>>>> that eat all the free space? Why not lower their number?
>>>>>
>>>>> J-D
>>>>>
>>>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill
>>>>> <jamie.cockrill@gmail.com> wrote:
>>>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>>>>> I raised it to). It caused me to run into a delightful
>>>>>> 'YouAreDeadException', which looks very related to the garbage
>>>>>> collection issues on the Troubleshooting page, as my Zookeeper session
>>>>>> expired.
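>>>>>>
>>>>>> (For anyone reading this in the archives later: the reset was just
>>>>>> the usual shell alter, roughly as below, with 'mytable' standing in
>>>>>> for my real table name and 268435456 being the 256MB default, if I've
>>>>>> remembered that right:
>>>>>>
>>>>>>   disable 'mytable'
>>>>>>   alter 'mytable', {METHOD => 'table_att', MAX_FILESIZE => '268435456'}
>>>>>>   enable 'mytable'
>>>>>> )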
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>> By overcommit, do you mean make my overcommit_ratio higher on each
>>>>>>> box (it's at the default 50 at the moment)? What I'm noticing at the
>>>>>>> moment is that Hadoop is taking up the vast majority of the memory on
>>>>>>> the boxes.
>>>>>>>
>>>>>>> I found this article:
>>>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>>>> which, Todd, it looks like you replied to. Does this sound like a
>>>>>>> similar problem? No worries if you can't remember, it was back in
>>>>>>> January! The article suggests reducing the amount of memory allocated
>>>>>>> to Hadoop at startup; how would I go about doing this?
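>>>>>>>
>>>>>>> (My guess, and it is only a guess, is that it's HADOOP_HEAPSIZE in
>>>>>>> conf/hadoop-env.sh for the daemon heaps, e.g.
>>>>>>>
>>>>>>>   export HADOOP_HEAPSIZE=1000
>>>>>>>
>>>>>>> and mapred.child.java.opts in mapred-site.xml for the task JVMs, but
>>>>>>> please correct me if I've got the knobs wrong.)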
>>>>>>>
>>>>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>>>>> up a lot of your time.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jamie
>>>>>>>
>>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill
>>>>>>>> <jamie.cockrill@gmail.com> wrote:
>>>>>>>>> I think you're right.
>>>>>>>>>
>>>>>>>>> Unfortunately the machines are on a separate network to this
>>>>>>>>> laptop, so I'm having to type everything across, apologies if it
>>>>>>>>> doesn't translate well...
>>>>>>>>>
>>>>>>>>> free -m gave:
>>>>>>>>>
>>>>>>>>>              total    used    free
>>>>>>>>> Mem:           7992    7939      53
>>>>>>>>> -/+ buffers/cache:     7877     114
>>>>>>>>> Swap:         23415     895   22519
>>>>>>>>>
>>>>>>>>> I did this on another node that isn't being smashed at the moment
>>>>>>>>> and the numbers came out similar, but the buffers/cache free was
>>>>>>>>> higher.
>>>>>>>>>
>>>>>>>>> 'vmstat 20' is giving non-zero si and so values ranging between 3
>>>>>>>>> and just short of 5000.
>>>>>>>>>
>>>>>>>>> That seems to be it, I guess. Hadoop troubleshooting suggests
>>>>>>>>> setting swappiness to 0; is that just a case of changing the value
>>>>>>>>> in /proc/sys/vm/swappiness?
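>>>>>>>>>
>>>>>>>>> i.e. (and I'm guessing here) something like
>>>>>>>>>
>>>>>>>>>   sysctl -w vm.swappiness=0
>>>>>>>>>
>>>>>>>>> now, plus vm.swappiness = 0 in /etc/sysctl.conf so it sticks after
>>>>>>>>> a reboot?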
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>>
>>>>>>>>> Jamie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7 July 2010 18:40, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I
>>>>>>>>>>> could look at those if that's the next logical step? Would there
>>>>>>>>>>> be anything in any of the logs that I should look at?
>>>>>>>>>>>
>>>>>>>>>>> One thing I have noticed is that it does take an absolute age to
>>>>>>>>>>> log in to the DN/RS box to restart the RS once it's fallen over;
>>>>>>>>>>> in one instance it took about 10 minutes. These are 8GB, 4-core
>>>>>>>>>>> amd64 boxes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>>>
>>>>>>>>>> Also let "vmstat 20" run while running your job and observe the
>>>>>>>>>> "si" and "so" columns. If those are nonzero, it indicates you're
>>>>>>>>>> swapping, and you've oversubscribed your RAM (very easy on 8G
>>>>>>>>>> machines).
>>>>>>>>>>
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ta
>>>>>>>>>>>
>>>>>>>>>>> Jamie
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>> > Bad news: it looks like my xcievers setting is as it should
>>>>>>>>>>> > be. It's in hdfs-site.xml and, looking at the job.xml of one of
>>>>>>>>>>> > my jobs in the job-tracker, that property shows as set to 2047.
>>>>>>>>>>> > I've cat-and-grepped one of the datanode logs and although
>>>>>>>>>>> > there were a few in there, they were from a few months ago.
>>>>>>>>>>> > I've upped the MAX_FILESIZE on my table to 1GB to see if that
>>>>>>>>>>> > helps (not sure if it will!).
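>>>>>>>>>>> >
>>>>>>>>>>> > For reference, the stanza in hdfs-site.xml looks roughly like
>>>>>>>>>>> > this (typed across from the other network, so the formatting
>>>>>>>>>>> > may not be exact):
>>>>>>>>>>> >
>>>>>>>>>>> >   <property>
>>>>>>>>>>> >     <name>dfs.datanode.max.xcievers</name>
>>>>>>>>>>> >     <value>2047</value>
>>>>>>>>>>> >   </property>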
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> >
>>>>>>>>>>> > Jamie
>>>>>>>>>>> >
>>>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your
>>>>>>>>>>> >> problem totally looks like it. 0.20.5 will have the same issue
>>>>>>>>>>> >> (since it's on the HDFS side).
>>>>>>>>>>> >>
>>>>>>>>>>> >> J-D
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill
>>>>>>>>>>> >> <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Environment:
>>>>>>>>>>> >>> All (Hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now
>>>>>>>>>>> >>> crashed, so I can't really say if it was swapping too much.
>>>>>>>>>>> >>> There is a DEBUG statement just before it crashes, saying:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog
>>>>>>>>>>> >>> writer in hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> What follows is:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
>>>>>>>>>>> >>> No lease on <file location as above> File does not exist.
>>>>>>>>>>> >>> Holder DFSClient_-11113603 does not have any open files
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> It then seems to try to do some error recovery (Error
>>>>>>>>>>> >>> Recovery for block null bad datanode[0] nodes == null), fails
>>>>>>>>>>> >>> (Could not get block locations. Source file "<hbase file as
>>>>>>>>>>> >>> before>" - Aborting), and there is then an ERROR from
>>>>>>>>>>> >>> org.apache...HRegionServer: Close and delete failed. There is
>>>>>>>>>>> >>> then a similar LeaseExpiredException to the one above.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> There are then a couple of messages from HRegionServer
>>>>>>>>>>> >>> saying that it's notifying master of its shutdown and
>>>>>>>>>>> >>> stopping itself. The shutdown hook then fires and the
>>>>>>>>>>> >>> RemoteException and LeaseExpiredExceptions are printed again.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log,
>>>>>>>>>>> >>> printed as I restarted the regionserver), however I haven't
>>>>>>>>>>> >>> got the xcievers set anywhere. I'll give that a go. It does
>>>>>>>>>>> >>> seem very odd, as I did have a few of them fall over one at a
>>>>>>>>>>> >>> time with a few early loads, but that seemed to be because
>>>>>>>>>>> >>> the regions weren't splitting properly, so all the traffic
>>>>>>>>>>> >>> was going to one node and it was being overwhelmed. Once I
>>>>>>>>>>> >>> throttled it, a region split seemed to get triggered after
>>>>>>>>>>> >>> one load, which flung regions all over and made subsequent
>>>>>>>>>>> >>> loads much more distributed. However, perhaps the time-bomb
>>>>>>>>>>> >>> was ticking... I'll have a go at specifying the xcievers
>>>>>>>>>>> >>> property. I'm pretty certain I've got everything else
>>>>>>>>>>> >>> covered, except the patches referenced in the JIRA.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I just grepped some of the log files and didn't get an
>>>>>>>>>>> >>> explicit exception with 'xciever' in it.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because
>>>>>>>>>>> >>> everything is installed as per karmic-cdh3 I'm a bit
>>>>>>>>>>> >>> reluctant to do so, as presumably Cloudera has tested each of
>>>>>>>>>>> >>> these versions against the others? And I don't really want to
>>>>>>>>>>> >>> introduce further versioning issues.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Jamie
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>>> >>>> Jamie,
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs
>>>>>>>>>>> >>>> that go off when the cluster is under load.
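>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> For the ulimit side, the usual fix is something like the
>>>>>>>>>>> >>>> following in /etc/security/limits.conf (just a sketch; swap
>>>>>>>>>>> >>>> 'hadoop' for whatever user actually runs your daemons), then
>>>>>>>>>>> >>>> log out/in and check with ulimit -n:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>   hadoop  -  nofile  32768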
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> J-D
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>>> Dear all,
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region
>>>>>>>>>>> >>>>> servers on the same physical boxes as the HDFS data-nodes.
>>>>>>>>>>> >>>>> I'm getting an awful lot of region server crashes. The last
>>>>>>>>>>> >>>>> thing that happens appears to be a DroppedSnapshotException,
>>>>>>>>>>> >>>>> caused by an IOException: could not complete write to file
>>>>>>>>>>> >>>>> <file on HDFS>. I am running it under load; how heavy that
>>>>>>>>>>> >>>>> load is, or how to quantify it, I'm not sure, but I'm
>>>>>>>>>>> >>>>> guessing it is a load issue.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes?
>>>>>>>>>>> >>>>> Is it common to see region server crashes when either HDFS
>>>>>>>>>>> >>>>> or the region server (or both) is under heavy load? I'm
>>>>>>>>>>> >>>>> guessing that is the case, as I've seen a few similar posts.
>>>>>>>>>>> >>>>> I've not got a great deal of capacity to be separating
>>>>>>>>>>> >>>>> region servers from HDFS data nodes, but it might be an
>>>>>>>>>>> >>>>> argument I could make.
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Thanks
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>> Jamie
>>>>>>>>>>> >>>>>
>>>>>>>>>>> >>>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
