hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase on same boxes as HDFS Data nodes
Date Thu, 08 Jul 2010 17:11:00 GMT
This would be done at the expense of network IO, since you will lose
locality for jobs that read/write to HBase. Also, I guess the datanodes
are staying with the task trackers, so HBase will lose locality with HDFS.

J-D

On Thu, Jul 8, 2010 at 10:07 AM, Jamie Cockrill
<jamie.cockrill@gmail.com> wrote:
> Thanks all for your help with this; everything seems much more stable
> for now. I have a backlog loading job to run over a great deal of
> data, so I might separate out my region servers from my task trackers
> for the meantime.
>
> Thanks again,
>
> Jamie
>
>
>
> On 8 July 2010 17:46, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> OS cache is good, glad you figured out your memory problem.
>>
>> J-D
>>
>> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>> Morning all. Day 2 begins...
>>>
>>> I discussed this with someone else earlier and they pointed out that
>>> we also have task trackers running on all of those nodes, which will
>>> affect the amount of memory being used when jobs are being run. Each
>>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>>> with a JVM Xmx of 512MB each. Clearly this implies a fully utilised
>>> node will use 8*512MB + 8*512MB = 8GB of memory on tasks alone. That's
>>> before the datanode does anything, or HBase for that matter.
>>>
>>> As such, I've dropped it to 4 maps, 4 reduces per node and reduced the
>>> Xmx to 256MB, giving a potential maximum task overhead of 2GB per
>>> node. Running 'vmstat 20' now, under load from mapreduce jobs,
>>> suggests that the actual free memory is about the same, but the memory
>>> cache is much much bigger, which presumably is healthier as, in
>>> theory, that ought to relinquish memory to processes that request it.
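>>>
>>> (For reference, the relevant knobs live in mapred-site.xml; this is a
>>> minimal sketch using the stock Hadoop 0.20 property names, with the
>>> values above:
>>>
>>>   <property>
>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>     <value>4</value> <!-- was 8: concurrent map slots per node -->
>>>   </property>
>>>   <property>
>>>     <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>>     <value>4</value> <!-- was 8: concurrent reduce slots per node -->
>>>   </property>
>>>   <property>
>>>     <name>mapred.child.java.opts</name>
>>>     <value>-Xmx256m</value> <!-- per-task JVM heap, down from 512m -->
>>>   </property>
>>>
>>> The tasktrackers need a restart to pick these up.)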
>>>
>>> Let's see if that does the trick!
>>>
>>> ta
>>>
>>> Jamie
>>>
>>>
>>> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>> YouAreDead means that the region server's session expired; GC
>>>> seems like your major problem. (file problems can happen after a GC
>>>> sleep because they were moved around while the process was sleeping,
>>>> you also get the same kind of messages with xcievers issue... sorry
>>>> for the confusion)
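>>>>
>>>> (One way to confirm the GC theory is to turn on GC logging for the
>>>> region server. A sketch, assuming the usual conf/hbase-env.sh hook;
>>>> the log path is arbitrary:
>>>>
>>>>   # conf/hbase-env.sh: log every GC with timestamps
>>>>   export HBASE_OPTS="-verbose:gc -XX:+PrintGCDetails \
>>>>     -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
>>>>
>>>> Long pauses in that log lining up with the ZK session expiry would
>>>> confirm it.)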
>>>>
>>>> By overcommitting the memory I meant trying to fit too much stuff in
>>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>>> that eat all the free space? Why not lower their number?
>>>>
>>>> J-D
>>>>
>>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill
>>>> <jamie.cockrill@gmail.com> wrote:
>>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>>>> I raised it to). It caused me to run into a delightful
>>>>> 'YouAreDeadException', which looks very related to the garbage
>>>>> collection issues on the Troubleshooting page, as my Zookeeper session
>>>>> expired.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jamie
>>>>>
>>>>>
>>>>>
>>>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>>>> is that hadoop is taking up the vast majority of the memory on the
>>>>>> boxes.
>>>>>>
>>>>>> I found this article:
>>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>>> which Todd, it looks like you replied to. Does this sound like a
>>>>>> similar problem? No worries if you can't remember, it was back in
>>>>>> January! This article suggests reducing the amount of memory allocated
>>>>>> to Hadoop at startup, how would I go about doing this?
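>>>>>>
>>>>>> (Guessing at the mechanics: the daemon heap is normally capped via
>>>>>> HADOOP_HEAPSIZE in conf/hadoop-env.sh, e.g.
>>>>>>
>>>>>>   # conf/hadoop-env.sh: max heap in MB for each Hadoop daemon
>>>>>>   export HADOOP_HEAPSIZE=1000
>>>>>>
>>>>>> while the per-task JVMs are sized separately through
>>>>>> mapred.child.java.opts.)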
>>>>>>
>>>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>>>> up a lot of your time.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>> swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill
>>>>>>> <jamie.cockrill@gmail.com> wrote:
>>>>>>>> I think you're right.
>>>>>>>>
>>>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>>>> so I'm having to type everything across, apologies if it doesn't
>>>>>>>> translate well...
>>>>>>>>
>>>>>>>> free -m gave:
>>>>>>>>
>>>>>>>>                    total     used     free
>>>>>>>> Mem:                7992     7939       53
>>>>>>>> -/+ buffers/cache:           7877      114
>>>>>>>> Swap:              23415      895    22519
>>>>>>>>
>>>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>>>> the numbers came out similar, but the buffers/cache free was higher.
>>>>>>>>
>>>>>>>> vmstat 20 is giving non-zero si and so's ranging between 3 and just
>>>>>>>> short of 5000.
>>>>>>>>
>>>>>>>> That seems to be it I guess. Hadoop troubleshooting suggests setting
>>>>>>>> swappiness to 0, is that just a case of changing the value in
>>>>>>>> /proc/sys/vm/swappiness?
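>>>>>>>>
>>>>>>>> I'm assuming it's something like this (the echo takes effect
>>>>>>>> immediately but won't survive a reboot; the sysctl.conf entry makes
>>>>>>>> it permanent):
>>>>>>>>
>>>>>>>>   # one-off, as root
>>>>>>>>   echo 0 > /proc/sys/vm/swappiness
>>>>>>>>   # to persist: add to /etc/sysctl.conf, then run `sysctl -p`
>>>>>>>>   vm.swappiness = 0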
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7 July 2010 18:40, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>>>> look at those if that's the next logical step? Would there be
>>>>>>>>>> anything in any of the logs that I should look at?
>>>>>>>>>>
>>>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>>>> in to the DN/RS to restart the RS once it's fallen over; in one
>>>>>>>>>> instance it took about 10 minutes. These are 8GB, 4-core amd64 boxes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>>
>>>>>>>>> Also let "vmstat 20" run while running your job and observe the "si"
>>>>>>>>> and "so" columns. If those are nonzero, it indicates you're swapping,
>>>>>>>>> and you've oversubscribed your RAM (very easy on 8G machines).
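>>>>>>>>>
>>>>>>>>> Something like this (illustrative numbers only), where si/so are
>>>>>>>>> pages swapped in/out per second:
>>>>>>>>>
>>>>>>>>>   $ vmstat 20
>>>>>>>>>   procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>>>>>>>>    r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>>>>>>>>    9  1 916480  54320  11840 117000 2100 4800   600   900 2400 5100 70 20  5  5
>>>>>>>>>
>>>>>>>>> Sustained nonzero si/so under load means the box is paging.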
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ta
>>>>>>>>>>
>>>>>>>>>> Jamie
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>> > Bad news, it looks like my xcievers is set as it should be: it's
>>>>>>>>>> > in the hdfs-site.xml, and looking at the job.xml of one of my jobs
>>>>>>>>>> > in the job-tracker, it's showing that property as set to 2047.
>>>>>>>>>> > I've cat | grepped one of the datanode logs and although there
>>>>>>>>>> > were a few in there, they were from a few months ago. I've upped
>>>>>>>>>> > my MAX_FILESIZE on my table to 1GB to see if that helps (not sure
>>>>>>>>>> > if it will!).
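>>>>>>>>>> >
>>>>>>>>>> > For reference, this is the stanza I have in hdfs-site.xml (note
>>>>>>>>>> > that the property name really is spelled "xcievers"):
>>>>>>>>>> >
>>>>>>>>>> >   <property>
>>>>>>>>>> >     <name>dfs.datanode.max.xcievers</name>
>>>>>>>>>> >     <value>2047</value> <!-- max concurrent transceiver threads per datanode -->
>>>>>>>>>> >   </property>
>>>>>>>>>> >
>>>>>>>>>> > The datanodes need a restart to pick it up.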
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> >
>>>>>>>>>> > Jamie
>>>>>>>>>> >
>>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your
>>>>>>>>>> >> problem totally looks like it. 0.20.5 will have the same issue
>>>>>>>>>> >> (since it's on the HDFS side)
>>>>>>>>>> >>
>>>>>>>>>> >> J-D
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill
>>>>>>>>>> >> <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>>> >>>
>>>>>>>>>> >>> Environment:
>>>>>>>>>> >>> All (hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>>> >>>
>>>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now
>>>>>>>>>> >>> crashed, so I can't really say if it was swapping too much.
>>>>>>>>>> >>> There is a DEBUG statement just before it crashes saying:
>>>>>>>>>> >>>
>>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog
>>>>>>>>>> >>> writer in hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>>> >>>
>>>>>>>>>> >>> What follows is:
>>>>>>>>>> >>>
>>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
>>>>>>>>>> >>> No lease on <file location as above> File does not exist.
>>>>>>>>>> >>> Holder DFSClient_-11113603 does not have any open files
>>>>>>>>>> >>>
>>>>>>>>>> >>> It then seems to try and do some error recovery (Error Recovery
>>>>>>>>>> >>> for block null bad datanode[0] nodes == null), fails (Could not
>>>>>>>>>> >>> get block locations. Source file "<hbase file as before>" -
>>>>>>>>>> >>> Aborting). There is then an ERROR org.apache...HRegionServer:
>>>>>>>>>> >>> Close and delete failed. There is then a similar
>>>>>>>>>> >>> LeaseExpiredException as above.
>>>>>>>>>> >>>
>>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying
>>>>>>>>>> >>> that it's notifying master of its shutdown and stopping itself.
>>>>>>>>>> >>> The shutdown hook then fires and the RemoteException and
>>>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>>>> >>>
>>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed
>>>>>>>>>> >>> as I restarted the regionserver), however I haven't got the
>>>>>>>>>> >>> xceivers set anywhere. I'll give that a go. It does seem very
>>>>>>>>>> >>> odd, as I did have a few of them fall over one at a time with a
>>>>>>>>>> >>> few early loads, but that seemed to be because the regions
>>>>>>>>>> >>> weren't splitting properly, so all the traffic was going to one
>>>>>>>>>> >>> node and it was being overwhelmed. Once I throttled it, after
>>>>>>>>>> >>> one load a region split seemed to get triggered, which flung
>>>>>>>>>> >>> regions all over, which made subsequent loads much more
>>>>>>>>>> >>> distributed. However, perhaps the time-bomb was ticking... I'll
>>>>>>>>>> >>> have a go at specifying the xcievers property. I'm pretty
>>>>>>>>>> >>> certain I've got everything else covered, except the patches as
>>>>>>>>>> >>> referenced in the JIRA.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because
>>>>>>>>>> >>> everything is installed as per karmic-cdh3, I'm a bit reluctant
>>>>>>>>>> >>> to do so, as presumably Cloudera has tested each of these
>>>>>>>>>> >>> versions against each other? And I don't really want to
>>>>>>>>>> >>> introduce further versioning issues.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks,
>>>>>>>>>> >>>
>>>>>>>>>> >>> Jamie
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>> >>>> Jamie,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that
>>>>>>>>>> >>>> blow up when the cluster is under load.
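>>>>>>>>>> >>>>
>>>>>>>>>> >>>> A sketch of the ulimit side, assuming a PAM-based distro and
>>>>>>>>>> >>>> that the daemons run as a "hadoop" user (the user name here is
>>>>>>>>>> >>>> a guess):
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>   # /etc/security/limits.conf: raise the open-file limit
>>>>>>>>>> >>>>   hadoop  soft  nofile  32768
>>>>>>>>>> >>>>   hadoop  hard  nofile  32768
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Log back in and verify with `ulimit -n` before restarting the
>>>>>>>>>> >>>> daemons.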
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> J-D
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill
>>>>>>>>>> >>>> <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> Dear all,
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers
>>>>>>>>>> >>>>> on the same physical boxes as the HDFS data-nodes. I'm getting
>>>>>>>>>> >>>>> an awful lot of region server crashes. The last thing that
>>>>>>>>>> >>>>> happens appears to be a DroppedSnapshotException, caused by an
>>>>>>>>>> >>>>> IOException: could not complete write to file <file on HDFS>.
>>>>>>>>>> >>>>> I am running it under load; how heavy that is I'm not sure how
>>>>>>>>>> >>>>> to quantify, but I'm guessing it is a load issue.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is
>>>>>>>>>> >>>>> it common to see region server crashes when either the HDFS or
>>>>>>>>>> >>>>> region server (or both) is under heavy load? I'm guessing that
>>>>>>>>>> >>>>> is the case, as I've seen a few similar posts. I've not got a
>>>>>>>>>> >>>>> great deal of capacity to be separating region servers from
>>>>>>>>>> >>>>> HDFS data nodes, but it might be an argument I could make.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Jamie
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>
>>>>>>>>>> >>>
>>>>>>>>>> >>
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
