hbase-user mailing list archives

From vramanatha...@aol.com
Subject Re: HBase on same boxes as HDFS Data nodes
Date Thu, 08 Jul 2010 17:51:04 GMT

Thank you.

I have some more questions. I've spent quite a bit of time over the last few weeks developing one of our applications using HBase/Hadoop, on 0.20.4.

HBase - Table X
rows    1 -  100 -> Region A -> RegionServer A -> DataNode A
....
rows 1500 - 1600 -> Region M -> RegionServer B -> DataNode B

So, based on what I have read so far, I'm thinking of pairing RegionServer A and DataNode A on the same host to make use of locality.

As per your answer, if we restart the cluster, locality is gone because of the random assignment.

So RegionServer B could end up serving Region A while its data blocks stay on DataNode A, if I understand correctly. Will the data move over time, though? For example, if I have lots of access to the data on DataNode A, and without the work that is currently in progress?

thanks again for your reply

venkatesh

-----Original Message-----
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: user@hbase.apache.org
Sent: Thu, Jul 8, 2010 1:35 pm
Subject: Re: HBase on same boxes as HDFS Data nodes


Former, " Now imagine you stop HBase after saving a lot of data and

restarting it subsequently. The region servers are restarted and

assign a seemingly random number of regions"



It's not really because we enjoy it that way, but because the work

required just isn't done. If this is of interest to you, Jonathan and

Karthik at Facebook started rewriting our load balancer. See

https://issues.apache.org/jira/browse/HBASE-2699 and

https://issues.apache.org/jira/browse/HBASE-2480



J-D



On Thu, Jul 8, 2010 at 10:30 AM,  <vramanathan00@aol.com> wrote:

>
> Hi,
> Fairly new to HBase & the listserv. Following up on this thread & the article:
> could someone elaborate on why locality is lost upon restart? Is it because
> of random assignment by the HMaster, because the HRegionServer is stateless,
> or for other reasons?
>
> thanks
> venkatesh
>

> -----Original Message-----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Thu, Jul 8, 2010 1:11 pm
> Subject: Re: HBase on same boxes as HDFS Data nodes
>
> More info on this blog post:
> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>
> J-D
>
> On Thu, Jul 8, 2010 at 10:11 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>> This would be done at the expense of network IO, since you will lose
>> locality for jobs that read/write to HBase. Also I guess the datanodes
>> are also there, so HBase will lose locality with HDFS.
>>
>> J-D
>>
>> On Thu, Jul 8, 2010 at 10:07 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>> Thanks all for your help with this; everything seems much more stable
>>> for now. I have a backlog loading job to run over a great
>>> deal of data, so I might separate out my region servers from my task
>>> trackers for the meantime.
>>>
>>> Thanks again,
>>>
>>> Jamie
>>>
>>> On 8 July 2010 17:46, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>>>> OS cache is good, glad you figured out your memory problem.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>> Morning all. Day 2 begins...
>>>>>
>>>>> I discussed this with someone else earlier and they pointed out that
>>>>> we also have task trackers running on all of those nodes, which will
>>>>> affect the amount of memory being used when jobs are being run. Each
>>>>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>>>>> with a JVM Xmx of 512 MB each. Clearly this implies a fully utilised
>>>>> node will use 8*512 MB + 8*512 MB = 8 GB of memory on tasks alone. That's
>>>>> before the datanode does anything, or HBase for that matter.
>>>>>
>>>>> As such, I've dropped it to 4 maps and 4 reduces per node and reduced the
>>>>> Xmx to 256 MB, giving a potential maximum task overhead of 2 GB per
>>>>> node. Running 'vmstat 20' now, under load from mapreduce jobs,
>>>>> suggests that the actual free memory is about the same, but the memory
>>>>> cache is much, much bigger, which presumably is healthier as, in
>>>>> theory, that ought to relinquish memory to processes that request it.
>>>>>
>>>>> Let's see if that does the trick!
>>>>>
>>>>> ta
>>>>>
>>>>> Jamie
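
For anyone tuning the same knobs: in Hadoop 0.20 the per-node slot counts and per-task heap Jamie describes live in mapred-site.xml. A minimal sketch matching his 4 maps / 4 reduces / 256 MB numbers (pick values to fit your own RAM budget):

    <!-- mapred-site.xml: cap the number of concurrent tasks per tasktracker -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>
    <!-- JVM flags handed to every child task; -Xmx bounds each task's heap -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx256m</value>
    </property>

Worst case is then 4*256 MB + 4*256 MB = 2 GB for tasks, as computed above.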

>>>>>
>>>>> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>>>>>> YouAreDead means that the region server's session was expired; GC
>>>>>> seems like your major problem. (File problems can happen after a GC
>>>>>> sleep because files were moved around while the process was sleeping;
>>>>>> you also get the same kind of messages with the xcievers issue... sorry
>>>>>> for the confusion.)
>>>>>>
>>>>>> By overcommitting the memory I meant trying to fit too much stuff in
>>>>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>>>>> that eat all the free space? Why not lower their number?
>>>>>>
>>>>>> J-D

>>>>>>
>>>>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1 GB
>>>>>>> I raised it to). It caused me to run into a delightful
>>>>>>> 'YouAreDeadException', which looks very related to the garbage
>>>>>>> collection issues on the Troubleshooting page, as my ZooKeeper session
>>>>>>> expired.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Jamie
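
For reference, MAX_FILESIZE is a table attribute changed from the HBase shell (the table must be disabled first in this era). A sketch of the round trip; 'mytable' is a stand-in name, and the 268435456 value assumes the 0.20-era 256 MB default:

    hbase> disable 'mytable'
    hbase> alter 'mytable', METHOD => 'table_att', MAX_FILESIZE => '268435456'
    hbase> enable 'mytable'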

>>>>>>>
>>>>>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>>>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>>>>>> is that Hadoop is taking up the vast majority of the memory on the
>>>>>>>> boxes.
>>>>>>>>
>>>>>>>> I found this article:
>>>>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>>>>> which, Todd, it looks like you replied to. Does this sound like a
>>>>>>>> similar problem? No worries if you can't remember, it was back in
>>>>>>>> January! The article suggests reducing the amount of memory allocated
>>>>>>>> to Hadoop at startup; how would I go about doing this?
>>>>>>>>
>>>>>>>> Thank you, everyone, for your patience so far. Sorry if this is taking
>>>>>>>> up a lot of your time.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Jamie
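
On the "memory allocated to Hadoop at startup" question: the daemons' heap is set in conf/hadoop-env.sh. A minimal sketch (1000 MB is the stock default; lowering it shrinks the datanode/tasktracker footprint):

    # conf/hadoop-env.sh
    # Heap size in MB for each Hadoop daemon started on this box.
    export HADOOP_HEAPSIZE=1000

Task memory is separate: it is governed by the slot counts and mapred.child.java.opts shown earlier in the thread.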

>>>>>>>>
>>>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>>>>>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>>>>
>>>>>>>>> J-D
>>>>>>>>>

>>>>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>>>>> I think you're right.
>>>>>>>>>>
>>>>>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>>>>>> so I'm having to type everything across; apologies if it doesn't
>>>>>>>>>> translate well...
>>>>>>>>>>
>>>>>>>>>> free -m gave:
>>>>>>>>>>
>>>>>>>>>>                     total   used   free
>>>>>>>>>> Mem:                 7992   7939     53
>>>>>>>>>> -/+ buffers/cache:           7877    114
>>>>>>>>>> Swap:               23415    895  22519
>>>>>>>>>>
>>>>>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>>>>>> the numbers came out similar, but the buffers/cache free was higher.
>>>>>>>>>>
>>>>>>>>>> vmstat 20 is giving non-zero si and so values ranging between 3 and just
>>>>>>>>>> short of 5000.
>>>>>>>>>>
>>>>>>>>>> That seems to be it, I guess. The Hadoop troubleshooting page suggests
>>>>>>>>>> setting swappiness to 0; is that just a case of changing the value in
>>>>>>>>>> /proc/sys/vm/swappiness?
>>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>>
>>>>>>>>>> Jamie
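
On the swappiness question: yes, writing to /proc/sys/vm/swappiness takes effect immediately, but it resets on reboot; sysctl.conf makes it stick. A sketch (run as root):

    # immediate, lost on reboot
    echo 0 > /proc/sys/vm/swappiness

    # persistent across reboots
    echo "vm.swappiness = 0" >> /etc/sysctl.conf
    sysctl -p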

>>>>>>>>>>
>>>>>>>>>> On 7 July 2010 18:40, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>>

>>>>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>>>>>> in any of the logs that I should look at?
>>>>>>>>>>>>
>>>>>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>>>>>> in to the DN/RS to restart the RS once it's fallen over; in one
>>>>>>>>>>>> instance it took about 10 minutes. These are 8 GB, 4-core amd64 boxes.

>
>>>>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>>>>
>>>>>>>>>>> Also let "vmstat 20" run while running your job and observe the "si" and
>>>>>>>>>>> "so" columns. If those are nonzero, it indicates you're swapping, and
>>>>>>>>>>> you've oversubscribed your RAM (very easy on 8G machines)
>>>>>>>>>>>
>>>>>>>>>>> -Todd
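
To make that check concrete, this is roughly what a swapping node looks like under "vmstat 20"; the data row here is illustrative, not from this cluster:

    $ vmstat 20
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     2  3 916480  54272  12288 102400 4100 3800   520   610 1200 2400 30 10 20 40

si/so are KB per second swapped in from and out to disk; sustained nonzero values there mean the node is thrashing and RAM is oversubscribed.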

>>>>>>>>>>>> ta
>>>>>>>>>>>>
>>>>>>>>>>>> Jamie
>>>>>>>>>>>>
>>>>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>>>>>>> > Bad news: it looks like my xcievers is set as it should be. It's in
>>>>>>>>>>>> > the hdfs-site.xml, and looking at the job.xml of one of my jobs in
>>>>>>>>>>>> > the job-tracker, it's showing that property as set to 2047. I've
>>>>>>>>>>>> > cat | grepped one of the datanode logs, and although there were a few
>>>>>>>>>>>> > in there, they were from a few months ago. I've upped my MAX_FILESIZE
>>>>>>>>>>>> > on my table to 1 GB to see if that helps (not sure if it will!).
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Jamie
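
For reference, the property being discussed goes in hdfs-site.xml on every datanode (note the deliberately misspelled historical name), and the datanodes must be restarted to pick it up. A sketch with the 2047 value from this thread:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2047</value>
    </property>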

>>>>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>>>>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem
>>>>>>>>>>>> >> totally looks like it. 0.20.5 will have the same issue (since it's on
>>>>>>>>>>>> >> the HDFS side)
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> J-D

>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Environment:
>>>>>>>>>>>> >>> All (Hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now crashed, so I
>>>>>>>>>>>> >>> can't really say if it was swapping too much. There is a DEBUG
>>>>>>>>>>>> >>> statement just before it crashes saying:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
>>>>>>>>>>>> >>> hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> What follows is:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>>>>>>>>>>>> >>> on <file location as above> File does not exist. Holder
>>>>>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> It then seems to try and do some error recovery (Error Recovery for
>>>>>>>>>>>> >>> block null bad datanode[0] nodes == null), fails (Could not get block
>>>>>>>>>>>> >>> locations. Source file "<hbase file as before>" - Aborting). There is
>>>>>>>>>>>> >>> then an ERROR org.apache...HRegionServer: Close and delete failed.
>>>>>>>>>>>> >>> There is then a similar LeaseExpiredException as above.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>>>>>> >>> it's notifying the master of its shutdown and stopping itself. The
>>>>>>>>>>>> >>> shutdown hook then fires, and the RemoteException and
>>>>>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>>>>>> >>> restarted the regionserver); however, I haven't got the xceivers set
>>>>>>>>>>>> >>> anywhere. I'll give that a go. It does seem very odd, as I did have a
>>>>>>>>>>>> >>> few of them fall over one at a time with a few early loads, but that
>>>>>>>>>>>> >>> seemed to be because the regions weren't splitting properly, so all
>>>>>>>>>>>> >>> the traffic was going to one node and it was being overwhelmed. Once I
>>>>>>>>>>>> >>> throttled it, after one load a region split seemed to get
>>>>>>>>>>>> >>> triggered, which flung regions all over, which made subsequent loads
>>>>>>>>>>>> >>> much more distributed. However, perhaps the time bomb was ticking...
>>>>>>>>>>>> >>> I'll have a go at specifying the xcievers property. I'm pretty
>>>>>>>>>>>> >>> certain I've got everything else covered, except the patches as
>>>>>>>>>>>> >>> referenced in the JIRA.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5; however, because everything
>>>>>>>>>>>> >>> is installed as per karmic-cdh3, I'm a bit reluctant to do so, as
>>>>>>>>>>>> >>> presumably Cloudera has tested each of these versions against each
>>>>>>>>>>>> >>> other? And I don't really want to introduce further versioning issues.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Jamie

>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

>
>>>>>>>>>>>> >>>> Jamie,
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow off
>>>>>>>>>>>> >>>> when the cluster is under load.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> J-D
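
For completeness, the ulimit half of those requirements is usually raised in /etc/security/limits.conf for the user running the HDFS and HBase daemons; a sketch (the 'hadoop' user name and the 32768 value are assumptions — this thread used 65000):

    # /etc/security/limits.conf
    hadoop  -  nofile  32768

    # verify after logging in again:
    $ ulimit -n

The xcievers half is the hdfs-site.xml property shown earlier in the thread.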

>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:

>
>>>>>>>>>>>> >>>>> Dear all,
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the
>>>>>>>>>>>> >>>>> same physical boxes as the HDFS data-nodes. I'm getting an awful lot
>>>>>>>>>>>> >>>>> of region server crashes. The last thing that happens appears to be a
>>>>>>>>>>>> >>>>> DroppedSnapshotException, caused by an IOException: could not
>>>>>>>>>>>> >>>>> complete write to file <file on HDFS>. I am running it under load;
>>>>>>>>>>>> >>>>> how heavy that is, I'm not sure how to quantify, but I'm guessing it
>>>>>>>>>>>> >>>>> is a load issue.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is it
>>>>>>>>>>>> >>>>> common to see region server crashes when either the HDFS or region
>>>>>>>>>>>> >>>>> server (or both) is under heavy load? I'm guessing that is the case,
>>>>>>>>>>>> >>>>> as I've seen a few similar posts. I've not got a great deal of
>>>>>>>>>>>> >>>>> capacity to be separating region servers from HDFS data nodes, but it
>>>>>>>>>>>> >>>>> might be an argument I could make.
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Thanks
>>>>>>>>>>>> >>>>>
>>>>>>>>>>>> >>>>> Jamie

>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Todd Lipcon
>>>>>>>>>>> Software Engineer, Cloudera
