hbase-user mailing list archives

From Jamie Cockrill <jamie.cockr...@gmail.com>
Subject Re: HBase on same boxes as HDFS Data nodes
Date Thu, 08 Jul 2010 22:24:08 GMT
Hi Venkatesh,

I've had a read of the article that JD suggested, and I think the
following paragraph (the second-to-last one) is useful for your
situation:

"So this means for HBase that as the region server stays up for long
enough (which is the default) that after a major compaction on all
tables - which can be invoked manually or is triggered by a
configuration setting - it has the files local on the same host. The
data node that shares the same physical host has a copy of all data
the region server requires. If you are running a scan or get or any
other use-case you can be sure to get the best performance."

If I understand this correctly, when you do a major compaction on a
table, HBase rewrites each region's store files on the region server
that hosts the region, so any data blocks that are no longer
co-located with their region get written back locally. You can trigger
a major compaction manually in the hbase shell with:

major_compact '<your tablename here>'

(without the angle brackets, but with the quotes). I think major
compactions also run on a periodic basis by default, but I couldn't
say for sure.
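For reference, the "configuration setting" the article mentions is, as far as I know, hbase.hregion.majorcompaction in hbase-site.xml; it controls the interval between automatic major compactions, in milliseconds, and the 0.20 default is reportedly 24 hours. A sketch (verify the property name against your hbase-default.xml before relying on it):

```xml
<!-- hbase-site.xml: interval between automatic major compactions,
     in milliseconds. 86400000 ms = 24 hours (the reported default). -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>86400000</value>
</property>
```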

Thanks,

Jamie

PS, JD, thanks for the tip about locality. I'll get the hang of HBase
at some point!

On 8 July 2010 18:51,  <vramanathan00@aol.com> wrote:
>
> Thank you.
> I have some more questions. I've spent quite a bit of time over the last few weeks developing one of our applications on HBase/Hadoop, using 0.20.4.
>
> HBase - Table X
> rows 1 - 100     -> Region A -> RegionServer A -> DataNode A
> ...
> rows 1500 - 1600 -> Region M -> RegionServer B -> DataNode B
>
> So based on what I have read so far, I'm thinking of putting RegionServer A and DataNode A pairs on the same host to make use of locality.
>
> As per your answer, if we restart the cluster, locality is gone because of random assignment; so RegionServer B may end up serving Region A while its data blocks are still on DataNode A, if I understand correctly. Will the data move over time, though? For example, if I have lots of access to data on DataNode A, without the current work that is in progress?
>
> thanks again for your reply
>
> venkatesh
>
> -----Original Message-----
> From: Jean-Daniel Cryans <jdcryans@apache.org>
> To: user@hbase.apache.org
> Sent: Thu, Jul 8, 2010 1:35 pm
> Subject: Re: HBase on same boxes as HDFS Data nodes
>
> The former: "Now imagine you stop HBase after saving a lot of data and restarting it subsequently. The region servers are restarted and assign a seemingly random number of regions"
>
> It's not really because we enjoy it that way, but because the work required just isn't done. If this is of interest to you, Jonathan and Karthik at Facebook started rewriting our load balancer. See
> https://issues.apache.org/jira/browse/HBASE-2699 and
> https://issues.apache.org/jira/browse/HBASE-2480
>
> J-D
>
> On Thu, Jul 8, 2010 at 10:30 AM, <vramanathan00@aol.com> wrote:
>
>> Hi
>> Fairly new to hbase and the list serve. Following up on this thread and the article: could someone elaborate on why locality is lost upon restart? Is it because of random assignment by the HMaster, and/or because the HRegionServer is stateless, or for other reasons?
>>
>> thanks
>> venkatesh
>>
>> -----Original Message-----
>> From: Jean-Daniel Cryans <jdcryans@apache.org>
>> To: user@hbase.apache.org
>> Sent: Thu, Jul 8, 2010 1:11 pm
>> Subject: Re: HBase on same boxes as HDFS Data nodes
>>
>> More info in this blog post:
>>
>> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>>
>> J-D
>>
>> On Thu, Jul 8, 2010 at 10:11 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>
>>> This would be done at the expense of network IO, since you will lose locality for jobs that read/write to HBase. Also, I guess the datanodes are also there, so HBase will lose locality with HDFS.
>>>
>>> J-D
>>>
>>> On Thu, Jul 8, 2010 at 10:07 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>
>>>> Thanks all for your help with this, everything seems much more stable for the meantime. I have a backlog loading job to run over a great deal of data, so I might separate out my region servers from my task trackers for the time being.
>>>>
>>>> Thanks again,
>>>>
>>>> Jamie
>>>>
>>>> On 8 July 2010 17:46, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>
>>>>> OS cache is good, glad you figured out your memory problem.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>
>>>>>> Morning all. Day 2 begins...
>>>>>>
>>>>>> I discussed this with someone else earlier and they pointed out that we also have task trackers running on all of those nodes, which will affect the amount of memory being used when jobs are running. Each tasktracker had a maximum of 8 maps and 8 reduces configured per node, with a JVM Xmx of 512mb each. Clearly this implies a fully utilised node will use 8*512mb + 8*512mb = 8GB of memory on tasks alone. That's before the datanode does anything, or HBase for that matter.
>>>>>>
>>>>>> As such, I've dropped it to 4 maps and 4 reduces per node and reduced the Xmx to 256mb, giving a potential maximum task overhead of 2GB per node. Running 'vmstat 20' now, under load from mapreduce jobs, suggests that the actual free memory is about the same, but the memory cache is much much bigger, which is presumably healthier as, in theory, it ought to relinquish memory to processes that request it.
>>>>>>
>>>>>> Let's see if that does the trick!
>>>>>>
>>>>>> ta
>>>>>>
>>>>>> Jamie
>>>>>>
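The worst-case arithmetic in the message above can be sketched as a quick calculation (the function name is illustrative, not part of any Hadoop API):

```python
# Rough worst-case memory budget for a Hadoop worker node, following
# the arithmetic in the message above. All figures are in megabytes.

def task_memory_mb(map_slots, reduce_slots, child_xmx_mb):
    """Upper bound on heap consumed by concurrently running tasks."""
    return (map_slots + reduce_slots) * child_xmx_mb

# Original configuration: 8 maps + 8 reduces at -Xmx512m each.
before = task_memory_mb(8, 8, 512)   # 8192 MB, i.e. all 8 GB of RAM

# Reduced configuration: 4 maps + 4 reduces at -Xmx256m each.
after = task_memory_mb(4, 4, 256)    # 2048 MB

print(before, after)
```

Note that this bound covers task heaps only; the datanode, tasktracker, and region server JVMs all need headroom on top of it.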
>>>>>> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>
>>>>>>> YouAreDead means that the region server's session was expired; GC seems like your major problem. (File problems can happen after a GC sleep because the files were moved around while the process was sleeping; you also get the same kind of messages with the xcievers issue... sorry for the confusion.)
>>>>>>>
>>>>>>> By over-committing the memory I meant trying to fit too much stuff in the amount of RAM that you have. I guess it's the map and reduce tasks that eat all the free space? Why not lower their number?
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>
>>>>>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB I raised it to). It caused me to run into a delightful 'YouAreDeadException', which looks very related to the garbage collection issues on the Troubleshooting page, as my Zookeeper session expired.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box (it's at the default 50 at the moment)? What I'm noticing at the moment is that hadoop is taking up the vast majority of the memory on the boxes.
>>>>>>>>>
>>>>>>>>> I found this article:
>>>>>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>>>>>> which, Todd, it looks like you replied to. Does this sound like a similar problem? No worries if you can't remember, it was back in January! This article suggests reducing the amount of memory allocated to Hadoop at startup; how would I go about doing this?
>>>>>>>>>
>>>>>>>>> Thank you everyone for your patience so far. Sorry if this is taking up a lot of your time.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Jamie
>>>>>>>>>
>>>>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>>>>>
>>>>>>>>>> J-D
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think you're right.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately the machines are on a separate network to this laptop, so I'm having to type everything across; apologies if it doesn't translate well...
>>>>>>>>>>>
>>>>>>>>>>> free -m gave:
>>>>>>>>>>>
>>>>>>>>>>>                     total     used     free
>>>>>>>>>>> Mem:                 7992     7939       53
>>>>>>>>>>> -/+ buffers/cache:            7877      114
>>>>>>>>>>> Swap:               23415      895    22519
>>>>>>>>>>>
>>>>>>>>>>> I did this on another node that isn't being smashed at the moment and the numbers came out similar, but the buffers/cache free was higher.
>>>>>>>>>>>
>>>>>>>>>>> 'vmstat 20' is giving non-zero si and so values ranging between 3 and just short of 5000.
>>>>>>>>>>>
>>>>>>>>>>> That seems to be it, I guess. The Hadoop troubleshooting guide suggests setting swappiness to 0; is that just a case of changing the value in /proc/sys/vm/swappiness?
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>>
>>>>>>>>>>> Jamie
>>>>>>>>>>>
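On the swappiness question: yes, that file is the live knob, though the change does not survive a reboot. A sketch of the two standard Linux mechanisms (run as root; paths are the stock sysctl locations):

```shell
# Takes effect immediately, but is lost on reboot:
echo 0 | sudo tee /proc/sys/vm/swappiness

# To persist across reboots, add the setting to /etc/sysctl.conf
# and reload it:
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```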
>>>>>>>>>>> On 7 July 2010 18:40, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could look at those if that's the next logical step? Would there be anything in any of the logs that I should look at?
>>>>>>>>>>>>>
>>>>>>>>>>>>> One thing I have noticed is that it does take an absolute age to log in to the DN/RS to restart the RS once it's fallen over; in one instance it took about 10 minutes. These are 8GB, 4-core amd64 boxes.
>>>>>>>>>>>>
>>>>>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>>>>>
>>>>>>>>>>>> Also let "vmstat 20" run while running your job and observe the "si" and "so" columns. If those are nonzero, it indicates you're swapping, and you've oversubscribed your RAM (very easy on 8G machines).
>>>>>>>>>>>>
>>>>>>>>>>>> -Todd
>>>>>>>>>>>>
>>>>>>>>>>>>> ta
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jamie
>>>>>>>>>>>>>
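The check Todd describes can be scripted; a minimal sketch that parses `vmstat` output and reports the data rows with nonzero si/so columns (column positions are taken from the header line; the sample text below is made up for illustration):

```python
# Parse `vmstat <interval>` output and flag swap-in/swap-out activity.
# The si/so columns report KB/s swapped in from and out to disk;
# sustained nonzero values mean the node is actively swapping.

def swapping_samples(vmstat_output):
    """Return (si, so) pairs from data rows where either is nonzero."""
    lines = vmstat_output.strip().splitlines()
    header = lines[1].split()            # second line names the columns
    si_col, so_col = header.index("si"), header.index("so")
    hits = []
    for row in lines[2:]:                # remaining lines are samples
        fields = row.split()
        si, so = int(fields[si_col]), int(fields[so_col])
        if si or so:
            hits.append((si, so))
    return hits

sample = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 916480  54272  12000 102400    0    0     5    10  200  300 10  5 80  5
 0  2 916480  51200  11800  99500 3100 4800    50   120  900 1500 20 15 40 25"""

print(swapping_samples(sample))  # -> [(3100, 4800)]
```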
>>>>>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>>>> > Bad news: it looks like my xcievers setting is set as it should be. It's in the hdfs-site.xml, and looking at the job.xml of one of my jobs in the job-tracker, it shows that property set to 2047. I've cat | grepped one of the datanode logs, and although there were a few in there, they were from a few months ago. I've upped MAX_FILESIZE on my table to 1GB to see if that helps (not sure if it will!).
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Jamie
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem totally looks like it. 0.20.5 will have the same issue (since it's on the HDFS side).
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> J-D
>>>>>>>>>>>>> >>
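For reference, the setting under discussion lives in hdfs-site.xml on each datanode (the property name really is spelled "xcievers"), and changing it needs a datanode restart to take effect. A sketch using the 2047 value mentioned above:

```xml
<!-- hdfs-site.xml: upper bound on the number of concurrent
     DataXceiver threads a datanode will serve. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```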
>>>>>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Environment:
>>>>>>>>>>>>> >>> All (Hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now crashed, so I can't really say if it was swapping too much. There is a DEBUG statement just before it crashes saying:
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> What follows is:
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on <file location as above> File does not exist. Holder DFSClient_-11113603 does not have any open files
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> It then seems to try to do some error recovery (Error Recovery for block null bad datanode[0] nodes == null) and fails (Could not get block locations. Source file "<hbase file as before>" - Aborting). There is then an ERROR org.apache...HRegionServer: Close and delete failed, followed by a similar LeaseExpiredException as above.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that it's notifying the master of its shutdown and stopping itself. The shutdown hook then fires, and the RemoteException and LeaseExpiredExceptions are printed again.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I restarted the regionserver); however, I haven't got the xceivers set anywhere. I'll give that a go. It does seem very odd, as I did have a few of them fall over one at a time with a few early loads, but that seemed to be because the regions weren't splitting properly, so all the traffic was going to one node and it was being overwhelmed. Once I throttled it, after one load a region split seemed to get triggered, which flung regions all over and made subsequent loads much more distributed. However, perhaps the time-bomb was ticking... I'll have a go at specifying the xcievers property. I'm pretty certain I've got everything else covered, except the patches referenced in the JIRA.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit exception with 'xciever' in it.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5; however, because everything is installed as per karmic-cdh3, I'm a bit reluctant to do so, as presumably Cloudera has tested each of these versions against each other? And I don't really want to introduce further versioning issues.
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> Jamie
>>>>>>>>>>>>> >>>
>>>>>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>>>>> >>>> Jamie,
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow up when the cluster is under load.
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> J-D
>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockrill@gmail.com> wrote:
>>>>>>>>>>>>> >>>>> Dear all,
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the same physical boxes as the HDFS data-nodes. I'm getting an awful lot of region server crashes. The last thing that happens appears to be a DroppedSnapshotException, caused by an IOException: could not complete write to file <file on HDFS>. I am running it under load; how heavy that load is, I'm not sure how to quantify, but I'm guessing it is a load issue.
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is it common to see region server crashes when either the HDFS or region server (or both) is under heavy load? I'm guessing that is the case, as I've seen a few similar posts. I've not got a great deal of capacity to be separating region servers from HDFS data nodes, but it might be an argument I could make.
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Thanks
>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>> >>>>> Jamie
