hbase-user mailing list archives

From Mark Vigeant <mark.vige...@riskmetrics.com>
Subject RE: Table Upload Optimization
Date Wed, 21 Oct 2009 18:29:25 GMT
No, they are all running on separate hosts. What I described is the specs for each node.

I have 4 VMs total, 2 running per 4 core machine. The machines are doing nothing else.

How do I check for swapping?

-----Original Message-----
From: Jonathan Gray [mailto:jlist@streamy.com] 
Sent: Wednesday, October 21, 2009 1:30 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Table Upload Optimization

You are running all of these virtual machines on a single host node? 
And they are all sharing 4GB of memory?

That is a major issue.  First, GC pauses will start to lock things up 
and create timeouts.  Then swapping will totally kill performance of 
everything.  Is that happening on your cluster?
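One quick way to answer that from each node (an illustrative check, not something from the thread; assumes Linux):

```shell
# A large gap between SwapTotal and SwapFree means swap is in use right now
grep -E 'Swap(Total|Free)' /proc/meminfo

# If the pswpin/pswpout counters climb between two runs, the node is actively swapping
grep -E '^pswp(in|out)' /proc/vmstat
```

Watching the si/so columns of `vmstat 5` shows the same thing live.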

Virtualized clusters have some odd performance characteristics, and if 
you are starving each virtual node as it is, then you will never see 
solid behavior.  Virtualized IO can also be problematic, or at least 
slow (especially during upload scenarios).

JG

Mark Vigeant wrote:
>> I saw this in your first posting: 10/21/09 10:22:52 INFO mapred.JobClient:
>> map 100% reduce 0%.
> 
>> Is your job writing hbase in the map task or in reducer?  Are you using
>> TableOutputFormat?
> 
> I am using TableOutputFormat and only a mapper. There is no reducer. Would a reducer make things more efficient?
> 
> 
>>> I'm using Hadoop 0.20.1 and HBase 0.20.0
>>>
>>> Each node is a virtual machine with 2 CPU, 4 GB host memory and 100 GB
>>> storage.
>>>
>>>
>> You are running DN, TT, HBase, and ZK on above?  One disk shared by all?
> 
> I'm only running ZooKeeper on 2 of the above nodes, and then a TT, DN, and regionserver on all.
> 
>> Children running at any one time on a TaskTracker.  You should start with
>> one only since you have such an anemic platform.
> 
> Ah, and I can set that in the hadoop config?
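The knob J-D is presumably referring to is, in Hadoop 0.20, a mapred-site.xml property; a sketch with the one-map-slot starting value he suggests:

```
<!-- mapred-site.xml: cap concurrent map tasks per TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```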
> 
> 
>> You've upped filedescriptors and xceivers, all the stuff in 'Getting
>> Started'?
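For reference, the 'Getting Started' settings mentioned here are typically raised along these lines (illustrative values; usernames and numbers vary by setup):

```
# /etc/security/limits.conf -- raise open-file descriptors for the user running Hadoop/HBase
hadoop  -  nofile  32768

# hdfs-site.xml -- raise the DataNode xceiver limit
# (note: the property name really is spelled "xcievers")
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```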
> 
> And no, it appears as though I accidentally overlooked that beginning stuff. Yikes. OK.
> 
> I will take care of those and get back to you.
> 
> 
>> -----Original Message-----
>> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
>> Jean-Daniel Cryans
>> Sent: Wednesday, October 21, 2009 11:04 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Table Upload Optimization
>>
>> Well the XMLStreamingInputFormat lets you map XML files which is neat
>> but it has a problem and always needs to be patched. I wondered if
>> that was missing but in your case it's not the problem.
>>
>> Did you check the logs of the master and region servers? Also I'd like to
>> know
>>
>> - Version of Hadoop and HBase
>> - Nodes's hardware
>> - How many map slots per TT
>> - HBASE_HEAPSIZE from conf/hbase-env.sh
>> - Special configuration you use
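On the HBASE_HEAPSIZE question, that value sits in conf/hbase-env.sh (a sketch; 1000 MB is the stock default for this era):

```shell
# conf/hbase-env.sh -- maximum heap for each HBase daemon, in MB
export HBASE_HEAPSIZE=1000
```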
>>
>> Thx,
>>
>> J-D
>>
>> On Wed, Oct 21, 2009 at 7:57 AM, Mark Vigeant
>> <mark.vigeant@riskmetrics.com> wrote:
>>> No. Should I?
>>>
>>> -----Original Message-----
>>> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
>>> Jean-Daniel Cryans
>>> Sent: Wednesday, October 21, 2009 10:55 AM
>>> To: hbase-user@hadoop.apache.org
>>> Subject: Re: Table Upload Optimization
>>>
>>> Are you using the Hadoop Streaming API?
>>>
>>> J-D
>>>
>>> On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant
>>> <mark.vigeant@riskmetrics.com> wrote:
>>>> Hey
>>>>
>>>> So I want to upload a lot of XML data into an HTable. I have a class
>>>> that successfully maps up to about 500 MB of data or so (on one
>>>> regionserver) into a table, but if I go for much bigger than that it takes
>>>> forever and eventually just stops. I tried uploading a big XML file into my
>>>> 4-regionserver cluster (about 7 GB) and it's been a day and it's still going
>>>> at it.
>>>> What I get when I run the job on the 4 node cluster is:
>>>> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
>>>> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
>>>> (then it does that for a while until...)
>>>> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task
>>>> attempt_local_0001_m_000117_0 is done. And is in the process of committing
>>>> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
>>>> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task
>>>> 'attempt_local_0001_m_000117_0' is done.
>>>> 10/21/09 10:22:52 INFO mapred.JobClient:   map 100% reduce 0%
>>>> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
>>>> 10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%
>>>>
>>>>
>>>> I'm convinced I'm not configuring HBase or Hadoop correctly. Any
>>>> suggestions?
>>>> Mark Vigeant
>>>> RiskMetrics Group, Inc.
>>>>
> 
