hbase-user mailing list archives

From Mark Vigeant <mark.vigeant@riskmetrics.com>
Subject RE: Table Upload Optimization
Date Wed, 21 Oct 2009 15:22:26 GMT
OK, so first, in response to St. Ack: nothing fishy appears to be happening in the logs; data
is being written to all regionservers.

And it's not hovering around 100% done; it has just sent about 118 map tasks, or "task attempts".

I'm using Hadoop 0.20.1 and HBase 0.20.0

Each node is a virtual machine with 2 CPUs, 4 GB of host memory, and 100 GB of storage.

I don't know what you meant by slots per TT...

And the heapsize is the default of 1000 MB. That is probably a huge problem, now that I think
about it, heh.
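
For reference, bumping it would just mean editing conf/hbase-env.sh on every
node, something like this (2000 MB is only a guess at a saner value for these
4 GB VMs, not a recommendation from anyone on the list):

  # conf/hbase-env.sh
  export HBASE_HEAPSIZE=2000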

And there is absolutely no special configuration that I'm using. I have HBase running my ZooKeeper
quorum on 2 machines, but that's about it.
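
Concretely, that's just the two stock ZooKeeper knobs; the hostnames below are
made up:

  # conf/hbase-env.sh: let HBase manage its own ZooKeeper
  export HBASE_MANAGES_ZK=true

  <!-- conf/hbase-site.xml -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2</value>
  </property>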

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, October 21, 2009 11:04 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Table Upload Optimization

Well, the XMLStreamingInputFormat lets you map XML files, which is neat,
but it has a problem and always needs to be patched. I wondered if that
was what was missing, but in your case it's not the problem.

Did you check the logs of the master and the region servers? Also, I'd like to know:

- Version of Hadoop and HBase
- Nodes' hardware
- How many map slots per TT (see the note below this list)
- HBASE_HEAPSIZE from conf/hbase-env.sh
- Special configuration you use
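
By "slots" I mean mapred.tasktracker.map.tasks.maximum in each TaskTracker's
mapred-site.xml; the stock 0.20 default is 2 per TT. Purely illustrative:

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>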

Thx,

J-D

On Wed, Oct 21, 2009 at 7:57 AM, Mark Vigeant
<mark.vigeant@riskmetrics.com> wrote:
> No. Should I?
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: Wednesday, October 21, 2009 10:55 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Table Upload Optimization
>
> Are you using the Hadoop Streaming API?
>
> J-D
>
> On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant
> <mark.vigeant@riskmetrics.com> wrote:
>> Hey
>>
>> So I want to upload a lot of XML data into an HTable. I have a class that
>> successfully maps up to about 500 MB of data or so (on one regionserver) into
>> a table, but if I go for much bigger than that it takes forever and eventually
>> just stops. I tried uploading a big XML file (about 7 GB) into my
>> 4-regionserver cluster, and it's been a day and it's still going at it.
>>
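>> A minimal sketch of the kind of put loop this boils down to, assuming the
>> stock 0.20 client API (the table, family, and values here are invented for
>> illustration, not my real schema):
>>
>>   import java.io.IOException;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.client.HTable;
>>   import org.apache.hadoop.hbase.client.Put;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class BatchedUpload {
>>     public static void main(String[] args) throws IOException {
>>       HTable table = new HTable(new HBaseConfiguration(), "xmltable");
>>       table.setAutoFlush(false);                  // buffer puts client-side
>>       table.setWriteBufferSize(12 * 1024 * 1024); // ~12 MB, illustrative
>>       for (int i = 0; i < 100000; i++) {
>>         Put put = new Put(Bytes.toBytes("row-" + i));
>>         put.add(Bytes.toBytes("content"), Bytes.toBytes("xml"),
>>                 Bytes.toBytes("<record/>"));
>>         table.put(put);                           // queued until the buffer fills
>>       }
>>       table.flushCommits();                       // push whatever is still buffered
>>     }
>>   }
>>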
>> What I get when I run the job on the 4 node cluster is:
>> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
>> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
>> (then it does that for a while until...)
>> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0 is done. And is in the process of committing
>> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
>> 10/21/09 10:22:52 mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0' is done.
>> 10/21/09 10:22:52 INFO mapred.JobClient:   map 100% reduce 0%
>> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
>> 10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%
>>
>>
>> I'm convinced I'm not configuring HBase or Hadoop correctly. Any suggestions?
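>>
>> (One thing that stands out rereading the output: every line says
>> mapred.LocalJobRunner, which means the job is running inside a single local
>> JVM instead of being submitted to the cluster's JobTracker; that happens
>> whenever mapred.job.tracker resolves to its default value of "local". A quick
>> way to check from a driver, sketched here with placeholder names:)
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.mapreduce.Job;
>>   import org.apache.hadoop.mapreduce.Mapper;
>>
>>   public class SubmitCheck {
>>     public static void main(String[] args) throws Exception {
>>       // Reads core-site.xml/mapred-site.xml from the classpath; if they are
>>       // missing, mapred.job.tracker falls back to "local" (LocalJobRunner).
>>       Configuration conf = new Configuration();
>>       System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
>>       Job job = new Job(conf, "xml upload");
>>       job.setJarByClass(SubmitCheck.class);
>>       job.setMapperClass(Mapper.class); // identity mapper as a placeholder
>>       job.setNumReduceTasks(0);         // map-only load
>>     }
>>   }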
>>
>> Mark Vigeant
>> RiskMetrics Group, Inc.
>>
>
