hbase-user mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: Slow MR data load to table
Date Tue, 21 Dec 2010 13:58:12 GMT
Hi Bradford,

I heard about this recently, and one of the things that bit the person
in question in the butt was swapping. Could you check that all
machines are healthy and not swapping, just to rule out the
(not so) obvious stuff?
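
For anyone wanting to script that check, here is a minimal sketch in Python. It assumes Linux nodes that expose the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat`. The function names (`parse_vmstat`, `swap_delta`, `is_swapping`, `sample`) are made up for illustration and are not part of HBase or Hadoop. The idea is to take two samples a few seconds apart and see whether either counter moved.

```python
def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return (pages swapped in, pages swapped out) between two samples."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in, swapped_out


def is_swapping(before, after):
    """True if any page moved to or from swap between the two samples."""
    return any(delta > 0 for delta in swap_delta(before, after))


def sample(path="/proc/vmstat"):
    """Read the current counters (assumes a Linux host)."""
    with open(path) as handle:
        return parse_vmstat(handle.read())
```

On each region server, something like `a = sample(); time.sleep(5); b = sample(); print(swap_delta(a, b))` while the load job is running would show whether pages are moving. Nonzero deltas during the load suggest the node is swapping.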

Lars

On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens
<bradfordstephens@gmail.com> wrote:
> Aaaand, LZO is not enabled.
>
> On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens
> <bradfordstephens@gmail.com> wrote:
>> FYI, here is the hbase-site: http://pastebin.com/z9aqy3dQ
>>
>> Also, in hbase-env:
>>
>> export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>
>> Hrm, that seems suboptimal....
>>
>> On Mon, Dec 20, 2010 at 7:55 PM, Bradford Stephens
>> <bradfordstephens@gmail.com> wrote:
>>> Greetings HBase Homies,
>>>
>>> I'm running the .89 dev release (though I had this problem in .20.6 as
>>> well).  Trying to load 10 x 8.5 CSV files from HDFS into an empty
>>> HBase table.
>>>
>>> Getting pretty slow loads ... 85,000 records/minute/node. I'd expect
>>> this to be at least 5x faster based on past experience. Cluster has 5
>>> RSs, on AWS, 7 GB RAM x 8 "cores". c1.xlarge. Occasionally I'm getting
>>> "Failed to report status for 601 seconds. Killing!" on maptasks. WAL
>>> is disabled.
>>>
>>> What's odd is, I could have sworn it used to be *much* faster last
>>> week. I don't remember the code changing. Could it be environmental?
>>> top isn't displaying anything interesting.
>>>
>>> The schema is pretty simple. Each record is maybe 1k:
>>> id_set:id, id_set:mid, id_set:aguid, id_set:sid
>>> metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
>>> event:event
>>> data_set:ts, data_set:data, data_set:geo
>>>
>>> The code is simple (didn't write it):
>>> (Main): http://pastebin.com/vmPgeqNj
>>> (Mapper): http://pastebin.com/T2BQjs0k
>>>
>>> The logs are quite boring:
>>> HMaster: http://pastebin.com/zvyvNc3k
>>> Regionserver: http://pastebin.com/QvJ4J7Ps
>>>
>>>
>>> Any ideas?
>>>
>>> --
>>> Bradford Stephens,
>>> Founder, Drawn to Scale
>>> drawntoscalehq.com
>>> 727.697.7528
>>>
>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>> solution. Process, store, query, search, and serve all your data.
>>>
>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>> Media, and Computer Science
>>>
>>
>
