hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Slow MR data load to table
Date Tue, 21 Dec 2010 03:55:15 GMT
Greetings HBase Homies,

I'm running the .89 dev release (though I had this problem in .20.6 as
well).  Trying to load 10 x 8.5 CSV files from HDFS into an empty
HBase table.

Getting pretty slow loads ... 85,000 records/minute/node. I'd expect
this to be at least 5x faster based on past experience. Cluster has 5
RSs on AWS c1.xlarge instances (7 GB RAM, 8 "cores" each). Occasionally
I'm getting "Failed to report status for 601 seconds. Killing!" on map
tasks. WAL is disabled.

What's odd is, I could have sworn it used to be *much* faster last
week. I don't remember the code changing. Could it be environmental?
top isn't displaying anything interesting.

The schema is pretty simple. Each record is maybe 1k:
id_set:id, id_set:mid, id_set:aguid, id_set:sid
metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
data_set:ts, data_set:data, data_set:geo
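
For scale, here is a rough sketch of what the load rate above works out
to. It assumes about 1 KB per record and all 5 nodes loading at the same
time; the class and method names are made up for illustration, and the
numbers are estimates rather than measurements:

```java
// Back-of-the-envelope throughput from the figures quoted in this message.
// Assumptions (not measurements): ~1 KB per record, 5 nodes loading concurrently.
final class LoadRate {
    static final int NODES = 5;
    static final long OBSERVED_PER_MIN_PER_NODE = 85_000L;
    static final int APPROX_RECORD_BYTES = 1_024; // "maybe 1k" per record
    static final int EXPECTED_SPEEDUP = 5;        // "at least 5x faster"

    // Convert a per-minute rate to a per-second rate.
    static double perSecond(long perMinute) {
        return perMinute / 60.0;
    }

    // Aggregate a per-node rate across the whole cluster.
    static long clusterPerMinute(long perMinutePerNode, int nodes) {
        return perMinutePerNode * nodes;
    }

    // Approximate write bandwidth in MB/s for a given record rate.
    static double mbPerSecond(double recordsPerSecond, int recordBytes) {
        return recordsPerSecond * recordBytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        double nodeRps = perSecond(OBSERVED_PER_MIN_PER_NODE);
        long clusterRpm = clusterPerMinute(OBSERVED_PER_MIN_PER_NODE, NODES);
        System.out.printf("observed per node:    %.0f records/s, %.2f MB/s%n",
                nodeRps, mbPerSecond(nodeRps, APPROX_RECORD_BYTES));
        System.out.printf("observed cluster:     %d records/min%n", clusterRpm);
        System.out.printf("expected per node:    %d records/min%n",
                OBSERVED_PER_MIN_PER_NODE * EXPECTED_SPEEDUP);
    }
}
```

Under those assumptions each node is only writing on the order of a
megabyte or two per second.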

The code is simple (didn't write it):
(Main): http://pastebin.com/vmPgeqNj
(Mapper): http://pastebin.com/T2BQjs0k

The logs are quite boring:
HMaster: http://pastebin.com/zvyvNc3k
Regionserver: http://pastebin.com/QvJ4J7Ps

Any ideas?

Bradford Stephens,
Founder, Drawn to Scale

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
