hbase-user mailing list archives

From Mark Snow <marksnow...@yahoo.com>
Subject Re: failure after importing 42million rows
Date Fri, 25 Jul 2008 23:44:13 GMT
Hi Stack, thanks for your reply.

I'm running on small instances. It's a custom single thread data loader, no MR.

You're right: the hadoop dfs -fs hdfs://domU-12-31-39-00-E9-23:50001/ -lsr /hbase
command worked and showed all the hbase files, so that looks better. I doubled the lease times
for master and region servers, but I still hit the timeout reliably. The exact error is:

java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:514)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:210)
        at $Proxy1.batchUpdate(Unknown Source)
        at org.apache.hadoop.hbase.HTable$8.call(HTable.java:766)
        at org.apache.hadoop.hbase.HTable$8.call(HTable.java:764)
        at org.apache.hadoop.hbase.HTable.getRegionServerWithRetries(HTable.java:1037)
        at org.apache.hadoop.hbase.HTable.commit(HTable.java:763)
        at org.apache.hadoop.hbase.HTable.commit(HTable.java:744)
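For reference, the lease doubling described above would be an override in hbase-site.xml along these lines. The values below are illustrative, not the actual 0.1.x defaults; check the hbase-default.xml shipped with your release for the real defaults and units before doubling them:

```xml
<!-- hbase-site.xml: lease periods (assumed milliseconds).
     Illustrative values only; confirm your release's defaults
     in hbase-default.xml before doubling. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.master.lease.period</name>
  <value>240000</value>
</property>
```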

----- Original Message ----
From: stack <stack@duboce.net>
To: hbase-user@hadoop.apache.org
Sent: Friday, July 25, 2008 11:59:56 AM
Subject: Re: failure after importing 42million rows

Mark Snow wrote:
> I'm running an hbase data import on 0.1.3. After 42 million rows, the import fails with
an RPC timeout exception. I've tried twice: once on a 2-node cluster and once on a 10-node
cluster (EC2 with the same configuration), and it failed both times in the same spot, somewhere
between 42 and 43 million rows.
Small, medium, or X-large instances?

> Where should I look to debug this?
> From the hbase shell, I can query the table and see the rows have been inserted,
but when I do a 'hadoop dfs -ls' I don't see the /hbase dir I specified, so I'm suspicious
it's not storing the data into dfs, and unsure where it is storing this data.

Does the $HADOOP_HOME that you are running 'hadoop dfs -ls' under have
hdfs://domU-12-31-39-00-E9-23:50001/ as the fs.default.name in its conf file?
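For anyone following along, that check amounts to confirming the client's default filesystem is the same namenode HBase writes to. In hadoop-site.xml that would look roughly like this (the URL is the one from the commands in this thread):

```xml
<!-- hadoop-site.xml: the filesystem a bare 'hadoop dfs -ls' talks to.
     If this is unset or points at a different namenode, -ls won't show
     /hbase even though the data is safely in HDFS. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://domU-12-31-39-00-E9-23:50001/</value>
</property>
```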

Perhaps 'hadoop  dfs -fs hdfs://domU-12-31-39-00-E9-23:50001/ -lsr 
/hbase' works?

Otherwise, nothing untoward in what you sent in email.  What's the RPC
error you're seeing?  Try upping your lease periods: try
doubling hbase.regionserver.lease.period and hbase.master.lease.period.
Are you loading via MR or via a custom script?  If the former, are
TaskTrackers running on all nodes alongside the Regionservers and Datanodes?

