hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: M/R vs hbase problem in production
Date Tue, 16 Aug 2011 02:50:05 GMT

It could be that it's the output from the reducer.

My guess is that he's got an issue where he's overextending his system.
Sounds like a tuning issue.

How much memory on the system? 
What's being used by HBase?
How many reducers, and how many mappers?
How large is the cache on DN, and how much cache does each job have allocated?

That's the first place to look.
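For reference, the settings behind these questions live in hbase-env.sh and mapred-site.xml on 0.20/0.90-era clusters. The values below are illustrative placeholders, not recommendations; they only show where the heap and task-slot knobs are.

```
# hbase-env.sh -- RegionServer/Master heap size, in MB (illustrative value)
export HBASE_HEAPSIZE=4000

<!-- mapred-site.xml -- concurrent task slots per TaskTracker (illustrative) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

Lowering the slot counts caps how many tasks per node compete for memory and network at once, which is the usual first lever when two jobs overlap.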


> From: buttler1@llnl.gov
> To: user@hbase.apache.org
> Date: Mon, 15 Aug 2011 13:20:30 -0700
> Subject: RE: M/R vs hbase problem in production
> 
> Are you sure you need to use a reducer to put rows into HBase? You can save a lot of
> time if you can put the rows into HBase directly in the mappers.
> 
> Dave
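Dave's suggestion can be sketched as a map-only job: set the reducer count to zero and let each mapper write Puts through TableOutputFormat. This is a minimal sketch against the 0.90-era API; the table name "mytable", column family "cf", and tab-separated input layout are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MapOnlyLoad {
  static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Hypothetical input layout: row key, tab, value.
      String[] f = line.toString().split("\t", 2);
      Put put = new Put(Bytes.toBytes(f[0]));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("v"), Bytes.toBytes(f[1]));
      // Emitting the Put lets TableOutputFormat write it to the table.
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable");
    Job job = new Job(conf, "map-only hbase load");
    job.setJarByClass(MapOnlyLoad.class);
    job.setMapperClass(LoadMapper.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setNumReduceTasks(0); // no reducers: mappers write directly to HBase
    FileInputFormat.addInputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Skipping the shuffle and reduce phase removes one full pass of the data over the network, which matters here since the cluster is already saturating its link.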
> 
> -----Original Message-----
> From: Lior Schachter [mailto:liors@infolinks.com] 
> Sent: Sunday, August 14, 2011 9:32 AM
> To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
> Subject: M/R vs hbase problem in production
> 
> Hi,
> 
> Cluster details:
> HBase 0.90.2, 10 machines, 1 Gbit switch.
> 
> Use case:
> An M/R job that inserts about 10 million rows into HBase in the reducer, followed
> by an M/R job that works with HDFS files.
> When the first job's map tasks finish, the second job's map tasks start and a
> region server crashes.
> Please note that when the two jobs are run separately, they both finish
> successfully.
> 
> From our monitoring we see that when the two jobs run together, the network
> load reaches our maximum bandwidth of 1 Gbit/s.
> 
> In the region server log we see these exceptions:
> a.
> 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
> from 10.11.87.73:33737: output error
> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>         at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> 
> b.
> 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> blk_-8181634225601608891_579246java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>         at
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
> 
> c.
> 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed
> recovery attempt #0 from primary datanode 10.11.87.72:50010
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> blk_-8181634225601608891_579246 is already commited, storedBlock == null.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy4.nextGenerationStamp(Unknown Source)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
>         at
> org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 
>         at org.apache.hadoop.ipc.Client.call(Client.java:740)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy9.recoverBlock(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
> 
> A few questions:
> 1. Can we configure Hadoop/HBase not to consume all network resources (e.g.,
> specify an upper limit for map/reduce network load)?
> 2. Should we increase the timeout for open connections?
> 3. Can we assign different IPs for data transfer and for the region quorum check
> protocol (ZooKeeper)?
> 
> Thanks,
> Lior