hbase-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: M/R vs hbase problem in production
Date Tue, 16 Aug 2011 16:09:59 GMT
On Tue, Aug 16, 2011 at 5:50 AM, Michael Segel <michael_segel@hotmail.com> wrote:

>
> It could be that it's the results from the reducer.
>

Yes. The end result of the M/R job is persisted to HBase.


> My guess is that he's got an issue where he's overextending his system.
> Sounds like a tuning issue.
>
> How much memory on the system?
>

We have a 10-machine grid:
  the master has 48 GB of RAM;
  the slaves have 16 GB of RAM each.


> What's being used by HBase?
>
The RegionServer process has 4 GB of RAM.
The ZooKeeper process has 2 GB of RAM.


> How many reducers, How many mappers?
>

 We run 4 mappers and 2 reducers per machine.


> How large is the cache on DN, and how much cache does each job have
> allocated?
>
>
I am not sure I understand what the DN cache is. Where can I see this
parameter?
If you mean the DataNode Java process, it has 1 GB of RAM.

> That's the first place to look.
>
>
Thanks in advance,
Oleg.

>
> > From: buttler1@llnl.gov
> > To: user@hbase.apache.org
> > Date: Mon, 15 Aug 2011 13:20:30 -0700
> > Subject: RE: M/R vs hbase problem in production
> >
> > Are you sure you need to use a reducer to put rows into hbase?  You can
> > save a lot of time if you can put the rows into hbase directly in the
> > mappers.
> >
> > Dave
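
A minimal sketch of the map-only approach Dave suggests, using
TableMapReduceUtil; the table name ("my_table"), column family ("cf"), and
the tab-separated input parsing are illustrative assumptions rather than
details from this thread:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class DirectPutJob {

  // Emits one Put per input line; with zero reducers these go straight
  // from the mappers to the region servers.
  static class PutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length < 2) {
        return;                                    // skip malformed lines
      }
      Put put = new Put(Bytes.toBytes(parts[0]));  // first field = row key
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "direct-put");
    job.setJarByClass(DirectPutJob.class);
    job.setMapperClass(PutMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Wire the HBase table as the job output; a null reducer plus zero
    // reduce tasks makes this a map-only insert.
    TableMapReduceUtil.initTableReducerJob("my_table", null, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With zero reduce tasks there is no shuffle; each mapper's Puts go through the
HBase client write buffer directly to the region servers.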
> >
> > -----Original Message-----
> > From: Lior Schachter [mailto:liors@infolinks.com]
> > Sent: Sunday, August 14, 2011 9:32 AM
> > To: user@hbase.apache.org; mapreduce-user@hadoop.apache.org
> > Subject: M/R vs hbase problem in production
> >
> > Hi,
> >
> > cluster details:
> > hbase 0.90.2. 10 machines. 1Gb switch.
> >
> > use-case:
> > An M/R job that inserts about 10 million rows into hbase in the reducer,
> > followed by an M/R job that works with hdfs files.
> > When the maps of the first job finish, the maps of the second job start
> > and the region server crashes.
> > Please note that when running the 2 jobs separately, they both finish
> > successfully.
> >
> > From our monitoring we see that when the 2 jobs run together, the network
> > load reaches our max bandwidth (1Gb).
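
For contrast with the map-only sketch above, a reducer-based insert like the
one described here might look roughly like the following; the table name,
column family, and count aggregation are assumptions for illustration, not
details from this thread:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Reducer that persists each aggregated record to HBase as a Put.
public class InsertReducer
    extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();                              // hypothetical aggregation
    }
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(put.getRow()), put);
  }
}

A driver would typically wire this up with
TableMapReduceUtil.initTableReducerJob("my_table", InsertReducer.class, job).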
> >
> > In the region server log we see these exceptions:
> > a.
> > 2011-08-14 18:37:36,263 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4) from 10.11.87.73:33737: output error
> > 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
> >         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
> >
> > b.
> > 2011-08-14 18:41:56,225 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-8181634225601608891_579246java.io.EOFException
> >         at java.io.DataInputStream.readFully(DataInputStream.java:180)
> >         at java.io.DataInputStream.readLong(DataInputStream.java:399)
> >         at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2548)
> >
> > c.
> > 2011-08-14 18:42:02,960 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #0 from primary datanode 10.11.87.72:50010
> > org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: blk_-8181634225601608891_579246 is already commited, storedBlock == null.
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:4877)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:501)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> >
> >         at org.apache.hadoop.ipc.Client.call(Client.java:740)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >         at $Proxy4.nextGenerationStamp(Unknown Source)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1577)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1551)
> >         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1617)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> >
> >         at org.apache.hadoop.ipc.Client.call(Client.java:740)
> >         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >         at $Proxy9.recoverBlock(Unknown Source)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2706)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2173)
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2372)
> >
> > A few questions:
> > 1. Can we configure hadoop/hbase not to consume all network resources
> > (e.g., specify an upper limit for map/reduce network load)?
> > 2. Should we increase the timeout for open connections?
> > 3. Can we assign different IPs for data transfer and the region quorum
> > check protocol (zookeeper)?
> >
> > Thanks,
> > Lior
>
>
