hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Best Hbase Storage for PIG
Date Thu, 26 Apr 2012 13:04:43 GMT

Hi there. As a sanity check with respect to writing, have you
double-checked this section of the RefGuide:

http://hbase.apache.org/book.html#perf.writing

... regarding pre-created regions and monotonically increasing keys?

Also, refer to this case study as a diagnostic roadmap:

http://hbase.apache.org/book.html#casestudies.perftroub
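If the row keys are monotonically increasing (e.g. timestamps), every write lands in the last region, so a single region server does all the work no matter how many nodes are in the cluster. One common mitigation is to salt the key with a deterministic bucket prefix and pre-split the table on those prefixes. The sketch below is purely illustrative (the class name and bucket count are made up, and the bucket count would have to match your split points):

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: spread monotonically increasing keys across
// pre-split regions by prefixing each key with a deterministic bucket.
public class SaltedKey {
    // Assumption: the table was pre-created with split points "00".."15".
    static final int BUCKETS = 16;

    static byte[] salt(String key) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(key.hashCode(), BUCKETS);
        return String.format("%02d-%s", bucket, key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Consecutive timestamps now map to different regions.
        System.out.println(new String(salt("20120423-200713"), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that a scan over a contiguous key range must now fan out across all buckets.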
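The multi-put tweak described in the quoted message below (collecting puts client-side and flushing them as one batch) boils down to the pattern sketched here. This is a generic illustration, not the actual HTable code; in the HBase client of that era the same effect comes from HTable.setAutoFlush(false) plus setWriteBufferSize(), or an explicit put(List&lt;Put&gt;):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Generic sketch of client-side batching: buffer records and flush them
// in one call once a threshold is reached. The flusher stands in for a
// multi-put RPC such as HTable.put(List<Put>).
public class BatchingWriter<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<T>> flusher;

    public BatchingWriter(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    public void write(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) flush();
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        flusher.accept(new ArrayList<>(buffer)); // one round trip for the whole batch
        buffer.clear();
    }
}
```

Note the tension with the timeout in the quoted log: the bigger the batch, the longer a single multi-put RPC takes on the server, so very large batches (10,000 puts) can push one call past the 60-second RPC timeout even when overall throughput improves.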




On 4/26/12 7:38 AM, "Rajgopal Vaithiyanathan" <raja.fire@gmail.com> wrote:

>Hey all,
>
>The default HBaseStorage() takes a hell of a lot of time for puts.
>
>In a cluster of 5 machines, insertion of 175 million records took 4 hours
>45 minutes. Question: is this good enough?
>Each machine has 32 cores and 32GB RAM with 7*600GB hard disks. HBase's
>heap has been configured to 8GB.
>If the put speed is low, how can I improve it?
>
>I tried tweaking the TableOutputFormat by increasing the WriteBufferSize
>to 24MB and adding a multi-put feature (collecting 10,000 puts in an
>ArrayList and submitting them as a batch). After doing this, it started
>throwing:
>
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
>Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
>java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
>local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>
>which I assume is because the clients took too long to put.
>
>The detailed log from one of the reduce jobs is as follows.
>
>I've 'censored' some of the details, which I assume is okay! :P
>2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
>2012-04-23 20:07:13,097 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=*****.*****
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_22
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
>2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=****************************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=**********************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=***************************
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.38-8-server
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=raj
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=*********
>2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=**********************
>2012-04-23 20:07:13,790 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>2012-04-23 20:07:13,822 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /172.21.208.180:2181
>2012-04-23 20:07:13,823 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 72909@slave1.slave1
>2012-04-23 20:07:13,825 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to master/172.21.208.180:2181, initiating session
>2012-04-23 20:07:13,840 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server master/172.21.208.180:2181, sessionid = 0x136dfa124e90015, negotiated timeout = 180000
>2012-04-23 20:07:14,129 INFO com.raj.OptimisedTableOutputFormat: Created table instance for index
>2012-04-23 20:07:14,184 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
>2012-04-23 20:07:14,205 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4513e9fd
>2012-04-23 20:08:49,852 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
>    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
>    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
>    at com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
>    at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:416)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>    at org.apache.hadoop.mapred.Child.main(Child.java:249)
>Caused by: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
>    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
>    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>    at $Proxy7.multi(Unknown Source)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
>    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>    at java.lang.Thread.run(Thread.java:679)
>Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41135 remote=slave1/172.21.208.176:60020]
>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>    at java.io.FilterInputStream.read(FilterInputStream.java:133)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
>    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>    at java.io.DataInputStream.readInt(DataInputStream.java:387)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
>    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
>2012-04-23 20:09:51,018 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb., hostname=slave1, port=60020
>java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to slave1/172.21.208.176:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.208.176:41150 remote=slave1/172.21.208.176:60020]
>    [same stack trace as above, repeated for local port 41150]
>
>-- 
>Thanks and Regards,
>Raj


