hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase Read and Write Issues in Multithreaded Environments
Date Sat, 09 Jul 2011 16:11:27 GMT
You read the requirements section in our docs and you have upped the
ulimits, nprocs, etc?  http://hbase.apache.org/book/os.html

If you know the row, can you deduce the regionserver it's talking to?
(Below is the client failure -- we need to figure out what's up on the
server-side.)  Once you've done that, can you check its logs?  See if
you can figure out anything on why the hang?

Thanks,
St.Ack
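To deduce the regionserver for a known row, the 0.90 client can be asked directly via `HTable#getRegionLocation(row)` (that needs a live cluster, so it appears only in a comment below). The runnable part of this sketch just simulates what that lookup does: regions cover sorted, non-overlapping key ranges, and the client picks the region whose start key is the greatest key less than or equal to the row. The hostnames and split key here are made up for illustration.

```java
import java.util.TreeMap;

public class RegionLookupSketch {
    // Against a live 0.90 cluster the real call is roughly:
    //   HTable table = new HTable(conf, "employeedata");
    //   HRegionLocation loc = table.getRegionLocation(Bytes.toBytes(row));
    //   System.out.println(loc.getServerAddress()); // host:port whose logs to check
    // Below, a self-contained simulation of the floor lookup the client
    // performs against .META. (region start key -> serving regionserver).
    static final TreeMap<String, String> REGIONS = new TreeMap<String, String>();
    static {
        REGIONS.put("", "rs1.example.com:60020");   // keys [-inf, "8")
        REGIONS.put("8", "rs2.example.com:60020");  // keys ["8", +inf)
    }

    static String serverForRow(String row) {
        // Greatest region start key <= row identifies the hosting region.
        return REGIONS.floorEntry(row).getValue();
    }

    public static void main(String[] args) {
        System.out.println(serverForRow("d51b74eb05e07f96cee0ec556f5d8d161e3281f3"));
    }
}
```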

On Sat, Jul 9, 2011 at 6:14 AM, Srikanth P. Shreenivas
<Srikanth_Shreenivas@mindtree.com> wrote:
> Hi St.Ack,
>
> We upgraded to CDH3 (hadoop-0.20-0.20.2+923.21-1.noarch.rpm, hadoop-hbase-0.90.1+15.18-1.noarch.rpm, hadoop-zookeeper-3.3.3+12.1-1.noarch.rpm).
>
> I ran the same test I was running for the app when it was on CDH2.  The test app posts a request to the web app every 100ms, and the web app reads an HBase record, performs some logic, and saves an audit trail by writing another HBase record.
>
> When our app was running on CDH2, I observed the below issue for every 10 to 15 requests.
> With CDH3, this issue is not happening at all.  So it seems like the situation has improved a lot, and our app seems to be a lot more stable.
>
> However, I am still seeing an issue.  There are many requests (around 1%) which are not able to read the record from HBase, and the get call hangs for almost 10 minutes.  This is what I see in the application log:
>
> 2011-07-09 18:27:25,537 [gridgain-#6%authGrid%] ERROR [my.app.HBaseHandler] - Exception occurred in searchData:
> java.io.IOException: Giving up trying to get region server: thread is interrupted.
>        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1016)
>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
>
>        <...app specific trace removed...>
>
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
>
> I am running the test on the same record, so all my "get"s are for the same row id.
>
>
>
> It will be of immense help if you can provide some inputs on whether we are missing some configuration settings, or whether there is a way to get around this.
>
> Thanks,
> Srikanth
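The "thread is interrupted" failure in the trace above is the 0.90 client noticing, between retry attempts, that its calling thread's interrupt flag is set and giving up; the interrupt itself typically comes from outside HBase, e.g. a task timeout or a `Future.cancel(true)` in the GridGain worker pool. A simplified, self-contained sketch of that give-up pattern (not the actual HBase source):

```java
import java.io.IOException;

public class InterruptedRetrySketch {
    // Simplified model of a retrying client call that, like the 0.90
    // HConnectionManager#getRegionServerWithRetries, refuses to keep
    // retrying once the calling thread has been interrupted.
    static String callWithRetries(int maxRetries) throws IOException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IOException(
                    "Giving up trying to get region server: thread is interrupted.");
            }
            // One RPC attempt would go here; this sketch pretends it failed
            // so the loop continues to the next attempt.
        }
        throw new IOException("retries exhausted after " + maxRetries + " attempts");
    }

    public static void main(String[] args) {
        // Simulate an external cancel/timeout interrupting the worker thread.
        Thread.currentThread().interrupt();
        try {
            callWithRetries(10);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            Thread.interrupted(); // clear the flag before exiting
        }
    }
}
```

The practical consequence: when a get "hangs" and then dies with this message, look for whoever is interrupting the thread (grid timeouts, pool shutdowns) as well as for what made the call slow in the first place.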
>
>
>
>
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Wednesday, June 29, 2011 7:48 PM
> To: user@hbase.apache.org
> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>
> Go to CDH3 if you can.  CDH2 is also old.
> St.Ack
>
> On Wed, Jun 29, 2011 at 7:15 AM, Srikanth P. Shreenivas
> <Srikanth_Shreenivas@mindtree.com> wrote:
>> Thanks St. Ack for the inputs.
>>
>> Will upgrading to CDH3 help, or is there a version within CDH2 that you recommend we should upgrade to?
>>
>> Regards,
>> Srikanth
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
>> Sent: Wednesday, June 29, 2011 11:16 AM
>> To: user@hbase.apache.org
>> Subject: Re: HBase Read and Write Issues in Multithreaded Environments
>>
>> Can you upgrade?  That release is > 18 months old.  A bunch has
>> happened in the meantime.
>>
>> For retries exhausted, check what's going on on the remote regionserver
>> that you are trying to write to.  It's probably struggling, and that's
>> why requests are not going through -- or the client missed the fact
>> that the region moved (all stuff that should be working better in the
>> latest hbase).
>>
>> St.Ack
>>
>> On Tue, Jun 28, 2011 at 9:51 PM, Srikanth P. Shreenivas
>> <Srikanth_Shreenivas@mindtree.com> wrote:
>>> Hi,
>>>
>>> We are using an HBase 0.20.3 (hbase-0.20-0.20.3-1.cloudera.noarch.rpm) cluster in distributed mode with Hadoop 0.20.2 (hadoop-0.20-0.20.2+320-1.noarch).
>>> We are using pretty much the default configuration; the only thing we have customized is that we have allocated 4GB RAM in /etc/hbase-0.20/conf/hbase-env.sh.
>>>
>>> In our setup, we have a web application that reads a record from HBase and writes a record as part of each web request.  The application is hosted in Apache Tomcat 7 and is a stateless web application providing a REST-like web service API.
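One thing worth ruling out in a multithreaded setup like this: `HTable` in the 0.20/0.90 clients is not safe for concurrent use, so request threads in Tomcat must not share a single instance (one HTable per thread, or `HTablePool`, is the usual pattern). The sketch below shows the per-thread pattern with `ThreadLocal`; a plain `Object` stands in for HTable since a real one needs a running cluster:

```java
public class PerThreadHandleSketch {
    // Each thread gets its own handle; in a real app initialValue() would
    // return `new HTable(conf, tableName)` instead of a plain Object.
    static final ThreadLocal<Object> HANDLE = new ThreadLocal<Object>() {
        @Override protected Object initialValue() {
            return new Object(); // stand-in for a per-thread HTable
        }
    };

    // Returns true when two threads observe two distinct handles.
    static boolean distinctPerThread() throws InterruptedException {
        final Object[] seen = new Object[2];
        Thread a = new Thread(new Runnable() {
            public void run() { seen[0] = HANDLE.get(); }
        });
        Thread b = new Thread(new Runnable() {
            public void run() { seen[1] = HANDLE.get(); }
        });
        a.start(); b.start();
        a.join();  b.join();
        return seen[0] != null && seen[1] != null && seen[0] != seen[1];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("each thread got its own handle: " + distinctPerThread());
    }
}
```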
>>>
>>> We are observing that our reads and writes time out once in a while.  This happens more for writes.
>>> We see below exception in our application logs:
>>>
>>>
>>> Exception Type 1 - During Get:
>>> ---------------------------------------
>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.36:60020 for region employeedata,be8784ac8b57c45625a03d52be981b88097c2fdc,1308657957879, row 'd51b74eb05e07f96cee0ec556f5d8d161e3281f3', but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.36:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>> java.nio.channels.ClosedByInterruptException
>>>
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>>>     <snip>
>>>
>>> Exception Type 2 - During Put:
>>> ---------------------------------------------
>>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server 10.1.68.34:60020 for region audittable,,1309183872019, row '2a012017120f80a801b28f5f66a83dc2a8882d1b', but failed after 10 attempts.
>>> Exceptions:
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>> java.io.IOException: Call to /10.1.68.34:60020 failed on local exception: java.nio.channels.ClosedByInterruptException
>>>
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1239)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1161)
>>>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1247)
>>>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:609)
>>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:474)
>>>     <snip>
>>>
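Every retry in both traces dies with `ClosedByInterruptException`, which points at the calling thread having been interrupted rather than at the regionservers: the JDK's interruptible NIO channels are closed as soon as the thread using them is interrupted, so every subsequent attempt on that connection fails the same way. A self-contained demonstration of the JDK behavior, no HBase required (a temp-file `FileChannel` stands in for the client's RPC socket, which behaves the same way):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;

public class ClosedByInterruptDemo {
    // Performs one interruptible NIO write with the thread's interrupt flag
    // already set, and reports whether the exception fired and whether the
    // channel survived.
    static String demo() throws Exception {
        File f = File.createTempFile("interrupt-demo", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        FileChannel ch = out.getChannel();

        Thread.currentThread().interrupt(); // what Future.cancel(true) would do
        boolean caught = false;
        try {
            ch.write(ByteBuffer.wrap(new byte[16])); // interruptible operation
        } catch (ClosedByInterruptException expected) {
            caught = true;
        } finally {
            Thread.interrupted(); // clear the flag
        }
        // The channel is now permanently closed: retrying on it cannot succeed,
        // which is why all 10 attempts in the traces fail identically.
        String result = "caught=" + caught + " open=" + ch.isOpen();
        out.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // caught=true open=false
    }
}
```

So the question to chase is not only why the regionserver was slow, but which component (container, grid framework, timeout wrapper) interrupted the request thread mid-call.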
>>> Any inputs on why this is happening, or how to rectify it, will be of immense help.
>>>
>>> Thanks,
>>> Srikanth
>>>
>>>
>>>
>>> Srikanth P Shreenivas | Principal Consultant | MindTree Ltd. | Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA | Voice +91 80 26264000 / Fax +91 80 2626 4100 | Mob: 9880141059 | email: srikanth_shreenivas@mindtree.com | www.mindtree.com |
>>>
>>>
>>> ________________________________
>>>
>>> http://www.mindtree.com/email/disclaimer.html
>>>
>>
>
