hbase-user mailing list archives

From "Srikanth P. Shreenivas" <Srikanth_Shreeni...@mindtree.com>
Subject RE: Query regarding HTable.get and timeouts
Date Mon, 22 Aug 2011 11:58:20 GMT
Hi,

Going with the assumption that our client threads might be getting interrupted, and that it might not
be an HBase issue, we rebuilt our client application without GridGain.  Earlier our code was
being executed by GridGain's thread pool, but now we have made the app run in plain Tomcat.

I am very glad to say that we are no longer seeing any "java.nio.channels.ClosedByInterruptException".
My client app is working just great, performing its HBase reads/writes as expected.
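
For illustration, here is a minimal sketch of the mechanism suspected in this thread. It is not code from the application discussed here, and the class and method names are hypothetical. It relies only on the documented behaviour of java.nio interruptible channels: if a thread's interrupt status is already set when it starts a blocking operation on such a channel, the channel is closed and the thread receives a ClosedByInterruptException. The loopback listener exists only so that the connect has a valid target.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class InterruptedConnectDemo {

    /**
     * Attempts a blocking connect while the calling thread's interrupt
     * status is set, and reports what happened.
     */
    public static String connectWithInterruptFlagSet() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Listen on an ephemeral loopback port (assumption: loopback is usable).
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketAddress target = server.getLocalAddress();

            // Simulate what an external thread pool might do to a worker thread.
            Thread.currentThread().interrupt();

            try (SocketChannel client = SocketChannel.open()) {
                client.connect(target);
                return "connected";
            } catch (ClosedByInterruptException e) {
                return "ClosedByInterruptException";
            } finally {
                // Clear the flag so the demo does not affect the caller.
                Thread.interrupted();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(connectWithInterruptFlagSet());
    }
}
```

The connect fails even though nothing is wrong with the network or the listener, which is consistent with the failures disappearing once the code stopped running on GridGain's threads.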


Thanks a lot for all the help.

Regards,
Srikanth
PS: I must say that the HBase community is really great.  I really appreciate all the inputs and
suggestions.


-----Original Message-----
From: Srikanth P. Shreenivas
Sent: Monday, August 22, 2011 3:43 PM
To: user@hbase.apache.org
Subject: RE: Query regarding HTable.get and timeouts

Yes, DC1AuthDFSC1D3 hosts the root region.  It is also region server 3.  DC1AuthDFSC1D1,
DC1AuthDFSC1D2, DC1AuthDFSC1D3 and DC1AuthDFSC1D4 are the 4 region servers in our cluster.

******************************************

I checked with the Data Centre team; they confirmed that there is no firewall in the network where
the HBase servers and client applications are running.

******************************************

Regarding the client and server running different versions: they are running the same version.  If
there were a version mismatch, I guess we would be seeing the issue for all the reads.  Here
we see the issue for only a few reads; about one in 10-15 reads fails this way.  We use the same hbase,
zookeeper and hadoop jars as found in the HBase distribution.

Strangely enough, I saw the below for the first time today, and it has occurred only once
so far.  10.3.48.61 is the IP address where our client app is running.
2011-08-22 11:46:55,905 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7625 got version 6 expected version 3
2011-08-22 11:46:57,542 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7626 got version 6 expected version 3
2011-08-22 11:46:58,483 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7627 got version 6 expected version 3
2011-08-22 11:46:59,335 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7628 got version 6 expected version 3
2011-08-22 11:47:00,164 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7629 got version 6 expected version 3
2011-08-22 11:47:00,972 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7630 got version 6 expected version 3
2011-08-22 11:47:01,768 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7631 got version 6 expected version 3
2011-08-22 11:47:02,648 WARN org.apache.hadoop.ipc.HBaseServer: Incorrect header or version
mismatch from 10.3.48.61:7632 got version 6 expected version 3

******************************************

I enabled debug-level logging for all classes today.  Here is the exception associated with
the "null" messages.

*** Do you think that some thread in the client is calling interrupt(), resulting in the
"java.nio.channels.ClosedByInterruptException" below? ***


2011-08-22 11:51:29,663 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - locateRegionInMeta parentTable=-ROOT-, metaLocation=address: DC1AuthDFSC1D3.cidr.gov.in:6020,
regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after sleep of 1000 because:
null
2011-08-22 11:51:29,663 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hadoop.ipc.HBaseClient]
 - Connecting to DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020
2011-08-22 11:51:30,665 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hadoop.ipc.HBaseClient]
 - closing ipc connection to DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020: null
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:511)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy41.getClosestRowBefore(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:719)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:589)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:558)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:687)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:593)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:564)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:415)
        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1002)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
        at in.gov.uidai.platform.impl.persistence.handler.HBaseHandler.findEntities(HBaseHandler.java:271)
        at in.gov.uidai.platform.impl.persistence.handler.HBaseHandler.findObject(HBaseHandler.java:156)
        at in.gov.uidai.platform.impl.persistence.provider.AbstractPersistenceProvider.findObject(AbstractPersistenceProvider.java:116)
        at in.gov.uidai.platform.impl.persistence.PersistenceManagerProvider.findObject(PersistenceManagerProvider.java:270)
        at in.gov.uidai.authcommon.dao.impl.hbase.ResidentDetailsDAOImpl.findResidentDetailEntity(ResidentDetailsDAOImpl.java:69)
        at in.gov.uidai.authcommon.dao.impl.hbase.ResidentDetailsDAOImpl.findResidentDetails(ResidentDetailsDAOImpl.java:48)
        at in.gov.uidai.authcommon.core.impl.steps.ResidentDetailsReader.findResident(ResidentDetailsReader.java:176)
        at in.gov.uidai.authcommon.core.impl.steps.ResidentDetailsReader.doPerform(ResidentDetailsReader.java:63)
        at in.gov.uidai.authcommon.core.ProcessingStep.perform(ProcessingStep.java:36)
        at in.gov.uidai.authcommon.core.impl.Authenticator.performAndReturnContext(Authenticator.java:40)
        at in.gov.uidai.authserver.grid.AuthenticationGridJob.execute(AuthenticationGridJob.java:27)
        at org.gridgain.grid.kernal.processors.job.GridJobWorker.body(GridJobWorker.java:406)
        at org.gridgain.grid.util.runnable.GridRunnable$1.run(GridRunnable.java:142)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at org.gridgain.grid.util.runnable.GridRunnable.run(GridRunnable.java:194)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hadoop.ipc.HBaseClient]
 - IPC Client (47) connection to DC1AuthDFSC1D3.cidr.gov.in/10.3.48.69:6020 from an unknown
user: closed
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - locateRegionInMeta parentTable=-ROOT-, metaLocation=address: DC1AuthDFSC1D3.cidr.gov.in:6020,
regioninfo: -ROOT-,,0.70236052, attempt=1 of 10 failed; retrying after sleep of 1000 because:
null
2011-08-22 11:51:30,666 [gridgain-#6%authGrid%:grid-job-worker] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@211c7f8d;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
...
...
...
The above pattern then keeps repeating.
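
One possible explanation for the repetition, offered as a hedged sketch and not as a description of HBase's actual retry code: after a ClosedByInterruptException the thread's interrupt status remains set (this is documented for java.nio interruptible channels), so every later blocking connect on a fresh channel fails the same way until something clears the flag. The class below is hypothetical and uses a plain loopback listener in place of a region server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class RetryUnderInterruptDemo {

    /**
     * Interrupts the current thread once, then tries to connect
     * maxAttempts times, each time on a new channel.
     * Returns how many attempts failed with ClosedByInterruptException.
     */
    public static int failedAttempts(int maxAttempts, boolean clearFlagAfterFailure) {
        int failures = 0;
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SocketAddress target = server.getLocalAddress();

            // A single interrupt, e.g. from a pool cancelling a job (assumption).
            Thread.currentThread().interrupt();

            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try (SocketChannel client = SocketChannel.open()) {
                    client.connect(target);
                } catch (ClosedByInterruptException e) {
                    failures++;
                    if (clearFlagAfterFailure) {
                        // Illustration only: swallowing interrupts hides the real cause.
                        Thread.interrupted();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Leave the calling thread in a clean state after the demo.
            Thread.interrupted();
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("flag left set: " + failedAttempts(10, false) + " failed attempts");
        System.out.println("flag cleared:  " + failedAttempts(10, true) + " failed attempts");
    }
}
```

With the flag left set, every attempt in this sketch fails; with the flag cleared after the first failure, later attempts can connect. Clearing the flag is shown only to make the contrast visible. Finding out which component interrupts the thread is the more reliable fix.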

******************************************



Regards,
Srikanth


-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, August 22, 2011 2:32 AM
To: user@hbase.apache.org
Subject: Re: Query regarding HTable.get and timeouts

Yeah that null message isn't really helpful :)

So one thing that might be helpful would be to know who DC1AuthDFSC1D3
is, since you identified the logs as "Region server n".

Then look at the master's web UI and see where -ROOT- is assigned. Is
it also DC1AuthDFSC1D3?

If so, then I would proceed by checking if there's a firewall in
between the client and the cluster, also I would make sure that the
client is running the same version as the server.

J-D

On Sat, Aug 20, 2011 at 5:56 AM, Srikanth P. Shreenivas
<Srikanth_Shreenivas@mindtree.com> wrote:
> Further in this investigation, we enabled the debug logs on client side.
>
> We are observing that the client is trying to locate the root region, and is continuously failing to
do so.  The logs are filled with entries like this:
>
> 2011-08-20 17:20:09,092 [gridgain-#6%authGrid%] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@2cc25ae3;
hsa=DC1AuthDFSC1D3.cidr.gov.in:6020
> 2011-08-20 17:20:09,092 [gridgain-#6%authGrid%] DEBUG [hbase.client.HConnectionManager$HConnectionImplementation]
 - locateRegionInMeta parentTable=-ROOT-, metaLocation=address: DC1AuthDFSC1D3.cidr.gov.in:6020,
regioninfo: -ROOT-,,0.70236052, attempt=0 of 10 failed; retrying after sleep of 1000
> because: null
>
> Client keeps retrying and retries get exhausted.
>
>
> Complete logs are available here: https://gist.github.com/1159064  including logs of
master, zookeeper and region servers.
>
>
> If you can please look at the logs and provide some inputs on this issue, then it will
be really helpful.
> We are really not sure why client is failing to get root regions from the server.  Any
guidance will be greatly appreciated.
>
>
> Thanks a lot,
> Srikanth

