hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From agateaaa <agate...@gmail.com>
Subject Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Date Thu, 29 Aug 2013 17:57:57 GMT
Hi All:

Put some debugging code in TUGIContainingTransport.getTransport() and I
tracked it down to

@Override
public TUGIContainingTransport getTransport(TTransport trans) {

// UGI information is not available at connection setup time, it will be
set later
// via set_ugi() rpc.
transMap.putIfAbsent(trans, new TUGIContainingTransport(trans));

//return transMap.get(trans); <-change
          TUGIContainingTransport retTrans = transMap.get(trans);

          if ( retTrans == null ) {



}





On Wed, Jul 31, 2013 at 9:48 AM, agateaaa <agateaaa@gmail.com> wrote:

> Thanks Nitin
>
> There arent too many connections in close_wait state only 1 or two when we
> run into this. Most likely its because of dropped connection.
>
> I could not find any read or write timeouts we can set for the thrift
> server which will tell thrift to hold on to the client connection.
>  See this https://issues.apache.org/jira/browse/HIVE-2006 but doesnt seem
> to have been implemented yet. We do have set a client connection timeout
> but cannot find
> an equivalent setting for the server.
>
> We have  a suspicion that this happens when we run two client processes
> which modify two distinct partitions of the same hive table. We put in a
> workaround so that the two hive client processes never run together and so
> far things look ok but we will keep monitoring.
>
> Could it be because hive metastore server is not thread safe, would
> running two alter table statements on two distinct partitions of the same
> table using two client connections cause problems like these, where hive
> metastore server closes or drops a wrong client connection and leaves the
> other hanging?
>
> Agateaaa
>
>
>
>
> On Tue, Jul 30, 2013 at 12:49 AM, Nitin Pawar <nitinpawar432@gmail.com>wrote:
>
>> The mentioned flow is called when you have unsecure mode of thrift
>> metastore client-server connection. So one way to avoid this is have a
>> secure way.
>>
>> <code>
>> public boolean process(final TProtocol in, final TProtocol out)
>> throwsTException {
>> setIpAddress(in);
>> ...
>> ...
>> ...
>> @Override
>>      protected void setIpAddress(final TProtocol in) {
>>     TUGIContainingTransport ugiTrans =
>> (TUGIContainingTransport)in.getTransport();
>>                     Socket socket = ugiTrans.getSocket();
>>     if (socket != null) {
>>       setIpAddress(socket);
>>
>> </code>
>>
>>
>> From the above code snippet, it looks like the null pointer exception is
>> not handled if the getSocket returns null.
>>
>> can you check whats the ulimit setting on the server? If its set to
>> default
>> can you set it to unlimited and restart hcat server. (This is just a wild
>> guess).
>>
>> also the getSocket method suggests "If the underlying TTransport is an
>> instance of TSocket, it returns the Socket object which it contains.
>> Otherwise it returns null."
>>
>> so someone from thirft gurus need to tell us whats happening. I have no
>> knowledge of this depth
>>
>> may be Ashutosh or Thejas will be able to help on this.
>>
>>
>>
>>
>> From the netstat close_wait, it looks like the hive metastore server has
>> not closed the connection (do not know why yet), may be the hive dev guys
>> can help.Are there too many connections in close_wait state?
>>
>>
>>
>> On Tue, Jul 30, 2013 at 5:52 AM, agateaaa <agateaaa@gmail.com> wrote:
>>
>> > Looking at the hive metastore server logs see errors like these:
>> >
>> > 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
>> > (TThreadPoolServer.java:run(182)) - Error occurred during processing of
>> > message.
>> > java.lang.NullPointerException
>> >         at
>> >
>> >
>> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
>> >         at
>> >
>> >
>> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
>> >         at
>> >
>> >
>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
>> > at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >  at
>> >
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> > at java.lang.Thread.run(Thread.java:662)
>> >
>> > approx same time as we see timeout or connection reset errors.
>> >
>> > Dont know if this is the cause or the side affect of he connection
>> > timeout/connection reset errors. Does anybody have any pointers or
>> > suggestions ?
>> >
>> > Thanks
>> >
>> >
>> > On Mon, Jul 29, 2013 at 11:29 AM, agateaaa <agateaaa@gmail.com> wrote:
>> >
>> > > Thanks Nitin!
>> > >
>> > > We have simiar setup (identical hcatalog and hive server versions) on
>> a
>> > > another production environment and dont see any errors (its been
>> running
>> > ok
>> > > for a few months)
>> > >
>> > > Unfortunately we wont be able to move to hcat 0.5 and hive 0.11 or
>> hive
>> > > 0.10 soon.
>> > >
>> > > I did see that the last time we ran into this problem doing a
>> netstat-ntp
>> > > | grep ":10000" see that server was holding on to one socket
>> connection
>> > in
>> > > CLOSE_WAIT state for a long time
>> > >  (hive metastore server is running on port 10000). Dont know if thats
>> > > relevant here or not
>> > >
>> > > Can you suggest any hive configuration settings we can tweak or
>> > networking
>> > > tools/tips, we can use to narrow this down ?
>> > >
>> > > Thanks
>> > > Agateaaa
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar <
>> nitinpawar432@gmail.com
>> > >wrote:
>> > >
>> > >> Is there any chance you can do a update on test environment with
>> > hcat-0.5
>> > >> and hive-0(11 or 10) and see if you can reproduce the issue?
>> > >>
>> > >> We used to see this error when there was load on hcat server or some
>> > >> network issue connecting to the server(second one was rare
>> occurrence)
>> > >>
>> > >>
>> > >> On Mon, Jul 29, 2013 at 11:13 PM, agateaaa <agateaaa@gmail.com>
>> wrote:
>> > >>
>> > >>> Hi All:
>> > >>>
>> > >>> We are running into frequent problem using HCatalog 0.4.1 (HIve
>> > Metastore
>> > >>> Server 0.9) where we get connection reset or connection timeout
>> errors.
>> > >>>
>> > >>> The hive metastore server has been allocated enough (12G) memory.
>> > >>>
>> > >>> This is a critical problem for us and would appreciate if anyone
has
>> > any
>> > >>> pointers.
>> > >>>
>> > >>> We did add a retry logic in our client, which seems to help, but
I
>> am
>> > >>> just
>> > >>> wondering how can we narrow down to the root cause
>> > >>> of this problem. Could this be a hiccup in networking which causes
>> the
>> > >>> hive
>> > >>> server to get into a unresponsive state  ?
>> > >>>
>> > >>> Thanks
>> > >>>
>> > >>> Agateaaa
>> > >>>
>> > >>>
>> > >>> Example Connection reset error:
>> > >>> =======================
>> > >>>
>> > >>> org.apache.thrift.transport.TTransportException:
>> > >>> java.net.SocketException:
>> > >>> Connection reset
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>> > >>>  at
>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>> > >>>  at
>> > org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:157)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
>> > >>> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
>> > >>>  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
>> > >>>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>> > >>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>> > >>>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
>> > >>> at
>> > >>>
>> > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>> > >>>  at
>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
>> > >>> at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
>> > >>>  at
>> > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
>> > >>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
>> > >>>  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
>> > >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >>>  at java.lang.reflect.Method.invoke(Method.java:597)
>> > >>> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >>> Caused by: java.net.SocketException: Connection reset
>> > >>> at java.net.SocketInputStream.read(SocketInputStream.java:168)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>> > >>>  ... 30 more
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> > >>> Example Connection timeout error:
>> > >>> ==========================
>> > >>>
>> > >>> org.apache.thrift.transport.TTransportException:
>> > >>> java.net.SocketTimeoutException: Read timed out
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>> > >>>  at
>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>> > >>>  at
>> > org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:157)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
>> > >>> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
>> > >>>  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
>> > >>> at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
>> > >>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
>> > >>>  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
>> > >>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
>> > >>>  at
>> > >>>
>> > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>> > >>> at
>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
>> > >>>  at
>> > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
>> > >>> at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
>> > >>>  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
>> > >>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
>> > >>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > >>>  at
>> > >>>
>> > >>>
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > >>> at java.lang.reflect.Method.invoke(Method.java:597)
>> > >>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >>> Caused by: java.net.SocketTimeoutException: Read timed out
>> > >>> at java.net.SocketInputStream.socketRead0(Native Method)
>> > >>>  at java.net.SocketInputStream.read(SocketInputStream.java:129)
>> > >>> at
>> > >>>
>> > >>>
>> >
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>> > >>>  ... 31 more
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Nitin Pawar
>> > >>
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Mime
View raw message