hawq-dev mailing list archives

From Wen Lin <w...@pivotal.io>
Subject Re: HAWQ YARN RPC Errors
Date Fri, 13 May 2016 02:33:16 GMT
Hi, Gagan,

This looks like a sync failure between the QD and the resource manager; it
is not related to libyarn's RPC.
Could you attach the master's log file? Thanks!

On Fri, May 13, 2016 at 12:58 AM, Gagan Brahmi <gaganbrahmi@gmail.com>
wrote:

> Hi Team,
>
> Do we have some recommended tuning for the RPC warning/errors
> encountered intermittently?
>
> The error which is seen is the following:
>
> WARNING:  Sync RPC framework (inet) finds exception raised.
> ERROR:  failed to return resource to resource manager, failed to
> receive content (pquery.c:991)
>
> This error, however, disappears when we retry the query. In some
> cases the query has to be retried more than once.
>
> The error appears to be raised when COMM2RM_CLIENT_FAIL_RECV is encountered.
>
> The setup uses YARN as the resource manager, and the following is the
> yarn-client configuration used:
>
> <configuration>
>
>     <property>
>       <name>hadoop.security.authentication</name>
>       <value>kerberos</value>
>     </property>
>
>     <property>
>       <name>rpc.client.connect.retry</name>
>       <value>10</value>
>     </property>
>
>     <property>
>       <name>rpc.client.connect.tcpnodelay</name>
>       <value>true</value>
>     </property>
>
>     <property>
>       <name>rpc.client.connect.timeout</name>
>       <value>600000</value>
>     </property>
>
>     <property>
>       <name>rpc.client.max.idle</name>
>       <value>10000</value>
>     </property>
>
>     <property>
>       <name>rpc.client.ping.interval</name>
>       <value>10000</value>
>     </property>
>
>     <property>
>       <name>rpc.client.read.timeout</name>
>       <value>3600000</value>
>     </property>
>
>     <property>
>       <name>rpc.client.socket.linger.timeout</name>
>       <value>-1</value>
>     </property>
>
>     <property>
>       <name>rpc.client.timeout</name>
>       <value>3600000</value>
>     </property>
>
>     <property>
>       <name>rpc.client.write.timeout</name>
>       <value>3600000</value>
>     </property>
>
>     <property>
>       <name>yarn.client.failover.max.attempts</name>
>       <value>15</value>
>     </property>
>
> </configuration>
>
> I would appreciate some recommendations.
>
>
> Regards,
> Gagan Brahmi
>
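
For anyone comparing these settings against their own cluster, here is a minimal sketch that reads the timeout-related keys from the yarn-client configuration quoted above and prints them as readable durations. It assumes the values are in milliseconds, which is how the libhdfs3 documentation describes the same-named keys; the helper names (load_properties, describe_ms, summarize_timeouts) are hypothetical and not part of HAWQ or libyarn.

```python
import xml.etree.ElementTree as ET

# Excerpt of the quoted yarn-client configuration (timeout-related keys only).
YARN_CLIENT_XML = """
<configuration>
    <property>
      <name>rpc.client.connect.timeout</name>
      <value>600000</value>
    </property>
    <property>
      <name>rpc.client.max.idle</name>
      <value>10000</value>
    </property>
    <property>
      <name>rpc.client.ping.interval</name>
      <value>10000</value>
    </property>
    <property>
      <name>rpc.client.read.timeout</name>
      <value>3600000</value>
    </property>
    <property>
      <name>rpc.client.timeout</name>
      <value>3600000</value>
    </property>
    <property>
      <name>rpc.client.write.timeout</name>
      <value>3600000</value>
    </property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def describe_ms(value):
    """Render a millisecond count (assumed unit) as seconds or minutes."""
    seconds = int(value) / 1000
    if seconds >= 60:
        return f"{seconds / 60:g} min"
    return f"{seconds:g} s"


def summarize_timeouts(xml_text):
    """Map each property name in the config to a readable duration."""
    return {name: describe_ms(value) for name, value in load_properties(xml_text).items()}


if __name__ == "__main__":
    for name, duration in sorted(summarize_timeouts(YARN_CLIENT_XML).items()):
        print(f"{name:32s} {duration}")
```

Under that millisecond assumption, the connect timeout works out to 10 minutes and the read, write, and overall RPC timeouts to 60 minutes each, so these client-side limits are already generous compared with an error that clears on an immediate retry.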
