hawq-dev mailing list archives

From Yi Jin <y...@pivotal.io>
Subject Re: HAWQ YARN RPC Errors
Date Tue, 17 May 2016 06:54:29 GMT
Hi Gagan,

I see that your code is not up to date. Can you update to the latest code base and
try again? I am not sure whether this issue has already been fixed.

Please let Wen and me know if it still occurs with the latest code.
Please also provide us with the full log file.

Best,
Yi

On Tue, May 17, 2016 at 1:08 PM, Gagan Brahmi <gaganbrahmi@gmail.com> wrote:

> It is a 152 KB file.
>
> I have renamed the file as hawq_master_rm_error.txt. Please find it
> attached.
>
>
> Regards,
> Gagan Brahmi
>
> On Mon, May 16, 2016 at 7:59 PM, Wen Lin <wlin@pivotal.io> wrote:
> > Hi, Gagan,
> >
> > Where is the log? There is no attachment in your email.
> >
> > Thanks!
> >
> > On Sun, May 15, 2016 at 1:24 AM, Gagan Brahmi <gaganbrahmi@gmail.com> wrote:
> >
> >> Hi Wen,
> >>
> >> Please find attached the logs, which contain a few instances of the
> >> error.
> >>
> >>
> >> Regards,
> >> Gagan Brahmi
> >>
> >> On Thu, May 12, 2016 at 7:33 PM, Wen Lin <wlin@pivotal.io> wrote:
> >> > Hi, Gagan,
> >> >
> >> > It seems to be a sync failure between the QD and the Resource Manager,
> >> > not related to libyarn's RPC.
> >> > Could you attach the master's log file? Thanks!
> >> >
> >> > On Fri, May 13, 2016 at 12:58 AM, Gagan Brahmi <gaganbrahmi@gmail.com> wrote:
> >> >
> >> >> Hi Team,
> >> >>
> >> >> Is there any recommended tuning for the RPC warnings/errors that we
> >> >> encounter intermittently?
> >> >>
> >> >> The error seen is the following:
> >> >>
> >> >> WARNING:  Sync RPC framework (inet) finds exception raised.
> >> >> ERROR:  failed to return resource to resource manager, failed to
> >> >> receive content (pquery.c:991)
> >> >>
> >> >> This error, however, disappears when we retry the query. In some
> >> >> cases the query has to be retried more than once.
> >> >>
> >> >> The error appears to be raised when COMM2RM_CLIENT_FAIL_RECV is
> >> >> encountered.
> >> >>
> >> >> The setup uses YARN as the resource manager. The following is the
> >> >> yarn-client configuration used:
> >> >>
> >> >> <configuration>
> >> >>
> >> >>     <property>
> >> >>       <name>hadoop.security.authentication</name>
> >> >>       <value>kerberos</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.connect.retry</name>
> >> >>       <value>10</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.connect.tcpnodelay</name>
> >> >>       <value>true</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.connect.timeout</name>
> >> >>       <value>600000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.max.idle</name>
> >> >>       <value>10000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.ping.interval</name>
> >> >>       <value>10000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.read.timeout</name>
> >> >>       <value>3600000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.socket.linger.timeout</name>
> >> >>       <value>-1</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.timeout</name>
> >> >>       <value>3600000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>rpc.client.write.timeout</name>
> >> >>       <value>3600000</value>
> >> >>     </property>
> >> >>
> >> >>     <property>
> >> >>       <name>yarn.client.failover.max.attempts</name>
> >> >>       <value>15</value>
> >> >>     </property>
> >> >>
> >> >> </configuration>
> >> >>
> >> >> I would appreciate some recommendations.
> >> >>
> >> >>
> >> >> Regards,
> >> >> Gagan Brahmi
> >> >>
> >>
>
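
For anyone who wants to inspect or compare the yarn-client settings quoted above, the following is a minimal sketch that reads a Hadoop-style <configuration> document into a name/value dictionary. The embedded XML is an abbreviated copy of the quoted configuration, and the helper name load_properties is hypothetical, not part of HAWQ or libyarn. It makes no assumption about the units of the timeout values.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the yarn-client configuration quoted in the thread.
YARN_CLIENT_XML = """
<configuration>
    <property>
      <name>rpc.client.connect.retry</name>
      <value>10</value>
    </property>
    <property>
      <name>rpc.client.connect.timeout</name>
      <value>600000</value>
    </property>
    <property>
      <name>rpc.client.read.timeout</name>
      <value>3600000</value>
    </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict mapping each property <name> to its <value> string.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            # Missing <value> elements are stored as empty strings.
            props[name.strip()] = (value or "").strip()
    return props


props = load_properties(YARN_CLIENT_XML)
for name in sorted(props):
    print(f"{name} = {props[name]}")
```

Loading two versions of the file this way and comparing the resulting dictionaries is one simple way to spot settings that differ between clusters.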
