falcon-dev mailing list archives

From Sandeep Samudrala <sandys...@gmail.com>
Subject Re: All running processes are in UNKNOWN status
Date Mon, 22 Feb 2016 11:32:59 GMT
How much memory is allocated to the Falcon server? It could be that the
Falcon server is running out of memory. Can you check whether the Falcon
server is doing Full GCs?
If so, can you try removing
org.apache.falcon.metadata.MetadataMappingService
from startup.properties, then restart the Falcon server and try again?
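
The two checks above could be scripted roughly as follows. This is only a
sketch under assumptions not stated in this thread: that GC logging is
enabled on the Falcon JVM and writes HotSpot-style lines containing
"Full GC", and that the services are listed under a key such as
*.application.services in startup.properties (comma-separated, with
backslash line continuations). The function names are made up for
illustration.

```python
import re

FULL_GC = re.compile(r"Full GC")


def count_full_gcs(lines):
    """Count log lines that report a Full GC (assumes HotSpot-style GC logs)."""
    return sum(1 for line in lines if FULL_GC.search(line))


def remove_service(props_text, service, key="*.application.services"):
    """Return props_text with `service` dropped from the list stored under `key`.

    The key name is an assumption; check it against your own startup.properties.
    """
    out, buf = [], []
    for line in props_text.splitlines():
        if buf or line.lstrip().startswith(key + "="):
            buf.append(line)
            if line.rstrip().endswith("\\"):
                continue  # the logical line continues on the next physical line
            # Logical line is complete: join it, filter it, and write it back.
            joined = "".join(part.rstrip().rstrip("\\") for part in buf)
            name, _, value = joined.partition("=")
            services = [s.strip() for s in value.split(",") if s.strip()]
            kept = [s for s in services if s != service]
            out.append(name + "=" + ",\\\n    ".join(kept))
            buf = []
        else:
            out.append(line)
    out.extend(buf)  # unterminated continuation at end of file: keep as-is
    return "\n".join(out) + ("\n" if props_text.endswith("\n") else "")
```

Keep a backup of startup.properties before writing the result back, and
restart the Falcon server afterwards so that the change takes effect.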

On Mon, Feb 22, 2016 at 1:09 PM, Margus Roo <margus@roo.ee> wrote:

> I found these lines in the log:
>
> 2016-02-22 08:54:48,273 INFO  - [126455586@qtp-525968792-61 -
> 763a5818-27e2-4ada-8d45-50b06afffa8e:margusja:GET//entities/list/feed,process]
> ~ {Action:list, Dimensions:{}, Status: SUCCEEDED, Time-taken:579935193 ns}
> (METRIC:38)
> 2016-02-22 08:54:48,274 DEBUG - [126455586@qtp-525968792-61 -
> 763a5818-27e2-4ada-8d45-50b06afffa8e:] ~ Audit: margusja/10.65.104.39
> performed request
> http://hadoopnn2.estpak.ee:15000/api/entities/list/feed,process?fields=clusters,tags,status&offset=0&numResults=10
> (88.196.164.43) at time 2016-02-22T06:54Z (FalconAuditFilter:86)
> 2016-02-22 08:55:10,388 INFO  - [ActiveMQ ShutdownHook:] ~ ActiveMQ
> Message Broker (localhost, ID:hadoopnn2.estpak.ee-48159-1455867360485-0:1)
> is shutting down (BrokerService:560)
> 2016-02-22 08:55:10,389 INFO  - [ActiveMQ ShutdownHook:] ~ Connector
> vm://localhost Stopped (TransportConnector:288)
> 2016-02-22 08:55:10,652 INFO  - [ActiveMQ Connection Executor: tcp://
> hadoopnn2.estpak.ee/88.196.164.43:61616:] ~ Error in onException for
> topicSubscriber of topic: FALCON.ENTITY.TOPIC (JMSMessageConsumer:144)
> javax.jms.JMSException: java.io.EOFException
>         at
> org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
>         at
> org.apache.activemq.ActiveMQConnection.onAsyncException(ActiveMQConnection.java:1833)
>         at
> org.apache.activemq.ActiveMQConnection.onException(ActiveMQConnection.java:1850)
>         at
> org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>         at
> org.apache.activemq.transport.ResponseCorrelator.onException(ResponseCorrelator.java:126)
>         at
> org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>         at
> org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
>         at
> org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:160)
>         at
> org.apache.activemq.transport.InactivityMonitor.onException(InactivityMonitor.java:266)
>         at
> org.apache.activemq.transport.TransportSupport.onException(TransportSupport.java:96)
>         at
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:206)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>         at
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>         at
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>         ... 1 more
>
>
> And before that there are loads of Kerberos-related problems:
> 2016-02-22 08:54:48,272 WARN  - [126455586@qtp-525968792-61 -
> 763a5818-27e2-4ada-8d45-50b06afffa8e:margusja:GET//entities/list/feed,process]
> ~ Exception while invoking class
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
> over hadoopnn2.estpak.ee/88.196.164.43:8020. Not retrying because
> failovers (15) exceeded maximum allowed (15) (RetryInvocationHandler:121)
> java.io.IOException: Failed on local exception: java.io.IOException:
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]; Host Details : local host is: "
> hadoopnn2.estpak.ee/88.196.164.43"; destination host is: "
> hadoopnn2.estpak.ee":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>
> But those Kerberos problems go away after a Falcon restart.
>
> Anyway, as I understand it, this is not the right list. Can you give me
> the subscription e-mail address for the user@ list?
>
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 48 780
>
> On 22/02/16 09:31, Pallavi Rao wrote:
>
>> It might not have to do with a particular process. It might go into
>> UNKNOWN status when Falcon is unable to communicate with Oozie, for
>> example. What will help in this case is the falcon.application.log
>> (Falcon server logs).
>>
>> Regards,
>> Pallavi
>>
>> On Mon, Feb 22, 2016 at 12:49 PM, Margus Roo <margus@roo.ee> wrote:
>>
>>> It is difficult because I already have more than ten processes running
>>> and I do not know the exact moment when they go into UNKNOWN status.
>>> I just hoped that it had happened before and that someone on this list
>>> has ideas.
>>> So you think it is related to the processes?
>>> Then I can start just one process and see whether it goes into UNKNOWN.
>>>
>>> I tried to subscribe to the user@ list but without success. On the Falcon
>>> site I cannot find the subscription e-mail for the user list. If you can
>>> provide it, I can ask the user list for help.
>>>
>>> Margus (margusja) Roo
>>> http://margus.roo.ee
>>> skype: margusja
>>> +372 51 48 780
>>>
>>> On 22/02/16 09:14, Sandeep Samudrala wrote:
>>>
>>>> Hi Margus,
>>>> Please send such queries to the users mailing list. Can you attach your
>>>> process definition and also check application.log? Please attach
>>>> any stack traces you find.
>>>>
>>>> Thanks,
>>>> -Sandeep
>>>> On Feb 22, 2016 12:28 PM, "Margus Roo" <margus@roo.ee> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am using Falcon 0.6.1.2.3 as packaged by Hortonworks HDP-2.3.
>>>>>
>>>>> I noticed that all my running processes go into UNKNOWN status after
>>>>> some days. After restarting Falcon they are back in RUNNING status, and
>>>>> after some days it happens again.
>>>>>
>>>>> --
>>>>> Margus (margusja) Roo
>>>>> http://margus.roo.ee
>>>>> skype: margusja
>>>>> +372 51 48 780
>>>>>
>>>>>
>>>>>
>>>>>
>
