atlas-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sarath Subramanian (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ATLAS-2751) Atlas is not consuming messages from ATLAS_HOOK topic after recovering from zookeeper connection timeout.
Date Fri, 08 Jun 2018 22:45:00 GMT
Sarath Subramanian created ATLAS-2751:
-----------------------------------------

             Summary: Atlas is not consuming messages from ATLAS_HOOK topic after recovering
from zookeeper connection timeout.
                 Key: ATLAS-2751
                 URL: https://issues.apache.org/jira/browse/ATLAS-2751
             Project: Atlas
          Issue Type: Bug
          Components:  atlas-core
    Affects Versions: 1.1.0
            Reporter: Sarath Subramanian
             Fix For: 1.1.0


Atlas is not consuming messages from ATLAS_HOOK topic after recovering from zookeeper connection
timeout.

* Below snapshot showing, atlas going into passive state with notification consumer thread
shutdown
{noformat}
2018-05-21 18:45:42,156 INFO - [org.apache.ranger.audit.queue.AuditBatchQueue1:] ~ Audit Status
Log: name=atlas.async.multi_dest.batch, finalDestination=atlas.async.multi_dest.batch.solr,
interval=01:00.002 minutes, events=4, succcessCount=2, totalEvents=2960, totalSuccessCount=112
(BaseAuditHandler:312)
2018-05-21 20:33:32,604 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):]
~ Client session timed out, have not heard from server in 13339ms for sessionid 0x1637d8f847f002c,
closing socket connection and attempting reconnect (ClientCnxn:1140)
2018-05-21 20:33:32,705 INFO - [main-EventThread:] ~ State change: SUSPENDED (ConnectionStateManager:228)
2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Server instance with
server id id1 is removed as leader (ActiveInstanceElectorService:191)
2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Instance becoming passive
from ACTIVE (ServiceState:92)
2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ Reacting to passive state:
shutting down Kafka consumers. (NotificationHookConsumer:227)
2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> stopConsumerThreads()
(NotificationHookConsumer:193)
2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> HookConsumer shutdown()
(NotificationHookConsumer:610)
2018-05-21 20:33:32,710 INFO - [Curator-ConnectionStateManager-0:] ~ [atlas-hook-consumer-thread]:
Shutting down (NotificationHookConsumer$HookConsumer:72)
2018-05-21 20:33:32,710 INFO - [NotificationHookConsumer thread-0:] ~ closing NotificationConsumer
(NotificationHookConsumer:343)
2018-05-21 20:33:34,550 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):]
~ Client will use GSSAPI as SASL mechanism. (ZooKeeperSaslClient:285)
2018-05-21 20:33:34,551 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):]
~ Opening socket connection to server ctr-e138-1518143905142-323545-01-000005.hwx.site/172.27.15.200:2181.
Will attempt to SASL-authenticate using Login Context section 'Client' (ClientCnxn:1019)
2018-05-21 20:33:47,715 ERROR - [Curator-Framework-0:] ~ Connection timed out for connection
string (ctr-e138-1518143905142-323545-01-000005.hwx.site:2181) and timeout (15000) / elapsed
(15006) (ConnectionState:200)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
 at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
 at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
 at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
 at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
 at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
 at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
 at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
{noformat}
 
 
* Below snapshot showing, atlas recovering from zookeeper timeout and assuming to active instance
with consumer thread initialized to read kafka events.
{noformat}
2018-05-21 20:34:13,358 INFO - [Curator-Framework-0-EventThread:] ~ State change: RECONNECTED
(ConnectionStateManager:228)
2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Server instance with server
id id1 is elected as leader (ActiveInstanceElectorService:145)
2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Instance becoming active
from PASSIVE (ServiceState:76)
2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active: Creating
HBase table for Audit if required. (HBaseBasedAuditRepository:571)
2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Checking if table ATLAS_ENTITY_AUDIT_EVENTS
exists (HBaseBasedAuditRepository:513)
2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Table ATLAS_ENTITY_AUDIT_EVENTS
exists (HBaseBasedAuditRepository:525)
2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active: initializing
index (GraphBackedSearchIndexer:132)
2018-05-21 20:34:13,380 INFO - [Curator-Framework-0-EventThread:] ~ Creating indexes for graph.
(GraphBackedSearchIndexer:243)
2018-05-21 20:34:13,429 INFO - [Curator-Framework-0-EventThread:] ~ Index creation for global
keys complete. (GraphBackedSearchIndexer:293)
2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer.instanceIsActive()
(AtlasTypeDefStoreInitializer:331)
2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefGraphStoreV1.init()
(AtlasTypeDefGraphStoreV1:106)
2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefGraphStore.init()
(AtlasTypeDefGraphStore:81)
2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between table_type_mALvg and db_type_mALvg on attribute: table_type_mALvg.db (AtlasEntityType:145)
2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between table_type_mALvg and column_type_mALvg on attribute: table_type_mALvg.column (AtlasEntityType:145)
2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between composite_entity_dRRgt and simple_entity_dRRgt on attribute: composite_entity_dRRgt.simple_entity
(AtlasEntityType:145)
2018-05-21 20:34:58,687 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between composite_entity_zbqFl and simple_entity_zbqFl on attribute: composite_entity_zbqFl.simple_entity
(AtlasEntityType:145)
2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between __AtlasUserProfile and __AtlasUserSavedSearch on attribute: __AtlasUserProfile.savedSearches
(AtlasEntityType:145)
2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between composite_entity_PEhru and simple_entity_PEhru on attribute: composite_entity_PEhru.simple_entity
(AtlasEntityType:145)
2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between table_type_UUDrx and db_type_UUDrx on attribute: table_type_UUDrx.db (AtlasEntityType:145)
2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between table_type_UUDrx and column_type_UUDrx on attribute: table_type_UUDrx.column (AtlasEntityType:145)
2018-05-21 20:34:58,689 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined
between composite_entity_qhCgR and simple_entity_qhCgR on attribute: composite_entity_qhCgR.simple_entity
(AtlasEntityType:145)
2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefGraphStore.init()
(AtlasTypeDefGraphStore:105)
2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefGraphStoreV1.init()
(AtlasTypeDefGraphStoreV1:110)
2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs()
(AtlasTypeDefStoreInitializer:116)
2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0)
(AtlasTypeDefStoreInitializer:153)
2018-05-21 20:34:58,694 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0010-base_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0011-glossary_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ Type patches directory
/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/patches does not exist or not readable or has
no patches (AtlasTypeDefStoreInitializer:392)
2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0)
(AtlasTypeDefStoreInitializer:196)
2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop)
(AtlasTypeDefStoreInitializer:153)
2018-05-21 20:34:58,698 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1020-fs_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1030-hive_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1040-sqoop_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,701 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1050-falcon_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,702 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1060-hbase_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,703 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1070-kafka_model.json
(AtlasTypeDefStoreInitializer:185)
2018-05-21 20:34:58,704 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1080-storm_model.json
(AtlasTypeDefStoreInitializer:185)
.
.
.
.
.
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=hbase_table;
applyToVersion=1.2; updateToVersion=1.3 (AtlasTypeDefStoreInitializer:618)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=hbase_column_family;
applyToVersion=1.1; updateToVersion=1.2 (AtlasTypeDefStoreInitializer:618)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Applying patches in file
/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/patches/006-kafka_topic_add_attribute.json
(AtlasTypeDefStoreInitializer:416)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=kafka_topic;
applyToVersion=1.0; updateToVersion=1.1 (AtlasTypeDefStoreInitializer:618)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop)
(AtlasTypeDefStoreInitializer:196)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models)
(AtlasTypeDefStoreInitializer:153)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Type patches directory
/usr/hdp/3.0.0.0-1365/atlas/models/patches does not exist or not readable or has no patches
(AtlasTypeDefStoreInitializer:392)
2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models)
(AtlasTypeDefStoreInitializer:196)
2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs()
(AtlasTypeDefStoreInitializer:145)
2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer.instanceIsActive()
(AtlasTypeDefStoreInitializer:341)
2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active state:
initializing Kafka consumers (NotificationHookConsumer:214)
2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ ==> KafkaNotification.createConsumers(notificationType=HOOK,
numConsumers=1, autoCommitEnabled=false) (KafkaNotification:149)
2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== KafkaNotification.createConsumers(notificationType=HOOK,
numConsumers=1, autoCommitEnabled=false) (KafkaNotification:156)
2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ [atlas-hook-consumer-thread]:
Starting (NotificationHookConsumer$HookConsumer:72)
2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ ==> HookConsumer
doWork() (NotificationHookConsumer:313)
2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is not
ready. Waiting for 1000 milliseconds to retry... (NotificationHookConsumer:588)
2018-05-21 20:34:58,713 WARN - [Curator-Framework-0-EventThread:] ~ Instance is active from
BECOMING_ACTIVE (ServiceState:87)
2018-05-21 20:34:59,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is ready,
can start reading Kafka events. (NotificationHookConsumer:603)
{noformat}

* Below snapshot confirms that, hive hook messages ingested into ATLAS_HOOk topic but not
consumed by Atlas
{noformat}
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper ctr-e138-1518143905142-323545-01-000005.hwx.site:2181,ctr-e138-1518143905142-323545-01-000006.hwx.site:2181,ctr-e138-1518143905142-323545-01-000007.hwx.site:2181
--topic ATLAS_HOOK --security-protocol PLAINTEXTSASL
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future
major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].


{"version":\{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"172.27.27.0","msgCreatedBy":"hive","msgCreationTime":1527012364380,"message":\{"type":"ENTITY_CREATE_V2","user":"hrt_qa","entities":{"referredEntities":{},"entities":[\{"typeName":"hive_db","attributes":{"owner":"hrt_qa","ownerType":"USER","qualifiedName":"db10@cl1","clusterName":"cl1","name":"db10","description":null,"location":"hdfs://ns1/apps/hive/warehouse/db10.db","parameters":{}},"guid":"-4487807398104856","version":0}]}}}
^CProcessed a total of 1 messages
{noformat}


Live cluster: https://172.27.17.64:8443/#/main/services/ATLAS/summary
Active atlas: https://ctr-e138-1518143905142-323545-01-000003.hwx.site:21443/
NOTE: Currently, above atlas service is in buggy state, restarting atlas will bring atlas
to normal mode. Please do not restart atlas service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message