atlas-dev mailing list archives

From Madhan Neethiraj <mad...@apache.org>
Subject Re: Review Request 64141: [ATLAS-2287]: Include lucene libraries when building atlas distribution with Janus profile
Date Thu, 30 Nov 2017 19:03:42 GMT


> On Nov. 30, 2017, 2:47 p.m., David Radley wrote:
> > pom.xml
> > Line 713 (original), 716 (patched)
> > <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716>
> >
> >     I could start the UI, but I got lots of Zookeeper exceptions and POSTs to create
entities did not work.
> >     
> >     I got errors like this:
> >     2017-11-30 12:16:28,975 INFO  - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications
(GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,975 INFO  - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications
(GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification
(GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications
(GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification
(GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:29,021 WARN  - [main-SendThread(localhost:9026):] ~ Session
0x0 for server null, unexpected error, closing socket connection and attempting reconnect
(ClientCnxn$SendThread:1102)
> >     java.net.ConnectException: Connection refused
> >             at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >             at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> >             at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >             at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >     2017-11-30 12:16:29,029 INFO  - [main:] ~ Starting service org.apache.atlas.web.service.ActiveInstanceElectorService
(Services:53)
> >     2017-11-30 12:16:29,030 INFO  - [main:] ~ HA is not enabled, no need to start
leader election service (ActiveInstanceElectorService:96)
> >     2017-11-30 12:16:29,030 INFO  - [main:] ~ Starting service org.apache.atlas.kafka.KafkaNotification
(Services:53)
> >     2017-11
> >     
> >     
> >     and 
> >     2017-11-30 12:16:31,194 INFO  - [main:] ~ Adding cross-site request forgery (CSRF)
protection (AtlasCSRFPreventionFilter:98)
> >     2017-11-30 12:16:31,646 INFO  - [main:] ~ AuditFilter initialization started
(AuditFilter:57)
> >     2017-11-30 12:30:47,004 WARN  - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:]
~ caught end of stream exception (NIOServerCnxn:357)
> >     EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840000,
likely client has closed socket
> >             at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >             at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >             at java.lang.Thread.run(Thread.java:748)
> >     2017-11-30 12:30:47,109 WARN  - [zkCallback-3-thread-2:] ~ Watcher org.apache.solr.common.cloud.ConnectionManager@5232c71
name: ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:Disconnected
type:None path:null path: null type: None (ConnectionManager:108)
> >     2017-11-30 12:30:47,111 WARN  - [zkCallback-3-thread-2:] ~ zkClient has disconnected
(ConnectionManager:184)
> >     2017-11-30 12:30:48,060 WARN  - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:]
~ caught end of stream exception (NIOServerCnxn:357)
> >     EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001,
likely client has closed socket
> >             at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >             at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >             at java.lang.Thread.run(Thread.java:748)
> >     2017-11-30 12:30:48,255 WARN  - [SyncThread:0:] ~ fsync-ing the write ahead
log in SyncThread:0 took 1246ms which will adversely effect operation latency. See the ZooKeeper
troubleshooting guide (FileTxnLog:334)
> >     2017-11-30 12:30:48,508 ERROR - [ZkClient-EventThread-1116-localhost:9026:]
~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,19] from OfflinePartition
to OnlinePartition failed (Logging$class:103)
> >     kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,19]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >             at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >             at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >             at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >             at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >             at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> >             at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >             at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >             at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >             at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> >             at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> >             at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> >             at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >             at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> >             at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
> >             at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
> >             at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
> >             at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
> >             at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
> >             at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >             at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >             at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> >             at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
> >             at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> >             at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> >     2017-11-30 12:30:48,512 ERROR - [ZkClient-EventThread-1116-localhost:9026:]
~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,30] from OfflinePartition
to OnlinePartition failed (Logging$class:103)
> >     kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,30]
is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >             at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >             at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >             at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >             at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >             at kafka.controll
> 
> Madhan Neethiraj wrote:
>     2017-11-30 12:30:48,060 WARN  - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:]
~ caught end of stream exception (NIOServerCnxn:357)
>     EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001,
likely client has closed socket
>             at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>             at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>             at java.lang.Thread.run(Thread.java:748)
>             
>     This error was caused by an incorrect port number configuration in conf/atlas-application.properties:
>       atlas.kafka.zookeeper.connect=localhost:9026
>     
>     The fix is to replace 9026 with 2181 (i.e. use the zookeeper in embedded-hbase).
>     
>     However, after this change, startup of the org.apache.atlas.kafka.KafkaNotification service
seems to hang. I will look into this further.

Use of port 9026 is indeed correct. No need to update the configuration.
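That is, the existing setting in conf/atlas-application.properties should stay as-is:
  atlas.kafka.zookeeper.connect=localhost:9026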

WARN messages about "connection refused" are specific to the stand-alone dev-env deployment, where
Atlas uses an embedded Kafka & Zookeeper. Kafka & Zookeeper are started towards the end of
initialization, but before that happens an attempt is made to connect to Zookeeper - resulting in
this WARN. The WARN goes away once the embedded Kafka and Zookeeper are up. We still need to
investigate where the early connection attempt is made from, but this issue shouldn't block Atlas
from being functional.

Another error, "Unable to read additional data from client sessionid 0x1600cdb55840001, likely
client has closed socket" - I think this might be due to low connect/session timeout values
in conf/atlas-application.properties:
  atlas.kafka.zookeeper.session.timeout.ms=400
  atlas.kafka.zookeeper.connection.timeout.ms=200

Can you increase both to 60000 and try again?
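
For example, with the suggested values the same entries would read:
  atlas.kafka.zookeeper.session.timeout.ms=60000
  atlas.kafka.zookeeper.connection.timeout.ms=60000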


- Madhan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64141/#review192267
-----------------------------------------------------------


On Nov. 29, 2017, 2:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64141/
> -----------------------------------------------------------
> 
> (Updated Nov. 29, 2017, 2:32 a.m.)
> 
> 
> Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
> 
> 
> Bugs: ATLAS-2287
>     https://issues.apache.org/jira/browse/ATLAS-2287
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> When Atlas is built using the -Pdist profile, lucene jars are excluded during packaging of
the war file. Since we are not shading the graphdb module for the janus profile, these jars are
needed as a runtime dependency.
> Titan's shaded jar includes the lucene libraries, and hence they were excluded during packaging
of the war to avoid duplicate dependencies.
> 
> 
> Diffs
> -----
> 
>   distro/pom.xml eea256d8 
>   pom.xml 3720c1f5 
>   webapp/pom.xml b4a96d36 
> 
> 
> Diff: https://reviews.apache.org/r/64141/diff/1/
> 
> 
> Testing
> -------
> 
> Validated building the Atlas distribution using both the janus and titan0 profiles. Atlas starts
fine and basic functionality works.
> 
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>

