atlas-dev mailing list archives

From Madhan Neethiraj <mad...@apache.org>
Subject Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master
Date Wed, 25 Jan 2017 18:05:26 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------


Ship it!




The fix looks good!

With this patch, the following DSL query returns in about 300 ms, compared to about 50 seconds
earlier, on a store with ~70,000 hive_column entities:

  hive_column where qualifiedName='default.testtable_772.col507@cl1'

- Madhan Neethiraj


On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 9:32 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently, DSL uses a fill function during Gremlin translation to merge results by typeName and superTypeName, and the fill function loads the resulting vertices into memory. This causes significant memory usage: the Atlas server spends a lot of time in GC instead of doing useful work, which sometimes results in OOM errors (when GC cannot recover enough memory and search queries run in parallel).
> The proposal is to replace this with a typeName check: find all the subtypes of the given type and use an IN clause in the filter.
> For example:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query:
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
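A minimal Java sketch of the rewrite described above, for illustration only. It is not the code in Gremlin2ExpressionFactory; the class name, method names and the "type -> direct subtypes" map are hypothetical, and it assumes Manager is the only subtype of Person (as the example query implies). It expands a type to itself plus all transitive subtypes, renders them as the T.in list, and shows where the literal -631142641560 comes from: it is "1950-01-01T02:35:58.440Z" converted to epoch milliseconds. Offset handling is left out for brevity.

```java
import java.time.Instant;
import java.util.*;

// Illustrative sketch only; class and method names are hypothetical,
// not Atlas's Gremlin2ExpressionFactory API.
public class TypeInClauseSketch {

    // Breadth-first walk over a hypothetical "type -> direct subtypes" map.
    // Returns the type itself followed by all transitive subtypes; the
    // LinkedHashSet keeps discovery order and ignores repeated visits.
    static List<String> typeAndSubtypes(String typeName, Map<String, List<String>> directSubtypes) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(typeName);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (seen.add(current)) {
                queue.addAll(directSubtypes.getOrDefault(current, Collections.emptyList()));
            }
        }
        return new ArrayList<>(seen);
    }

    // Renders names as a Groovy list literal, e.g. ['Person','Manager'].
    static String groovyList(List<String> names) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (String name : names) {
            joiner.add("'" + name + "'");
        }
        return joiner.toString();
    }

    // Builds the "__typeName IN (type + subtypes)" filter plus a date comparison.
    // The ISO-8601 literal is converted to epoch milliseconds, which is the
    // form the example query compares against.
    static String buildQuery(String typeName, Map<String, List<String>> directSubtypes,
                             String dateAttribute, String isoDate, int limit) {
        long epochMillis = Instant.parse(isoDate).toEpochMilli();
        return "g.V.has(\"__typeName\", T.in, " + groovyList(typeAndSubtypes(typeName, directSubtypes)) + ")"
                + ".and(_().has(\"" + typeName + "." + dateAttribute + "\", T.lt, " + epochMillis + "))"
                + " [0..<" + limit + "].toList()";
    }

    public static void main(String[] args) {
        // Assumption: Manager is the only subtype of Person.
        Map<String, List<String>> directSubtypes = new HashMap<>();
        directSubtypes.put("Person", Arrays.asList("Manager"));
        System.out.println(buildQuery("Person", directSubtypes, "birthday", "1950-01-01T02:35:58.440Z", 40));
    }
}
```

Running main prints the same text that appears inside the braces of the example query. Because the subtype set is computed once from the type system, the type check becomes an ordinary property filter applied during the traversal, rather than a fill step that first collects vertices into memory.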
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98 
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran all unit tests; all passed.
> Ran a search query on hive_column with 100,000 entities; response time improved from 45 sec to 0.5 sec.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>

