Subject: Re: Review Request 55813: Porting performance and stability changes made in 0.7 branch into master
From: Madhan Neethiraj
To: Madhan Neethiraj
Cc: Sarath Subramanian, atlas
Date: Wed, 25 Jan 2017 18:05:26 -0000
Message-ID: <20170125180526.13409.75134@reviews.apache.org>
In-Reply-To: <20170125093233.13409.38171@reviews.apache.org>
X-ReviewRequest-URL: https://reviews.apache.org/r/55813/

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55813/#review162981
-----------------------------------------------------------


Ship it!


The fix looks good! With this patch, the following DSL query returns in about 300ms, compared to about 50 seconds earlier!
On a store having ~70,000 hive_columns:

    hive_column where qualifiedName='default.testtable_772.col507@cl1'

- Madhan Neethiraj


On Jan. 25, 2017, 9:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55813/
> -----------------------------------------------------------
> 
> (Updated Jan. 25, 2017, 9:32 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Suma Shivaprasad.
> 
> 
> Bugs: ATLAS-1403
>     https://issues.apache.org/jira/browse/ATLAS-1403
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> Currently, DSL uses a fill function during Gremlin translation to merge results by typeName and superTypeName, and the fill function loads the resulting vertices into memory. This causes significant memory usage, and the Atlas server spends a lot of time doing GC instead of useful work, sometimes resulting in OOM (when GC is not able to recover and search queries are run in parallel).
> The proposal is to replace this with typeName checks: find all the subtypes of a given type and use an IN clause in the filter.
> For example:
> Query = Person where (birthday < "1950-01-01T02:35:58.440Z") limit 40 offset 0
> Optimized query
> Gremlin Query = L:
> {g.V.has("__typeName", T.in, ['Person','Manager']).and(_().has("Person.birthday", T.lt, -631142641560)) [0..<40].toList()}
> 
> 
> Diffs
> -----
> 
>   repository/src/main/java/org/apache/atlas/discovery/DataSetLineageService.java fd5dba7 
>   repository/src/main/java/org/apache/atlas/discovery/graph/DefaultGraphPersistenceStrategy.java 266f27c 
>   repository/src/main/java/org/apache/atlas/discovery/graph/GraphBackedDiscoveryService.java b637f90 
>   repository/src/main/java/org/apache/atlas/gremlin/Gremlin2ExpressionFactory.java 41dc65f 
>   repository/src/main/java/org/apache/atlas/gremlin/GremlinExpressionFactory.java 3677544 
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 889236c 
>   repository/src/main/scala/org/apache/atlas/query/ClosureQuery.scala daef582 
>   repository/src/main/scala/org/apache/atlas/query/GraphPersistenceStrategies.scala a9dcdff 
>   repository/src/main/scala/org/apache/atlas/query/GremlinEvaluator.scala ade4176 
>   repository/src/main/scala/org/apache/atlas/query/GremlinQuery.scala a61ff98 
>   repository/src/test/java/org/apache/atlas/discovery/DataSetLineageServiceTest.java a0ee26c 
>   repository/src/test/scala/org/apache/atlas/query/GremlinTest2.scala 33513c5 
> 
> Diff: https://reviews.apache.org/r/55813/diff/
> 
> 
> Testing
> -------
> 
> Ran all unit tests; all passed.
> Ran a search query on hive_column with 100,000 entities; performance improved from 45 sec to 0.5 sec.
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>
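
To illustrate the proposal in the quoted description, here is a minimal Java sketch of the idea: collect a type together with all of its subtypes, then emit a single `__typeName` IN filter instead of loading vertices through a fill step. This is not the Atlas implementation. The class `TypeFilterSketch`, its methods, and the `directSubTypes` map are hypothetical names introduced only for this example, and the real patch builds the expression through the Gremlin expression factories listed in the diff. The sketch also shows how the ISO timestamp in the example query corresponds to the epoch-millisecond literal in the generated Gremlin.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from the Atlas code base.
public class TypeFilterSketch {

    // Returns typeName followed by all of its transitive subtypes.
    // directSubTypes is an assumed lookup: type name -> names of its direct subtypes.
    static Set<String> typeAndSubTypes(String typeName, Map<String, List<String>> directSubTypes) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(typeName);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (result.add(current)) { // skip types that were already visited
                pending.addAll(directSubTypes.getOrDefault(current, List.of()));
            }
        }
        return result;
    }

    // Builds the type check as one IN clause over the collected type names.
    static String typeFilter(Set<String> typeNames) {
        String quoted = typeNames.stream()
                .map(name -> "'" + name + "'")
                .collect(Collectors.joining(","));
        return "g.V.has(\"__typeName\", T.in, [" + quoted + "])";
    }

    // Converts an ISO-8601 instant to epoch milliseconds, the form used in the Gremlin literal.
    static long toEpochMillis(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    public static void main(String[] args) {
        // Assumed type hierarchy for the example: Manager is a subtype of Person.
        Map<String, List<String>> directSubTypes = Map.of("Person", List.of("Manager"));

        String filter = typeFilter(typeAndSubTypes("Person", directSubTypes));
        long birthday = toEpochMillis("1950-01-01T02:35:58.440Z");

        System.out.println(filter
                + ".and(_().has(\"Person.birthday\", T.lt, " + birthday + ")) [0..<40].toList()");
    }
}
```

With the assumed hierarchy, `main` prints the same Gremlin string as the example in the description, including the `-631142641560` timestamp literal. Because the type check is a plain property filter, the graph can evaluate it without first materializing every candidate vertex in memory, which is the cost the description attributes to the fill function.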