atlas-dev mailing list archives

From "Graham Wallis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-1868) Highly inefficient DSL-queries
Date Wed, 14 Jun 2017 15:42:00 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049305#comment-16049305 ]

Graham Wallis commented on ATLAS-1868:
--------------------------------------

Hi Christian

I'm afraid I don't know the answer to the question about 0.8 in your last para. I have only
been involved with Atlas for a few weeks and am still trying to find my way around :-)  I
will post back here if I find anything that might explain it.

Regarding your suggestion on Solr/Elasticsearch indexes, I think they are only used for full-text
searches. To optimize your original query, I think we should instead exploit a composite index
and start the traversal from the indexed vertex. A rarity score from the full-text index might
be another way to indicate which of the two traversal directions is better. I'm currently
rummaging through the various layers of graph abstraction to work out where in Atlas either
optimization would be implemented. Up till now I have been too close to the graph: by that
stage the operations are lower-level discrete lookups of vertices or edges. I think I need to
look further up and find where the whole query is composed.
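A minimal sketch of the "start from the rarer side" idea, written in Python rather than Gremlin so it runs standalone. `ToyGraph`, `lookup` and `find` are hypothetical names, not Atlas, Titan or Gremlin APIs. A plain dict stands in for a composite index, and the number of vertices each index lookup returns stands in for the rarity score. The property key `typename` is also an assumption taken from the quoted queries.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory graph; not an Atlas, Titan or Gremlin API."""

    def __init__(self):
        self.props = {}                      # vertex -> {property key: value}
        self.out_edges = defaultdict(list)   # (vertex, label) -> [target vertices]
        self.in_edges = defaultdict(list)    # (vertex, label) -> [source vertices]
        self.index = defaultdict(set)        # (key, value) -> {vertices}; stands in for an index

    def add_vertex(self, v, **props):
        self.props[v] = props
        for key, value in props.items():
            self.index[(key, value)].add(v)

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)
        self.in_edges[(dst, label)].append(src)

    def lookup(self, key, value):
        # Return a copy so callers cannot mutate the index.
        return set(self.index.get((key, value), ()))


def find(graph, type_name, label, key, value):
    """Evaluate '<type_name> where <label>.<key> = <value>'.

    The size of each index lookup is used as a crude rarity score and the
    traversal starts from whichever side is rarer.
    Returns (direction, set of matching type vertices).
    """
    by_type = graph.lookup("typename", type_name)
    by_prop = graph.lookup(key, value)
    if len(by_prop) <= len(by_type):
        # Like g.V.has(key, value).in(label).has('typename', type_name)
        hits = {src
                for v in by_prop
                for src in graph.in_edges[(v, label)]
                if graph.props[src].get("typename") == type_name}
        return "in", hits
    # Like g.V.has('typename', type_name).as('x').out(label).has(key, value).back('x')
    hits = {v
            for v in by_type
            if any(graph.props[dst].get(key) == value
                   for dst in graph.out_edges[(v, label)])}
    return "out", hits


g = ToyGraph()
for i in range(1000):
    g.add_vertex(f"t{i}", typename="mytype")
    g.add_vertex(f"p{i}", id=f"id{i}")
    g.add_edge(f"t{i}", "property", f"p{i}")
print(find(g, "mytype", "property", "id", "id1"))  # ('in', {'t1'})
```

Here the id lookup returns 1 vertex against 1000 of `mytype`, so the sketch walks the `property` edge backwards. In Atlas the equivalent decision would presumably have to be made where the whole DSL query is composed, and a real planner would use index statistics rather than materializing both lookups as this toy does.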





> Highly inefficient DSL-queries
> ------------------------------
>
>                 Key: ATLAS-1868
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1868
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: 0.7-incubating
>         Environment: linux, hbase + solr configuration.
>            Reporter: Christian R
>              Labels: dsl, gremlin
>
> The DSL query 'mytype where property.id = "id1"' appears to be rewritten as a Gremlin query that resembles:
> g.V.has('typename', 'mytype').as('x').out('property').has('id', 'id1').back('x')
> On our system this query takes 6-7 minutes. The query
> g.V.has('id', 'id1').in('property').has('typename', 'mytype')
> takes 350 milliseconds.
> Our graph:
> g.V.count() = 1359151
> We have Atlas 0.7 installed. I've compiled the latest 0.9 code and looked at the generated Gremlin query as reported in the logs for the same DSL query, and I think 0.9 has the same performance issues. Unfortunately I don't have a big graph on a 0.9 installation to test performance.
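To make the cost difference between the two quoted traversals concrete, here is a toy Python model (not Gremlin, and not how Titan actually executes queries) that counts how many vertices each plan touches. The graph shape, the `typename`/`id` keys and the helper names `build_graph`, `type_first` and `id_first` are assumptions for illustration. Dicts stand in for graph indexes, and the type lookup is generously modelled as indexed too.

```python
from collections import defaultdict


def build_graph(n):
    """Hypothetical toy graph: n 'mytype' vertices t0..t{n-1}, each with one
    'property' edge to a vertex p{i} whose 'id' is f"id{i}"."""
    props = {}
    out_e = defaultdict(list)
    in_e = defaultdict(list)
    type_index = defaultdict(set)   # stands in for an index on 'typename'
    id_index = defaultdict(set)     # stands in for an index on 'id'
    for i in range(n):
        t, p = f"t{i}", f"p{i}"
        props[t] = {"typename": "mytype"}
        props[p] = {"id": f"id{i}"}
        type_index["mytype"].add(t)
        id_index[f"id{i}"].add(p)
        out_e[(t, "property")].append(p)
        in_e[(p, "property")].append(t)
    return props, out_e, in_e, type_index, id_index


def type_first(props, out_e, type_index, type_name, wanted_id):
    """Models g.V.has('typename', T).as('x').out('property').has('id', I).back('x')."""
    touched, result = 0, set()
    for v in type_index.get(type_name, ()):
        touched += 1                          # every vertex of the type is visited
        for dst in out_e[(v, "property")]:
            touched += 1
            if props[dst].get("id") == wanted_id:
                result.add(v)                 # back('x') returns the type vertex
    return result, touched


def id_first(props, in_e, id_index, type_name, wanted_id):
    """Models g.V.has('id', I).in('property').has('typename', T)."""
    touched, result = 0, set()
    for v in id_index.get(wanted_id, ()):
        touched += 1                          # only vertices with the wanted id are visited
        for src in in_e[(v, "property")]:
            touched += 1
            if props[src].get("typename") == type_name:
                result.add(src)
    return result, touched


props, out_e, in_e, type_index, id_index = build_graph(1000)
print(type_first(props, out_e, type_index, "mytype", "id1"))  # ({'t1'}, 2000)
print(id_first(props, in_e, id_index, "mytype", "id1"))       # ({'t1'}, 2)
```

Both plans return the same vertex, but in this model the type-first plan's work grows with the number of `mytype` vertices while the id-first plan's work grows only with the number of vertices carrying the wanted id. This is consistent with the gap reported above, though the toy does not claim to explain the actual Titan timings.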




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
