atlas-dev mailing list archives

From "Christian R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-1868) Highly inefficient DSL-queries
Date Wed, 14 Jun 2017 14:04:00 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049209#comment-16049209 ]

Christian R commented on ATLAS-1868:
------------------------------------

Hi Graham, 

Thank you for looking into this. I am glad you are able to reproduce it.

The only thing I have thought of so far is to leverage the Solr/Elasticsearch index to see whether
we can detect parts of the graph that have very few entries and base the query around those. I
suspect not, but it was my first thought (I used to work with search, you see...)
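
A minimal sketch of that idea, assuming the index can report a hit count for each predicate in the query. The function name and the counts below are hypothetical and only illustrate the selection step; they are not taken from Atlas or from a real index.

```python
from typing import Dict


def pick_start(index_counts: Dict[str, int]) -> str:
    """Return the predicate with the fewest index hits.

    The traversal would then begin at the vertices matching this
    predicate instead of at the type predicate.
    """
    return min(index_counts, key=index_counts.get)


# Hypothetical hit counts, as an index count query might report them.
counts = {
    "typename == 'mytype'": 250_000,
    "id == 'id1'": 1,
}

start = pick_start(counts)
print(start)
```

Whether the index can return such counts cheaply enough to pay for itself is the open question.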

(I want to sneak in a question here that I couldn't find any discussion of: was it a deliberate
choice to remove the Titan id from Gremlin search results in v0.8 and above? I get out/inVertex
in the edge results, but the vertices no longer contain 'id', at least not when running on
Berkeley/Elastic locally. I am trying to verify this on HDP sandbox 2.6 now. Given my performance
issues with DSL, we are using Gremlin quite heavily on v0.7 now.)

> Highly inefficient DSL-queries
> ------------------------------
>
>                 Key: ATLAS-1868
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1868
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: 0.7-incubating
>         Environment: linux, hbase + solr configuration.
>            Reporter: Christian R
>              Labels: dsl, gremlin
>
> The DSL query 'mytype where property.id = "id1"' appears to be rewritten as a Gremlin
> query that resembles:
> g.V.has('typename', 'mytype').as('x').out('property').has('id', 'id1').back('x')
> On our system this query takes 6-7 minutes. The query
> g.V.has('id', 'id1').in('property').has('typename', 'mytype')
> takes 350 milliseconds.
> Our graph:
> g.V.count() = 1359151
> We have Atlas 0.7 installed. I've compiled the latest 0.9 code and looked at the generated
> Gremlin query as reported in the logs for the same DSL query, and I think 0.9 has the same
> performance issues. Unfortunately I don't have a big graph on a 0.9 installation to test performance.
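
The gap between the two queries in the description can be illustrated with a toy in-memory graph. This is a sketch under stated assumptions, not a model of Titan: the graph shape (one 'property' vertex per 'mytype' vertex), the dictionaries, and the "touched" counter are all hypothetical. Real cost depends on the storage backend and on which indexes exist.

```python
from collections import defaultdict

# Toy graph: N 'mytype' vertices, each linked by one 'property' edge
# to its own property vertex carrying a unique id.
N = 1000
typename = {}                  # vertex -> type name
prop_id = {}                   # property vertex -> id value
out_edges = defaultdict(list)  # vertex -> property vertices
in_edges = defaultdict(list)   # property vertex -> owning vertices
id_index = defaultdict(list)   # id value -> property vertices (stand-in for an index)

for n in range(N):
    v, p = f"v{n}", f"p{n}"
    typename[v] = "mytype"
    prop_id[p] = f"id{n}"
    out_edges[v].append(p)
    in_edges[p].append(v)
    id_index[f"id{n}"].append(p)


def forward(type_name, wanted_id):
    """Start at every vertex of the type, follow 'property', filter by id."""
    touched, hits = 0, []
    for v, t in typename.items():
        if t != type_name:
            continue
        touched += 1
        for p in out_edges[v]:
            touched += 1
            if prop_id[p] == wanted_id:
                hits.append(v)
    return hits, touched


def reverse(type_name, wanted_id):
    """Start at the id lookup, follow 'property' backwards, filter by type."""
    touched, hits = 0, []
    for p in id_index.get(wanted_id, []):
        touched += 1
        for v in in_edges[p]:
            touched += 1
            if typename[v] == type_name:
                hits.append(v)
    return hits, touched


fwd_hits, fwd_touched = forward("mytype", "id1")
rev_hits, rev_touched = reverse("mytype", "id1")
print(fwd_hits, fwd_touched)
print(rev_hits, rev_touched)
```

Both directions return the same vertex, but the forward version touches every vertex of the type and its property, while the reverse version touches only the matching property vertex and its owner.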




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
