atlas-dev mailing list archives

From "Ashutosh Mestry (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ATLAS-1818) Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results
Date Fri, 19 May 2017 18:24:04 GMT

     [ https://issues.apache.org/jira/browse/ATLAS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Mestry updated ATLAS-1818:
-----------------------------------
    Attachment: ATLAS-1818-4.patch

> Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-1818
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1818
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core, atlas-webui
>    Affects Versions: trunk, 0.8-incubating
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>             Fix For: trunk, 0.8-incubating
>
>         Attachments: ATLAS-1818-4.patch
>
>   Original Estimate: 120h
>          Time Spent: 96h
>  Remaining Estimate: 24h
>
> h3. Background
> The test environment is set up with 100K hive_tables, each with 84 columns.
> When a basic search is executed with the query parameter specified, results take 75 seconds to appear.
> h3. Analysis & Findings
> A similar test performed with a smaller data set (200 hive_tables, each with 81 columns) also resulted in less-than-ideal performance.
> The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, which in turn uses _Solr_.
> There are 2 aspects that affect performance:
> * When no limit is specified, Solr's default maximum result set size is 100K. In the test scenario, this returns the entire result set.
> * Once the result set is returned, _EntityDiscoveryService.searchUsingBasicQuery_ does a sequential scan to filter the data relevant to the query. The cost of this operation is proportional to the size of the result set.
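To illustrate the second point, here is a minimal Java sketch of a post-filter that scans every returned hit. It is not the actual _EntityDiscoveryService_ code: the class `PostFilterSketch`, the `Hit` and `ScanResult` types, and `filterByType` are hypothetical names introduced only for this example. The sketch assumes the filter is a simple type-name comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT the real Atlas implementation.
final class PostFilterSketch {

    // Stand-in for one hit returned by the index (assumed shape).
    static final class Hit {
        final String guid;
        final String typeName;

        Hit(String guid, String typeName) {
            this.guid = guid;
            this.typeName = typeName;
        }
    }

    // Matches found plus the number of hits that had to be examined.
    static final class ScanResult {
        final List<Hit> matches;
        final int examined;

        ScanResult(List<Hit> matches, int examined) {
            this.matches = matches;
            this.examined = examined;
        }
    }

    // Sequentially scans all hits and keeps those of the requested type.
    // Every hit is visited once, so the work grows linearly with hits.size().
    static ScanResult filterByType(List<Hit> hits, String typeName) {
        List<Hit> matches = new ArrayList<>();
        int examined = 0;
        for (Hit hit : hits) {
            examined++;
            if (typeName.equals(hit.typeName)) {
                matches.add(hit);
            }
        }
        return new ScanResult(matches, examined);
    }
}
```

In this sketch the number of examined hits always equals the size of the returned result set, regardless of how few hits match, which is why a 100K result set makes the scan expensive.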
> h3. Solution
> The following changes will improve performance:
> * Solr's maximum result set size is governed by the property _atlas.graph.index.search.max-result-set-size_. It makes sense to set this to a lower number.
> * Modify Solr's configuration file _solrconfig.xml_ to use _FastLRUCache_.
> * Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that takes additional parameters.
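The first and third changes can be sketched in Java as follows. This is an illustration under stated assumptions, not the contents of the attached patch: the class `IndexQuerySketch`, its methods, the fallback value, and the field name `typeName` are hypothetical. Only the property key _atlas.graph.index.search.max-result-set-size_ comes from the description above. The query string uses generic Lucene-style `field:"value"` clauses, which may differ from the syntax Atlas actually sends to the index.

```java
import java.util.Map;
import java.util.Properties;
import java.util.StringJoiner;

// Hypothetical sketch: NOT the code from ATLAS-1818-4.patch.
final class IndexQuerySketch {

    // Property key named in the issue description.
    static final String MAX_RESULT_KEY = "atlas.graph.index.search.max-result-set-size";

    // Reads the configured cap, or returns the caller-supplied fallback
    // (the fallback is an assumption of this sketch) when the key is absent.
    static int maxResultSetSize(Properties conf, int fallback) {
        String raw = conf.getProperty(MAX_RESULT_KEY);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    // Builds one query string that carries the extra filters, so the index
    // narrows the result set instead of a sequential scan doing it afterwards.
    // Values are quoted but not escaped; real code would need to escape them.
    static String buildQuery(String text, Map<String, String> filters) {
        StringJoiner query = new StringJoiner(" AND ");
        query.add("(" + text + ")");
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            query.add(filter.getKey() + ":\"" + filter.getValue() + "\"");
        }
        return query.toString();
    }
}
```

For example, `buildQuery("sales", Map.of("typeName", "hive_table"))` yields `(sales) AND typeName:"hive_table"`. Pushing the filter into the query and lowering the cap both reduce the number of hits the service has to examine.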



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
