atlas-dev mailing list archives

From "Ashutosh Mestry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ATLAS-1818) Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results
Date Fri, 19 May 2017 23:16:04 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018152#comment-16018152 ]

Ashutosh Mestry commented on ATLAS-1818:
----------------------------------------

[~yhemanth] The current implementation does not use faceted search features. Another
feature, currently in progress, may use them. We are in the initial stages of discussion,
and no approach has been finalized yet.

> Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch Results
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-1818
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1818
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core, atlas-webui
>    Affects Versions: trunk, 0.8-incubating
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>             Fix For: 0.9-incubating, 0.8.1-incubating
>
>         Attachments: ATLAS-1818-4.patch
>
>   Original Estimate: 120h
>          Time Spent: 96h
>  Remaining Estimate: 24h
>
> h3. Background
> The environment is set up with 100K hive_tables, each with 84 columns.
> A basic search with the query parameter specified is executed. Results take 75 seconds to appear.
> h3. Analysis & Findings
> A similar test with a smaller data set (200 hive_tables, each with 81 columns) also resulted in less than ideal performance.
> The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, which in turn uses _Solr_.
> Two aspects affect performance:
> * Solr's default maximum result set size, when no limit is specified, is 100K. In the test scenario, this returns the entire result set.
> * Once the result set is returned, _EntityDiscoveryService.searchUsingBasicQuery_ does a sequential scan to filter the data relevant to the query. The cost of this operation is proportional to the size of the result set.
> h3. Solution
> The following changes will improve performance:
> * Solr's maximum result set size is governed by the _atlas.graph.index.search.max-result-set-size_ property. It makes sense to set this to a lower number.
> * Modify Solr's configuration, _solrconfig.xml_, to use _FastLRUCache_.
> * Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that takes additional parameters.
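
The cost difference described in the analysis can be illustrated with a small, self-contained Java sketch. It is a toy model, not Atlas code: the class names (BasicSearchSketch, Hit, Outcome), the sample data mix, and the choice of a type-name filter as the "additional parameter" are all assumptions made for illustration. The sketch compares scanning an uncapped result set after the fact with letting the query carry the filter and the limit.

```java
import java.util.ArrayList;
import java.util.List;

public class BasicSearchSketch {

    /** Hypothetical stand-in for one index hit; not an Atlas class. */
    public static final class Hit {
        public final String typeName;
        public final String name;

        public Hit(String typeName, String name) {
            this.typeName = typeName;
            this.name = name;
        }
    }

    /** Matches found, plus how many hits the caller had to examine to find them. */
    public static final class Outcome {
        public final List<Hit> results;
        public final int examined;

        public Outcome(List<Hit> results, int examined) {
            this.results = results;
            this.examined = examined;
        }
    }

    /** Builds a toy index in which every fourth hit is a hive_table (assumed mix). */
    public static List<Hit> sampleIndex(int size) {
        List<Hit> index = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            String type = (i % 4 == 0) ? "hive_table" : "hive_column";
            index.add(new Hit(type, "entity_" + i));
        }
        return index;
    }

    /** Pattern from the analysis: accept up to maxResultSetSize hits, then scan all of them. */
    public static Outcome fetchThenScan(List<Hit> index, int maxResultSetSize,
                                        String typeName, int limit) {
        List<Hit> fetched = index.subList(0, Math.min(maxResultSetSize, index.size()));
        List<Hit> matches = new ArrayList<>();
        int examined = 0;
        for (Hit hit : fetched) {
            examined++; // cost grows with the size of the fetched set
            if (matches.size() < limit && hit.typeName.equals(typeName)) {
                matches.add(hit);
            }
        }
        return new Outcome(matches, examined);
    }

    /** Pattern from the solution: the query carries the filter and limit, so only matches come back. */
    public static Outcome filterInQuery(List<Hit> index, String typeName, int limit) {
        List<Hit> matches = new ArrayList<>();
        // This loop stands in for the index evaluating the query; a real index would not scan.
        for (Hit hit : index) {
            if (matches.size() == limit) {
                break;
            }
            if (hit.typeName.equals(typeName)) {
                matches.add(hit);
            }
        }
        // The caller only post-processes the hits that were returned.
        return new Outcome(matches, matches.size());
    }

    public static void main(String[] args) {
        List<Hit> index = sampleIndex(100_000);
        Outcome before = fetchThenScan(index, 100_000, "hive_table", 25);
        Outcome after = filterInQuery(index, "hive_table", 25);
        System.out.println("fetch-then-scan examined " + before.examined + " hits");
        System.out.println("filter-in-query examined " + after.examined + " hits");
    }
}
```

In this model, lowering the cap passed to fetchThenScan shrinks the scan but can also drop matches that lie beyond the cap, which is one reason a lower cap and a more selective query are proposed together.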



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
