lucene-dev mailing list archives

From "Stanislav Livotov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-12688) LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
Date Tue, 21 Aug 2018 21:53:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Livotov updated SOLR-12688:
-------------------------------------
    Attachment: NoFQSolrFeatureOptimisation.patch
                LTRScoringModelHashCodeCaching.patch
                DocValuesSupportForFieldValueFeature.patch

> LTR Multiple performance fixes + pure DocValues support for FieldValueFeature
> -----------------------------------------------------------------------------
>
>                 Key: SOLR-12688
>                 URL: https://issues.apache.org/jira/browse/SOLR-12688
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LTR
>            Reporter: Stanislav Livotov
>            Priority: Major
>         Attachments: DocValuesSupportForFieldValueFeature.patch, LTRModelHashCodeAfter.png, LTRModelHashCodeBefore.png, LTRScoringModelHashCodeCaching.patch, LTRSolrFeatureAfter.png, LTRSolrFeatureBefore.png, LTRwithDVOptimisation.png, LTRwithoutDVOptimisation.png, MultiplePerformanceFixes.patch, NoFQSolrFeatureOptimisation.patch
>
>
> This ticket covers two performance issues and one functional/performance issue that I found while integrating LTR into our e-commerce search engine:
>  # FieldValueFeature doesn't support pure DocValues fields (stored=false). Please also note that for fields that are both stored and DocValues it works suboptimally, because it loads the stored document just to extract a single field. DocValues are clearly faster for such use cases. Below are screenshots of JFR profiles without and with the new DocValues support, for the case when the value can be read from DocValues.
>  !LTRwithoutDVOptimisation.png! 
>  !LTRwithDVOptimisation.png!
>  # SolrFeature was not implemented optimally for the case when no fq parameter is passed. I'm not entirely sure what the intention was behind supporting both q (which is supposed to be a function query) and fq in the same SolrFeature (is there a case where they are used together?), so I decided not to change the behavior but only to optimize the case described. !LTRSolrFeatureBefore.png! !LTRSolrFeatureAfter.png!
>  # LTRScoringModel was a mutable object, which led to its hashCode being recalculated on each query. This can consume a lot of time when the model is big (in our case, we were using LambdaMART with 100 trees and leaves, which took 3MB of disk space). So I decided to make LTRScoringModel immutable and cache the hashCode calculation. Below are the screenshots before and after. !LTRModelHashCodeBefore.png! !LTRModelHashCodeAfter.png!
> In our case, we had a feature.json file with 8 FieldValueFeatures, 5 SolrFeatures and 1 OriginalScoreFeature.
> Before these optimizations, the overhead of LTR reranking of the top 48 documents was 300ms. With all of the optimizations applied, it dropped to 35ms.
> Please also note that the JFR screenshots were captured on the Solr 6.6 codebase, and all the numbers were also measured on Solr 6.6.
> I hope that the changes to the DocValues interface (the get() method was removed and advanceExact() was added) won't affect the FieldValueFeature change (at least for DenseNumericDocValues it will work as expected).
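For item 1 and the closing note about the DocValues interface: since Lucene 7, NumericDocValues is read through an iterator (advanceExact(int) followed by longValue()) rather than the older random-access get(int docID). The sketch below is a minimal, self-contained illustration of that access pattern for a feature value, assuming documents are visited in increasing doc-id order. SimpleNumericDocValues, ArrayColumn and featureValue are hypothetical stand-ins written for this example, not Lucene or Solr classes.

```java
/**
 * HYPOTHETICAL sketch: reading a numeric feature from a columnar,
 * DocValues-style store instead of loading the stored document.
 * SimpleNumericDocValues only mirrors the advanceExact()/longValue()
 * pattern; it is not Lucene's NumericDocValues.
 */
public class DocValuesFeatureSketch {

    /** Minimal forward-only per-document value iterator (assumed shape). */
    interface SimpleNumericDocValues {
        /** Moves to doc; returns true if doc has a value. */
        boolean advanceExact(int doc);

        /** Value of the current doc; valid only after advanceExact returned true. */
        long longValue();
    }

    /** Array-backed column; present[d] says whether doc d has a value. */
    static final class ArrayColumn implements SimpleNumericDocValues {
        private final long[] values;
        private final boolean[] present;
        private int current = -1;

        ArrayColumn(long[] values, boolean[] present) {
            if (values.length != present.length) {
                throw new IllegalArgumentException("length mismatch");
            }
            this.values = values.clone();
            this.present = present.clone();
        }

        @Override
        public boolean advanceExact(int doc) {
            // Iterator-style access: callers must not go backwards.
            if (doc < current) {
                throw new IllegalArgumentException("docs must be visited in order");
            }
            current = doc;
            return doc < values.length && present[doc];
        }

        @Override
        public long longValue() {
            return values[current];
        }
    }

    /** Feature value for doc, or defaultValue if the doc has no value. */
    static float featureValue(SimpleNumericDocValues dv, int doc, float defaultValue) {
        // One positioned read in a column; no stored document is loaded.
        return dv.advanceExact(doc) ? (float) dv.longValue() : defaultValue;
    }

    public static void main(String[] args) {
        SimpleNumericDocValues price = new ArrayColumn(
            new long[] {10, 0, 7},
            new boolean[] {true, false, true});
        for (int doc = 0; doc < 3; doc++) {
            System.out.println(doc + " -> " + featureValue(price, doc, -1f));
        }
    }
}
```

The default value for documents without a value is passed in explicitly here; how the actual patch handles missing values is not described in the ticket.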
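For item 2, the ticket does not say how the no-fq case is optimized. The following is only a hypothetical sketch of the general idea of a fast path: when no filter queries are given, return the main query's matches directly and skip building any filter structure. Queries are modeled as sorted int[] doc-id lists, and NoFilterFastPathSketch and matchingDocs are names invented for this example, not SolrFeature's API.

```java
import java.util.BitSet;
import java.util.List;

/**
 * HYPOTHETICAL sketch of a "no filters" fast path for a feature that
 * uses a main query q and, optionally, filter queries fq. Each query's
 * matches are modeled as a sorted int[] of doc ids.
 */
public class NoFilterFastPathSketch {

    /** Returns the docs that the feature should score. */
    static int[] matchingDocs(int[] qDocs, List<int[]> fqDocs, int maxDoc) {
        if (fqDocs.isEmpty()) {
            // Fast path: q's own matches are the answer; nothing to build.
            return qDocs;
        }
        // General path: materialize q and every fq as index-wide bit sets
        // (O(maxDoc) bits each) and intersect them.
        BitSet acc = toBitSet(qDocs, maxDoc);
        for (int[] fq : fqDocs) {
            acc.and(toBitSet(fq, maxDoc));
        }
        return acc.stream().toArray();
    }

    private static BitSet toBitSet(int[] docs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int d : docs) {
            bits.set(d);
        }
        return bits;
    }
}
```

The point of the design is that the common case (q only) pays nothing for a feature it does not use.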
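For item 3, the general technique is standard Java: once an object cannot change after construction, its hashCode can be computed once and stored in a final field, so later calls are O(1) instead of walking the whole model. The sketch below assumes a hypothetical ImmutableModel class (a name, a feature list and a weight array standing in for a large tree ensemble); it is not Solr's LTRScoringModel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/**
 * HYPOTHETICAL sketch of an immutable model with a cached hashCode.
 * Not Solr's LTRScoringModel; the fields are illustrative only.
 */
public final class ImmutableModel {
    private final String name;
    private final List<String> features;
    private final float[] weights; // stands in for a large model body
    private final int hash;        // safe to cache: state never changes

    public ImmutableModel(String name, List<String> features, float[] weights) {
        this.name = Objects.requireNonNull(name, "name");
        // Defensive copies so callers cannot mutate our state afterwards.
        this.features = Collections.unmodifiableList(new ArrayList<>(features));
        this.weights = weights.clone();
        this.hash = computeHash();
    }

    private int computeHash() {
        // Expensive for a big model: touches every parameter, but runs once.
        int h = name.hashCode();
        h = 31 * h + features.hashCode();
        h = 31 * h + Arrays.hashCode(weights);
        return h;
    }

    public String name() {
        return name;
    }

    public List<String> features() {
        return features;
    }

    public float weight(int i) {
        return weights[i];
    }

    @Override
    public int hashCode() {
        return hash; // O(1) on every query
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ImmutableModel)) {
            return false;
        }
        ImmutableModel other = (ImmutableModel) o;
        // Cheap cached-hash check first, full comparison only on a match.
        return hash == other.hash
            && name.equals(other.name)
            && features.equals(other.features)
            && Arrays.equals(weights, other.weights);
    }
}
```

The defensive copies matter: caching the hash is only correct if no caller can modify the arrays or lists the hash was computed from.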



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


