lucene-dev mailing list archives

From "Matthew Willson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-2482) Index sorter
Date Fri, 09 Nov 2012 18:32:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494187#comment-13494187 ]

Matthew Willson commented on LUCENE-2482:
-----------------------------------------

Hi all -- a few quick questions, if anyone is still watching this.

* Could this be used to achieve an impact ordered index, as in e.g. [1], where documents in
a given term's postings list are ordered by score contribution or term frequency?

* Any caveats or things one should be aware of when it comes to index sorting in combination
with different index merge strategies, and some of the more advanced stuff in Solr for managing
distributed indexes?

* Is anyone aware of any other work along the lines of early stopping / dynamic pruning optimisations
in Lucene? E.g. MaxScore from [1] (I think Xapian [2] calls it 'operator decay') or the accumulator
pruning based algorithms from [1] (perhaps in combination with impact ordering)? In particular,
is there anything in Lucene 4's approach to scoring and indexing which would make these hard
in principle?

Any pointers gratefully received.

[1] Buettcher, Clarke & Cormack, "Information Retrieval: Implementing and Evaluating Search
Engines", ch. 5, pp. 143-153
[2] http://xapian.org/docs/matcherdesign.html
                
> Index sorter
> ------------
>
>                 Key: LUCENE-2482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2482
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/other
>    Affects Versions: 3.1, 4.0-ALPHA
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>             Fix For: 3.6
>
>         Attachments: indexSorter.patch, LUCENE-2482-4.0.patch
>
>
> A tool to sort index according to a float document weight. Documents with high weight
> are given low document numbers, which means that they will be first evaluated. When using
> a strategy of "early termination" of queries (see TimeLimitedCollector) such sorting significantly
> improves the quality of partial results.
> (Originally this tool was created by Doug Cutting in Nutch, and used norms as document
> weights - thus the ordering was limited by the limited resolution of norms. This is a pure
> Lucene version of the tool, and it uses arbitrary floats from a specified stored field).
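
The idea in the issue description can be illustrated with a minimal sketch in plain Python. This is not Lucene code and not the attached patch: the function names, the toy postings dictionary and the `budget` parameter (a stand-in for a time limit) are all assumptions made for illustration. It renumbers documents so that higher weights get lower document numbers, then shows that a scan cut off early sees the heavier documents first.

```python
from typing import Dict, List


def sort_by_weight(weights: List[float]) -> List[int]:
    """Map old doc ids to new ones so the highest weight gets doc id 0.

    weights[d] is the (hypothetical) float weight of old document d.
    """
    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(weights)), key=lambda d: -weights[d])
    old_to_new = [0] * len(weights)
    for new_id, old_id in enumerate(order):
        old_to_new[old_id] = new_id
    return old_to_new


def remap_postings(postings: Dict[str, List[int]],
                   old_to_new: List[int]) -> Dict[str, List[int]]:
    """Rewrite every postings list with new doc ids, kept in increasing order."""
    return {term: sorted(old_to_new[d] for d in docs)
            for term, docs in postings.items()}


def search_with_budget(postings: Dict[str, List[int]],
                       term: str, budget: int) -> List[int]:
    """Visit matches in doc-id order and stop after `budget` documents.

    The budget is a simple stand-in for a time-limited collector.
    """
    return postings.get(term, [])[:budget]


# Toy index: four documents, two terms (illustrative data only).
weights = [0.1, 0.9, 0.5, 0.7]
postings = {"lucene": [0, 1, 2, 3], "sort": [0, 2]}

old_to_new = sort_by_weight(weights)
sorted_postings = remap_postings(postings, old_to_new)

# With a budget of one document, the unsorted index returns old doc 0
# (weight 0.1); the sorted index returns new doc 2, which is old doc 2
# (weight 0.5), the heavier of the two matches.
unsorted_hit = search_with_budget(postings, "sort", 1)
sorted_hit = search_with_budget(sorted_postings, "sort", 1)
```

Note that this orders documents by one global, query-independent weight, so every postings list ends up in the same relative order. That is different from the per-term impact ordering asked about in the comment above, where each term's list would have its own order.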


