lucene-dev mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: [jira] Commented: (SOLR-1311) pseudo-field-collapsing
Date Sat, 16 Oct 2010 21:31:56 GMT
The "Field Collapsing" patch is dead. "Search Grouping" is a different
suite of techniques that the committers are willing to commit. Note
that the Field Collapsing issue has been open for 3+ years and nothing
was ever committed: the Solr committers who care all hate it.

8G is not a big index. 450G is a big index. 1.5 billion docs is a big
index. The greybeards won't touch a structural change that doesn't
work for the wide range of use cases. The Field Collapsing patches
never scaled.

On Fri, Oct 15, 2010 at 5:42 AM, Marc Sturlese (JIRA) <jira@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921328#action_12921328 ]
>
> Marc Sturlese commented on SOLR-1311:
> -------------------------------------
>
> Well, I said it cannot be integrated as a plugin because it hacks DocListAndSetNC and
> DocListNC. These two functions can only be changed by modifying the SolrIndexSearcher.java class.
> The pseudo-field-collapse sort is not included in the current field collapsing, but the current
> field collapsing seems to perform much better than it used to (I don't think as well as this
> patch, but the current feature is much more complete than my patch).
> I suppose I can close it.
>
>> pseudo-field-collapsing
>> -----------------------
>>
>>                 Key: SOLR-1311
>>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.4
>>            Reporter: Marc Sturlese
>>             Fix For: Next
>>
>>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>
>>
>> I am trying to develop a new way of doing field collapsing based on the adjacent
>> field collapsing algorithm. I started developing it because I am experiencing performance
>> problems with the field collapsing patch on a big index (8G).
>> The algorithm does adjacent pseudo-field collapsing. It collapses only within the first
>> X documents. Instead of making the collapsed docs disappear, the algorithm sends them to
>> a given position in the relevance results list.
>> The reason I only collapse within the first X documents is that if I have, for example,
>> 600000 results and I am showing 10 results per page, I really don't need to do collapsing
>> on page 30000, or even on page 3000. Doing this, I am seeing dramatically better performance.
>> The problem is that I couldn't find a way to plug the algorithm in as a component and keep
>> good performance. I had to hack a few classes in SolrIndexSearcher.java.
>> This patch is just experimental and for testing purposes. If someone finds it
>> interesting, it would be good to find a better way to integrate it than the current one.
>> Advice is more than welcome.
>>
>> Functionality:
>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>>      <str name="plus.considerMoreDocs">true</str>
>>      <str name="plus.considerHowMany">3000</str>
>>      <str name="plus.considerField">name</str>
>> (at the moment there is no threshold or any of the other parameters that exist in the current
>> collapse-field patch)
>> plus.considerMoreDocs enables pseudo-collapsing
>> plus.considerHowMany sets the number of result documents to which we want to apply
>> the algorithm
>> plus.considerField is the field to pseudo-collapse on
>> If the number of results is lower than plus.considerHowMany, the algorithm will be
>> applied to all the results.
>> Let's say there is a query with 600000 results and we've set considerHowMany to 3000
>> (and we already have the docs sorted by relevance).
>> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be collapsed, it will
>> be sent to position 2999 of the relevance results array. If the 3rd has to be collapsed too,
>> it will go to position 2998, and so on.
>> The algorithm is not applied when a sortspec is set or plus.considerMoreDocs is set
>> to false. Nor is it applied when using MoreLikeThisRequestHandler.
>> Example with a query of 9 results:
>> Results sorted by relevance without pseudo-collapse-algorithm:
>> doc1 - collapse_field_value 3
>> doc2 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 5
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc2 - collapse_field_value 3*
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 9
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> doc6 - collapse_field_value 6*
>> doc2 - collapse_field_value 3*
>> *pseudo-collapsed documents
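
The reordering described in the issue can be sketched roughly as follows. This is not code from the SOLR-1311 patch (which modifies SolrIndexSearcher.java); it is a minimal Python illustration, and the function name `pseudo_collapse`, the `(doc id, field value)` tuples and the `consider_how_many` argument are hypothetical stand-ins. It also assumes that "adjacent" means a document is collapsed when its field value equals that of the document immediately before it in the relevance-sorted list.

```python
from typing import List, Sequence, Tuple

Doc = Tuple[str, int]  # (doc id, collapse field value); hypothetical representation


def pseudo_collapse(docs: Sequence[Doc], consider_how_many: int) -> List[Doc]:
    """Reorder the first `consider_how_many` docs, pushing adjacent duplicates to the window's tail."""
    # Slicing also covers the case where there are fewer results than the window size.
    window = list(docs[:consider_how_many])
    rest = list(docs[consider_how_many:])

    kept: List[Doc] = []
    collapsed: List[Doc] = []
    for i, doc in enumerate(window):
        # Assumption: a doc is collapsed when it repeats the previous doc's field value.
        if i > 0 and doc[1] == window[i - 1][1]:
            collapsed.append(doc)
        else:
            kept.append(doc)

    # The first collapsed doc takes the last window slot, the next one the slot before it.
    return kept + collapsed[::-1] + rest


# The nine-result example from the issue description.
DOCS: List[Doc] = [
    ("doc1", 3), ("doc2", 3), ("doc3", 4), ("doc4", 7), ("doc5", 6),
    ("doc6", 6), ("doc7", 5), ("doc8", 1), ("doc9", 2),
]


def ids(result: Sequence[Doc]) -> List[str]:
    """Return just the doc ids, in order."""
    return [doc_id for doc_id, _ in result]


if __name__ == "__main__":
    print(ids(pseudo_collapse(DOCS, 5)))
    print(ids(pseudo_collapse(DOCS, 9)))
```

With a window of 5 only doc2 moves (to slot 5); with a window of 9 both doc2 and doc6 move to the end, which matches the two listings in the issue description. Documents beyond the window are never inspected, which is where the claimed performance gain would come from.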
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>



-- 
Lance Norskog
goksron@gmail.com
