lucene-solr-dev mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Date Mon, 07 Dec 2009 15:25:59 GMT
Yes, it should look similar to that. What is the exact request you send to Solr?
Also, to check whether the patch works correctly, can you run: ant clean test
There are a number of tests that cover the field collapse functionality.
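
As an aside, here is a minimal sketch (not part of the patch) of how a client could pair each collapse head id with its collapsed documents, assuming the response has the per-group structure quoted further down in this thread. The helper name collapsed_docs_by_head and the sample XML are illustrative assumptions, and the function expects only the collapse_counts fragment rather than a full Solr response. It raises an error on an unnamed <lst>, which is the symptom described in the quoted message.

```python
import xml.etree.ElementTree as ET

# Sample fragment modelled on the structure discussed in this thread
# (illustrative data, not output captured from a real Solr instance).
SAMPLE = """
<lst name="collapse_counts">
  <str name="field">cat</str>
  <lst name="results">
    <lst name="009">
      <str name="fieldValue">hard</str>
      <int name="collapseCount">1</int>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">008</long>
          <str name="content">aaa aaa</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>
"""


def collapsed_docs_by_head(xml_text):
    """Map each collapse head id to its field value, count and collapsed doc ids.

    Hypothetical helper: raises ValueError when a group has no name attribute,
    since the head id cannot be recovered in that case.
    """
    root = ET.fromstring(xml_text)
    results = root.find("lst[@name='results']")
    groups = {}
    if results is None:
        return groups
    for group in results.findall("lst"):
        head_id = group.get("name")
        if head_id is None:
            raise ValueError("collapse group without a head id (unnamed <lst>)")
        doc_ids = []
        for doc in group.findall("result[@name='collapsedDocs']/doc"):
            # The id field may be a <long> or a <str>, so match on name only.
            for field in doc:
                if field.get("name") == "id":
                    doc_ids.append(field.text)
        groups[head_id] = {
            "fieldValue": group.findtext("str[@name='fieldValue']"),
            "collapseCount": int(group.findtext("int[@name='collapseCount']", default="0")),
            "docIds": doc_ids,
        }
    return groups


if __name__ == "__main__":
    print(collapsed_docs_by_head(SAMPLE))
```

With the sample above, document 009 is the head and 008 is its single collapsed document, so the head id comes from the name attribute and does not need to be appended to fieldValue.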

Martijn


2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>
>> <lst name="collapse_counts">
>>     <str name="field">cat</str>
>>     <lst name="results">
>>         <lst name="009">
>>             <str name="fieldValue">hard</str>
>>             <int name="collapseCount">1</int>
>>             <result name="collapsedDocs" numFound="1" start="0">
>>                  <doc>
>>                     <long name="id">008</long>
>>                     <str name="content">aaa aaa</str>
>>                     <str name="col">ccc</str>
>>                  </doc>
>>             </result>
>>         </lst>
>>         ...
>>     </lst>
>> </lst>
> I see, it looks like I am applying the patch wrongly somehow.
> This is the complete collapse_counts response I am getting:
> <lst name="collapse_counts">
>  <str name="field">col</str>
>  <lst name="results">
>    <lst>
>      <int name="collapseCount">1</int>
>      <int name="collapseCount">1</int>
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">bbb</str>
>      <str name="fieldValue">ccc</str>
>      <str name="fieldValue">xxx</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">2</long>
>          <str name="content">aaa aaa</str>
>          <str name="col">bbb</str>
>        </doc>
>      </result>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">8</long>
>          <str name="content">aaa aaa aaa sd</str>
>          <str name="col">ccc</str>
>        </doc>
>      </result>
>      <result name="collapsedDocs" numFound="4" start="0">
>        <doc>
>          <long name="id">12</long>
>          <str name="content">aaa aaa aaa v</str>
>          <str name="col">xxx</str>
>        </doc>
>      </result>
>    </lst>
>  </lst>
> </lst>
>
> As you can see, I am getting a <lst> tag with no name. As I understood what
> you told me, I should be getting as many lst tags as collapsed groups, and
> the name attribute of each lst should be the unique field value. So, if the
> patch was applied correctly, the response should look like:
>
> <lst name="collapse_counts">
>  <str name="field">col</str>
>  <lst name="results">
>    <lst name="354"> (the head value of the collapsed group)
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">bbb</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">2</long>
>          <str name="content">aaa aaa</str>
>          <str name="col">bbb</str>
>        </doc>
>      </result>
>    </lst>
>    <lst name="654">
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">ccc</str>
>      <result name="collapsedDocs" numFound="1" start="0">
>        <doc>
>          <long name="id">8</long>
>          <str name="content">aaa aaa aaa sd</str>
>          <str name="col">ccc</str>
>        </doc>
>      </result>
>    </lst>
>    <lst name="654">
>      <int name="collapseCount">1</int>
>      <str name="fieldValue">xxx</str>
>      <result name="collapsedDocs" numFound="4" start="0">
>        <doc>
>          <long name="id">12</long>
>          <str name="content">aaa aaa aaa v</str>
>          <str name="col">xxx</str>
>        </doc>
>      </result>
>    </lst>
>  </lst>
> </lst>
>
> Is this what the response looks like when you use the patch?
> Thanks in advance
>
>
> Martijn v Groningen wrote:
>>
>> Hi Marc,
>>
>> I'm not sure if I follow you completely, but the example you gave is
>> not complete. I'm missing a few tags in your example. Let's assume the
>> following response, which the latest patches produce.
>>
>> <lst name="collapse_counts">
>>     <str name="field">cat</str>
>>     <lst name="results">
>>         <lst name="009">
>>             <str name="fieldValue">hard</str>
>>             <int name="collapseCount">1</int>
>>             <result name="collapsedDocs" numFound="1" start="0">
>>                  <doc>
>>                     <long name="id">008</long>
>>                     <str name="content">aaa aaa</str>
>>                     <str name="col">ccc</str>
>>                  </doc>
>>             </result>
>>         </lst>
>>         ...
>>     </lst>
>> </lst>
>>
>> The result list contains collapse groups. The names of the child
>> elements are the collapse head ids. Everything that falls under a
>> collapse head belongs to that collapse group, and thus adding the document
>> head id to the field value is unnecessary. In the above example, the
>> document with id 009 is the document head of the document with id 008.
>> The document with id 009 should be displayed in the search result.
>>
>> From what you have said, it seems that you properly configured the patch.
>>
>> Martijn
>>
>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>
>>> Hey there, I have been testing the latest patch and I think either I am
>>> missing something or the way collapsed documents are shown with adjacent
>>> collapsing can sometimes be confusing.
>>> I am using the patch with queryComponent replaced by collapseComponent (not
>>> using both at the same time):
>>>  <searchComponent name="query"
>>> class="org.apache.solr.handler.component.CollapseComponent">
>>> What I have noticed is this. Imagine you get these results in the search:
>>> doc1:
>>>   id:001
>>>   collapseField:ccc
>>> doc2:
>>>   id:002
>>>   collapseField:aaa
>>> doc3:
>>>   id:003
>>>   collapseField:ccc
>>> doc4:
>>>   id:004
>>>   collapseField:bbb
>>>
>>> And in the collapse_counts you get:
>>> <int name="collapseCount">1</int>
>>> <str name="fieldValue">ccc</str>
>>> <result name="collapsedDocs" numFound="1" start="0">
>>> <doc>
>>> <long name="id">008</long>
>>> <str name="content">aaa aaa</str>
>>> <str name="col">ccc</str>
>>> </doc>
>>> </result>
>>>
>>> Now, how can I know the head document of doc 008? Both 001 and 003 could
>>> be... wouldn't it make sense to connect the uniqueField with the
>>> collapsed documents in some way?
>>>
>>> Adding something to collapse_counts like:
>>> <int name="collapseCount">1</int>
>>> <str name="fieldValue">ccc</str>
>>> <str name="uniqueFieldId">003</str>
>>>
>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>> return:
>>> <str name="fieldValue">ccc#003</str>
>>> but this response looks dirty...
>>>
>>> As I said, maybe I am misunderstanding something and this can be known in
>>> some way. In that case, can someone tell me how?
>>> Thanks in advance
>>>
>>>
>>>
>>>
>>>
>>>
>>> JIRA jira@apache.org wrote:
>>>>
>>>>
>>>>     [
>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>> ]
>>>>
>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>> ----------------------------------------------------------------------
>>>>
>>>> I have attached a new patch that has the following changes:
>>>> # Added caching for the field collapse functionality. Check the [solr
>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>> field-collapsing with caching.
>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>> instead). It was deprecated for a long time.
>>>>
>>>>       was (Author: martijn):
>>>>     I have attached a new patch that has the following changes:
>>>> # Added caching for the field collapse functionality. Check the [solr
>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>> the
>>>> field-collapsing with caching.
>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>> instead). It was deprecated for a long time.
>>>>
>>>>> Field collapsing
>>>>> ----------------
>>>>>
>>>>>                 Key: SOLR-236
>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>             Project: Solr
>>>>>          Issue Type: New Feature
>>>>>          Components: search
>>>>>    Affects Versions: 1.3
>>>>>            Reporter: Emmanuel Keller
>>>>>             Fix For: 1.5
>>>>>
>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>>>>
>>>>>
>>>>> This patch includes a new feature called "Field collapsing".
>>>>> "Used in order to collapse a group of results with similar value for a
>>>>> given field to a single entry in the result set. Site collapsing is a
>>>>> special case of this, where all results for a given web site is
>>>>> collapsed into one or two entries in the result set, typically with an
>>>>> associated "more documents from this site" link. See also Duplicate
>>>>> detection."
>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>> "collapse.field" to choose the field used to group results
>>>>> "collapse.type" normal (default value) or adjacent
>>>>> "collapse.max" to select how many continuous results are allowed before
>>>>> collapsing
>>>>> TODO (in progress):
>>>>> - More documentation (on source code)
>>>>> - Test cases
>>>>> Two patches:
>>>>> - "field_collapsing.patch" for current development version
>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>> P.S.: Feedback and misspelling correction are welcome ;-)
>>>>
>>>> --
>>>> This message is automatically generated by JIRA.
>>>> -
>>>> You can reply to this email to add a comment to the issue online.
>>>>
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>>
>
> --
> View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen
