lucene-solr-dev mailing list archives

From Marc Sturlese <marc.sturl...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Date Mon, 07 Dec 2009 15:39:53 GMT

The request I am sending is:
http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
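
To make the request easier to reproduce or tweak, it can be assembled from its individual parameters. This is only a minimal Python sketch: the parameter values are copied from the URL above, while the helper name build_collapse_url is made up for illustration and is not part of Solr or the patch.

```python
from urllib.parse import urlencode

# Parameters copied from the request URL above, in the same order.
PARAMS = {
    "q": "aaa",
    "version": "2.2",
    "start": "0",
    "rows": "20",
    "indent": "on",
    "collapse.field": "col",
    "collapse.includeCollapsedDocs.fl": "*",
    "collapse.type": "adjacent",
    "collapse.info.doc": "true",
    "collapse.info.count": "true",
}


def build_collapse_url(base="http://localhost:8983/solr/select/", params=PARAMS):
    """Hypothetical helper: join the base URL and the encoded query string."""
    # safe="*" keeps the field-list wildcard unescaped, as in the original URL.
    return base + "?" + urlencode(params, safe="*")


if __name__ == "__main__":
    print(build_collapse_url())
```

Changing a single entry (for example collapse.type) then gives a comparable request without editing the URL by hand.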

I search for 'aaa' in the content field. All the documents in the result
contain that string in that field.
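
For context, here is a minimal Python sketch of how a client could read the collapse groups if each group arrives in its own named <lst>, as in Martijn's example quoted below. It uses only the standard library. The helper name collapsed_by_head is made up for illustration, and the layout is assumed from that quoted example rather than taken from the patch source.

```python
import xml.etree.ElementTree as ET

# Sample copied from the response Martijn describes (one collapse group).
SAMPLE = """
<lst name="collapse_counts">
  <str name="field">cat</str>
  <lst name="results">
    <lst name="009">
      <str name="fieldValue">hard</str>
      <int name="collapseCount">1</int>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">008</long>
          <str name="content">aaa aaa</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>
"""


def collapsed_by_head(xml_text):
    """Hypothetical helper: map each collapse head id to its collapsed doc ids."""
    root = ET.fromstring(xml_text.strip())
    results = root.find("lst[@name='results']")
    groups = {}
    if results is None:
        return groups
    for group in results.findall("lst"):
        # The name attribute is assumed to carry the collapse head id;
        # it is None when the <lst> has no name attribute.
        head_id = group.get("name")
        id_path = "result[@name='collapsedDocs']/doc/*[@name='id']"
        groups[head_id] = [el.text for el in group.findall(id_path)]
    return groups


if __name__ == "__main__":
    print(collapsed_by_head(SAMPLE))
```

A group without a name attribute ends up under the key None, so a response with a single unnamed <lst> would put all collapsed documents into one entry.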

Martijn v Groningen wrote:
> 
> Yes, it should look similar to that. What is the exact request you send to
> Solr?
> Also, to check whether the patch works correctly, can you run: ant clean test
> There are a number of tests that cover the field collapse functionality.
> 
> Martijn
> 
> 
> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>
>>><lst name="collapse_counts">
>>>   <str name="field">cat</str>
>>>    <lst name="results">
>>>        <lst name="009">
>>>            <str name="fieldValue">hard</str>
>>>           <int name="collapseCount">1</int>
>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>                 <doc>
>>>                    <long name="id">008</long>
>>>                    <str name="content">aaa aaa</str>
>>>                    <str name="col">ccc</str>
>>>                 </doc>
>>>            </result>
>>>        </lst>
>>>        ...
>>>    </lst>
>>></lst>
>> I see, it looks like I am applying the patch incorrectly somehow.
>> This is the complete collapse_counts response I am getting:
>> <lst name="collapse_counts">
>>  <str name="field">col</str>
>>  <lst name="results">
>>    <lst>
>>      <int name="collapseCount">1</int>
>>      <int name="collapseCount">1</int>
>>      <int name="collapseCount">1</int>
>>      <str name="fieldValue">bbb</str>
>>      <str name="fieldValue">ccc</str>
>>      <str name="fieldValue">xxx</str>
>>      <result name="collapsedDocs" numFound="1" start="0">
>>        <doc>
>>          <long name="id">2</long>
>>          <str name="content">aaa aaa</str>
>>          <str name="col">bbb</str>
>>        </doc>
>>      </result>
>>      <result name="collapsedDocs" numFound="1" start="0">
>>        <doc>
>>          <long name="id">8</long>
>>          <str name="content">aaa aaa aaa sd</str>
>>          <str name="col">ccc</str>
>>       </doc>
>>      </result>
>>      <result name="collapsedDocs" numFound="4" start="0">
>>        <doc>
>>          <long name="id">12</long>
>>          <str name="content">aaa aaa aaa v</str>
>>          <str name="col">xxx</str>
>>        </doc>
>>      </result>
>>    </lst>
>>  </lst>
>> </lst>
>>
>> As you can see, I am getting a <lst> tag with no name. As I understood
>> what you told me, I should be getting as many lst tags as collapsed groups,
>> and the name attribute of each lst should be the unique field value. So, if
>> the patch was applied correctly, the response should look like:
>>
>> <lst name="collapse_counts">
>>  <str name="field">col</str>
>>  <lst name="results">
>>    <lst name="354"> (the head value of the collapsed group)
>>      <int name="collapseCount">1</int>
>>      <str name="fieldValue">bbb</str>
>>      <result name="collapsedDocs" numFound="1" start="0">
>>        <doc>
>>          <long name="id">2</long>
>>          <str name="content">aaa aaa</str>
>>          <str name="col">bbb</str>
>>        </doc>
>>      </result>
>>    </lst>
>>    <lst name="654">
>>      <int name="collapseCount">1</int>
>>      <str name="fieldValue">ccc</str>
>>      <result name="collapsedDocs" numFound="1" start="0">
>>        <doc>
>>          <long name="id">8</long>
>>          <str name="content">aaa aaa aaa sd</str>
>>          <str name="col">ccc</str>
>>       </doc>
>>      </result>
>>    </lst>
>>    <lst name="654">
>>      <int name="collapseCount">1</int>
>>      <str name="fieldValue">xxx</str>
>>      <result name="collapsedDocs" numFound="4" start="0">
>>        <doc>
>>          <long name="id">12</long>
>>          <str name="content">aaa aaa aaa v</str>
>>          <str name="col">xxx</str>
>>        </doc>
>>      </result>
>>    </lst>
>>  </lst>
>> </lst>
>>
>> Is this what the response looks like when you use the patch?
>> Thanks in advance
>>
>>
>> Martijn v Groningen wrote:
>>>
>>> Hi Marc,
>>>
>>> I'm not sure if I follow you completely, but the example you gave is
>>> not complete. I'm missing a few tags in your example. Let's assume the
>>> following response, which the latest patches produce.
>>>
>>> <lst name="collapse_counts">
>>>     <str name="field">cat</str>
>>>     <lst name="results">
>>>         <lst name="009">
>>>             <str name="fieldValue">hard</str>
>>>             <int name="collapseCount">1</int>
>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>                  <doc>
>>>                     <long name="id">008</long>
>>>                     <str name="content">aaa aaa</str>
>>>                     <str name="col">ccc</str>
>>>                  </doc>
>>>             </result>
>>>         </lst>
>>>         ...
>>>     </lst>
>>> </lst>
>>>
>>> The result list contains collapse groups. The names of the child
>>> elements are the collapse head ids. Everything that falls under a
>>> collapse head belongs to that collapse group, so adding the document
>>> head id to the field value is unnecessary. In the above example, the
>>> document with id 009 is the document head of the document with id 008.
>>> The document with id 009 should be displayed in the search result.
>>>
>>> From what you have said, it seems that you properly configured the
>>> patch.
>>>
>>> Martijn
>>>
>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>
>>>> Hey there, I have been testing the last patch and I think either I am
>>>> missing something or the way the collapsed documents are shown with
>>>> adjacent collapse can sometimes be confusing.
>>>> I am using the patch replacing queryComponent with collapseComponent
>>>> (not using both at the same time):
>>>>  <searchComponent name="query"
>>>> class="org.apache.solr.handler.component.CollapseComponent">
>>>> What I have noticed is this. Imagine you get these results in the search:
>>>> doc1:
>>>>   id:001
>>>>   collapseField:ccc
>>>> doc2:
>>>>   id:002
>>>>   collapseField:aaa
>>>> doc3:
>>>>   id:003
>>>>   collapseField:ccc
>>>> doc4:
>>>>   id:004
>>>>   collapseField:bbb
>>>>
>>>> And in the collapse_counts you get:
>>>> <int name="collapseCount">1</int>
>>>> <str name="fieldValue">ccc</str>
>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>> <doc>
>>>> <long name="id">008</long>
>>>> <str name="content">aaa aaa</str>
>>>> <str name="col">ccc</str>
>>>> </doc>
>>>> </result>
>>>>
>>>> Now, how can I know the head document of doc 008? Both 001 and 003
>>>> could be. Wouldn't it make sense to connect the uniqueField with the
>>>> collapsed documents in some way?
>>>>
>>>> Adding something to collapse_counts like:
>>>> <int name="collapseCount">1</int>
>>>> <str name="fieldValue">ccc</str>
>>>> <str name="uniqueFieldId">003</str>
>>>>
>>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>>> return:
>>>> <str name="fieldValue">ccc#003</str>
>>>> but this response looks dirty...
>>>>
>>>> As I said, maybe I am misunderstanding something and this can already be
>>>> known in some way. In that case, can someone tell me how?
>>>> Thanks in advance
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> JIRA jira@apache.org wrote:
>>>>>
>>>>>
>>>>>     [
>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>> ]
>>>>>
>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>>> ----------------------------------------------------------------------
>>>>>
>>>>> I have attached a new patch that has the following changes:
>>>>> # Added caching for the field collapse functionality. Check the [solr
>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>>> field-collapsing with caching.
>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>> instead). It was deprecated for a long time.
>>>>>
>>>>>       was (Author: martijn):
>>>>>     I have attached a new patch that has the following changes:
>>>>> # Added caching for the field collapse functionality. Check the [solr
>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>>> the
>>>>> field-collapsing with caching.
>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>> instead). It was deprecated for a long time.
>>>>>
>>>>>> Field collapsing
>>>>>> ----------------
>>>>>>
>>>>>>                 Key: SOLR-236
>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>             Project: Solr
>>>>>>          Issue Type: New Feature
>>>>>>          Components: search
>>>>>>    Affects Versions: 1.3
>>>>>>            Reporter: Emmanuel Keller
>>>>>>             Fix For: 1.5
>>>>>>
>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>> field-collapse-5.patch,
>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>> field-collapse-5.patch,
>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>> field-collapse-5.patch,
>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>> field-collapse-5.patch,
>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>>>>>
>>>>>>
>>>>>> This patch includes a new feature called "Field collapsing".
>>>>>> "Used in order to collapse a group of results with similar value for
>>>>>> a given field to a single entry in the result set. Site collapsing is
>>>>>> a special case of this, where all results for a given web site is
>>>>>> collapsed into one or two entries in the result set, typically with
>>>>>> an associated "more documents from this site" link. See also
>>>>>> Duplicate detection."
>>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>>> "collapse.field" to choose the field used to group results
>>>>>> "collapse.type" normal (default value) or adjacent
>>>>>> "collapse.max" to select how many continuous results are allowed
>>>>>> before collapsing
>>>>>> TODO (in progress):
>>>>>> - More documentation (on source code)
>>>>>> - Test cases
>>>>>> Two patches:
>>>>>> - "field_collapsing.patch" for current development version
>>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>>> P.S.: Feedback and misspelling correction are welcome ;-)
>>>>>
>>>>> --
>>>>> This message is automatically generated by JIRA.
>>>>> -
>>>>> You can reply to this email to add a comment to the issue online.
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Met vriendelijke groet,
>>>
>>> Martijn van Groningen
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Martijn van Groningen
> 
> 

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679037.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

