lucene-solr-dev mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Date Mon, 07 Dec 2009 16:00:52 GMT
The last two parameters are not necessary, since both default to
true. Could you run the field collapse tests successfully?

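For illustration, a minimal sketch of the same request with the two
collapse.info.* parameters left out. The helper name build_collapse_url is
hypothetical; the base URL and parameter names are copied from the request
quoted below, so adjust them for your own setup.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL taken from the quoted request; host and port are setup-specific.
BASE_URL = "http://localhost:8983/solr/select/"


def build_collapse_url(query, collapse_field, rows=20):
    """Hypothetical helper: build a field-collapse request URL.

    collapse.info.doc and collapse.info.count are deliberately omitted,
    since both are said to default to true.
    """
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", "0"),
        ("rows", str(rows)),
        ("indent", "on"),
        ("collapse.field", collapse_field),
        ("collapse.includeCollapsedDocs.fl", "*"),
        ("collapse.type", "adjacent"),
    ]
    # urlencode percent-encodes values such as "*" where needed.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_collapse_url("aaa", "col"))
```

If the defaults behave as described, the response should be the same as with
both parameters set to true explicitly.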
2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>
> The request I am sending is:
> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>
> I search for 'aaa' in the content field. All the documents in the result
> contain that string in the content field.
>
> Martijn v Groningen wrote:
>>
>> Yes, it should look similar to that. What is the exact request you send to
>> Solr?
>> Also, to check if the patch works correctly, can you run: ant clean test
>> There are a number of tests that test the field collapse functionality.
>>
>> Martijn
>>
>>
>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>
>>>><lst name="collapse_counts">
>>>>   <str name="field">cat</str>
>>>>    <lst name="results">
>>>>        <lst name="009">
>>>>            <str name="fieldValue">hard</str>
>>>>           <int name="collapseCount">1</int>
>>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>>                 <doc>
>>>>                    <long name="id">008</long>
>>>>                    <str name="content">aaa aaa</str>
>>>>                    <str name="col">ccc</str>
>>>>                 </doc>
>>>>            </result>
>>>>        </lst>
>>>>        ...
>>>>    </lst>
>>>></lst>
>>> I see, looks like I am applying the patch wrongly somehow.
>>> This is the complete collapse_counts response I am getting:
>>> <lst name="collapse_counts">
>>>  <str name="field">col</str>
>>>  <lst name="results">
>>>    <lst>
>>>      <int name="collapseCount">1</int>
>>>      <int name="collapseCount">1</int>
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">bbb</str>
>>>      <str name="fieldValue">ccc</str>
>>>      <str name="fieldValue">xxx</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">2</long>
>>>          <str name="content">aaa aaa</str>
>>>          <str name="col">bbb</str>
>>>        </doc>
>>>      </result>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">8</long>
>>>          <str name="content">aaa aaa aaa sd</str>
>>>          <str name="col">ccc</str>
>>>       </doc>
>>>      </result>
>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>        <doc>
>>>          <long name="id">12</long>
>>>          <str name="content">aaa aaa aaa v</str>
>>>          <str name="col">xxx</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>  </lst>
>>> </lst>
>>>
>>> As you can see, I am getting a <lst> tag with no name. As I understood
>>> what you told me, I should be getting as many lst tags as collapsed groups,
>>> and the name attribute of each lst should be the unique field value. So, if
>>> the patch was applied correctly, the response should look like:
>>>
>>> <lst name="collapse_counts">
>>>  <str name="field">col</str>
>>>  <lst name="results">
>>>    <lst name="354"> (the head value of the collapsed group)
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">bbb</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">2</long>
>>>          <str name="content">aaa aaa</str>
>>>          <str name="col">bbb</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>    <lst name="654">
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">ccc</str>
>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>        <doc>
>>>          <long name="id">8</long>
>>>          <str name="content">aaa aaa aaa sd</str>
>>>          <str name="col">ccc</str>
>>>       </doc>
>>>      </result>
>>>    </lst>
>>>    <lst name="654">
>>>      <int name="collapseCount">1</int>
>>>      <str name="fieldValue">xxx</str>
>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>        <doc>
>>>          <long name="id">12</long>
>>>          <str name="content">aaa aaa aaa v</str>
>>>          <str name="col">xxx</str>
>>>        </doc>
>>>      </result>
>>>    </lst>
>>>  </lst>
>>> </lst>
>>>
>>> Is this what the response looks like when you use the patch?
>>> Thanks in advance
>>>
>>>
>>> Martijn v Groningen wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> I'm not sure if I follow you completely, but the example you gave is
>>>> not complete. I'm missing a few tags in your example. Let's assume the
>>>> following response, which the latest patches produce.
>>>>
>>>> <lst name="collapse_counts">
>>>>     <str name="field">cat</str>
>>>>     <lst name="results">
>>>>         <lst name="009">
>>>>             <str name="fieldValue">hard</str>
>>>>             <int name="collapseCount">1</int>
>>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>>                  <doc>
>>>>                     <long name="id">008</long>
>>>>                     <str name="content">aaa aaa</str>
>>>>                     <str name="col">ccc</str>
>>>>                  </doc>
>>>>             </result>
>>>>         </lst>
>>>>         ...
>>>>     </lst>
>>>> </lst>
>>>>
>>>> The result list contains collapse groups. The names of the child
>>>> elements are the collapse head ids. Everything that falls under a
>>>> collapse head belongs to that collapse group, so adding the document
>>>> head id to the field value is unnecessary. In the above example, the
>>>> document with id 009 is the document head of the document with id 008.
>>>> The document with id 009 should be displayed in the search result.
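A rough client-side sketch of reading that structure, assuming the response
has exactly the shape of the example above (the sample is copied from it with
the "..." placeholder dropped). The function name collapse_groups and the
returned dictionary layout are hypothetical, not part of the patch.

```python
import xml.etree.ElementTree as ET

# Sample copied from the example response above, minus the "..." placeholder.
SAMPLE = """
<lst name="collapse_counts">
  <str name="field">cat</str>
  <lst name="results">
    <lst name="009">
      <str name="fieldValue">hard</str>
      <int name="collapseCount">1</int>
      <result name="collapsedDocs" numFound="1" start="0">
        <doc>
          <long name="id">008</long>
          <str name="content">aaa aaa</str>
          <str name="col">ccc</str>
        </doc>
      </result>
    </lst>
  </lst>
</lst>
"""


def collapse_groups(xml_text):
    """Map each collapse head id to its field value, count and collapsed ids."""
    root = ET.fromstring(xml_text)
    groups = {}
    # Each child <lst> of "results" is one collapse group, named by its head id.
    for group in root.findall("./lst[@name='results']/lst"):
        head_id = group.get("name")
        docs = group.findall("./result[@name='collapsedDocs']/doc")
        groups[head_id] = {
            "fieldValue": group.findtext("./str[@name='fieldValue']"),
            "collapseCount": int(group.findtext("./int[@name='collapseCount']")),
            "collapsedIds": [d.findtext("./*[@name='id']") for d in docs],
        }
    return groups


if __name__ == "__main__":
    print(collapse_groups(SAMPLE))
```

Because the head id is the key, the link between a collapsed document and the
document shown in the search result can be read directly from the group name.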
>>>>
>>>> From what you have said, it seems that you properly configured the
>>>> patch.
>>>>
>>>> Martijn
>>>>
>>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>>
>>>>> Hey there, I have been testing the last patch, and I think either I am
>>>>> missing something or the way the collapsed documents are shown with
>>>>> adjacent collapsing can sometimes be confusing.
>>>>> I am using the patch with collapseComponent replacing queryComponent
>>>>> (not using both at the same time):
>>>>>  <searchComponent name="query"
>>>>> class="org.apache.solr.handler.component.CollapseComponent">
>>>>> What I have noticed is this. Imagine you get these results in the search:
>>>>> doc1:
>>>>>   id:001
>>>>>   collapseField:ccc
>>>>> doc2:
>>>>>   id:002
>>>>>   collapseField:aaa
>>>>> doc3:
>>>>>   id:003
>>>>>   collapseField:ccc
>>>>> doc4:
>>>>>   id:004
>>>>>   collapseField:bbb
>>>>>
>>>>> And in the collapse_counts you get:
>>>>> <int name="collapseCount">1</int>
>>>>> <str name="fieldValue">ccc</str>
>>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>>> <doc>
>>>>> <long name="id">008</long>
>>>>> <str name="content">aaa aaa</str>
>>>>> <str name="col">ccc</str>
>>>>> </doc>
>>>>> </result>
>>>>>
>>>>> Now, how can I know the head document of doc 008? Both 001 and 003
>>>>> could be. Wouldn't it make sense to connect the uniqueField with the
>>>>> collapsed documents in some way?
>>>>>
>>>>> Adding something to collapse_counts like:
>>>>> <int name="collapseCount">1</int>
>>>>> <str name="fieldValue">ccc</str>
>>>>> <str name="uniqueFieldId">003</str>
>>>>>
>>>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>>>> return:
>>>>> <str name="fieldValue">ccc#003</str>
>>>>> but this response looks dirty...
>>>>>
>>>>> As I said, maybe I am misunderstanding something and this can already be
>>>>> known in some way. In that case, can someone tell me how?
>>>>> Thanks in advance
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> JIRA jira@apache.org wrote:
>>>>>>
>>>>>>
>>>>>>     [
>>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>>> ]
>>>>>>
>>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>>>> ----------------------------------------------------------------------
>>>>>>
>>>>>> I have attached a new patch that has the following changes:
>>>>>> # Added caching for the field collapse functionality. Check the [solr
>>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>>>> field-collapsing with caching.
>>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>>> instead). It was deprecated for a long time.
>>>>>>
>>>>>>       was (Author: martijn):
>>>>>>     I have attached a new patch that has the following changes:
>>>>>> # Added caching for the field collapse functionality. Check the [solr
>>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>>>>>> the
>>>>>> field-collapsing with caching.
>>>>>> # Removed the collapse.max parameter (collapse.threshold must be used
>>>>>> instead). It was deprecated for a long time.
>>>>>>
>>>>>>> Field collapsing
>>>>>>> ----------------
>>>>>>>
>>>>>>>                 Key: SOLR-236
>>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>>             Project: Solr
>>>>>>>          Issue Type: New Feature
>>>>>>>          Components: search
>>>>>>>    Affects Versions: 1.3
>>>>>>>            Reporter: Emmanuel Keller
>>>>>>>             Fix For: 1.5
>>>>>>>
>>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>> field-collapse-5.patch,
>>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>>>>>>
>>>>>>>
>>>>>>> This patch includes a new feature called "Field collapsing".
>>>>>>> "Used in order to collapse a group of results with similar value for
>>>>>>> a given field to a single entry in the result set. Site collapsing is a
>>>>>>> special case of this, where all results for a given web site is
>>>>>>> collapsed into one or two entries in the result set, typically with an
>>>>>>> associated "more documents from this site" link. See also Duplicate
>>>>>>> detection."
>>>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>>>> "collapse.field" to choose the field used to group results
>>>>>>> "collapse.type" normal (default value) or adjacent
>>>>>>> "collapse.max" to select how many continuous results are allowed
>>>>>>> before
>>>>>>> collapsing
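To make the difference between the two collapse types concrete, here is a
simplified sketch. It is not the patch's code: the function collapse, its
threshold argument (standing in for collapse.max / collapse.threshold) and the
rule that the first document of a group is its head are all assumptions made
for illustration.

```python
def collapse(docs, field, mode="normal", threshold=1):
    """Return (visible_docs, collapsed), where collapsed maps a head id to
    the ids of the documents hidden under it.

    mode="normal":   group by field value across the whole result list.
    mode="adjacent": group only runs of consecutive docs sharing a value.
    threshold:       docs per group that stay visible before collapsing.
    """
    visible, collapsed = [], {}
    seen = {}        # group key -> [head id, number of visible docs]
    run = 0          # counter identifying the current adjacent run
    prev = object()  # sentinel that never equals a real field value
    for doc in docs:
        value = doc[field]
        if mode == "adjacent":
            if value != prev:
                run += 1          # a new run starts when the value changes
            key = (run, value)
        else:
            key = value
        prev = value
        state = seen.get(key)
        if state is None:
            seen[key] = [doc["id"], 1]   # first doc of the group is the head
            visible.append(doc)
        elif state[1] < threshold:
            state[1] += 1
            visible.append(doc)
        else:
            collapsed.setdefault(state[0], []).append(doc["id"])
    return visible, collapsed


if __name__ == "__main__":
    docs = [
        {"id": "001", "col": "ccc"},
        {"id": "002", "col": "ccc"},
        {"id": "003", "col": "aaa"},
        {"id": "004", "col": "ccc"},
    ]
    print(collapse(docs, "col", mode="normal"))
    print(collapse(docs, "col", mode="adjacent"))
```

Under these assumptions, adjacent collapsing can leave two visible documents
with the same field value, so the field value alone does not identify a
group.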
>>>>>>> TODO (in progress):
>>>>>>> - More documentation (on source code)
>>>>>>> - Test cases
>>>>>>> Two patches:
>>>>>>> - "field_collapsing.patch" for current development version
>>>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>>>> P.S.: Feedback and misspelling correction are welcome ;-)
>>>>>>
>>>>>> --
>>>>>> This message is automatically generated by JIRA.
>>>>>> -
>>>>>> You can reply to this email to add a comment to the issue online.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Kind regards,
>>>>
>>>> Martijn van Groningen
>>>>
>>>>
>>
>>
>>
>> --
>> Kind regards,
>>
>> Martijn van Groningen
>>
>>



-- 
Kind regards,

Martijn van Groningen
