lucene-solr-dev mailing list archives

From Martijn v Groningen <martijn.is.h...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Date Mon, 07 Dec 2009 18:22:06 GMT
Yes, I can reproduce the same situation here. I will update the patch
asap and add it to Jira.

Martijn

2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>
> Hey! Got it working!
> The problem was that my uniqueField is indexed as long, which is not
> supported by the patch.
> The value is obtained in the getCollapseGroupResult function in
> AbstractCollapseCollector.java as:
>
> String schemaId = searcher.doc(docId).get(uniqueIdFieldname);
>
> To support long, int, slong, sint, float, sfloat...
> it should be obtained by doing something like:
>
> FieldType idFieldType = searcher.getSchema().getFieldType(uniqueIdFieldname);
> String schemaId = "";
> Fieldable name_field = null;
> try {
>      name_field = searcher.doc(docId).getFieldable(uniqueIdFieldname);
> } catch (IOException ex) {
>      //deal with exception
> }
> if (name_field != null) {
>   // convert the stored value to its readable form via the field type
>   schemaId = idFieldType.storedToReadable(name_field);
> }
>
>
> Martijn v Groningen wrote:
>>
>> The last two parameters are not necessary, since they both default to
>> true. Could you run the field collapse tests successfully?
>>
>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>
>>> The request I am sending is:
>>> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>>>
>>> I search for 'aaa' in the content field. All the documents in the result
>>> contain that string in the content field.
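For readers who want to reproduce the request above, here is a minimal sketch that assembles the same kind of URL in plain Java. The class and method names (CollapseRequest, buildQuery, adjacentCollapseUrl) are hypothetical and not part of Solr or the patch; only the parameter names are taken from this thread. It assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or the SOLR-236 patch.
class CollapseRequest {
    /** Joins parameters into a URL-encoded query string, keeping insertion order. */
    static String buildQuery(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    /** Builds a select URL that collapses adjacent results on the given field. */
    static String adjacentCollapseUrl(String selectUrl, String query, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("start", "0");
        params.put("rows", "20");
        params.put("collapse.field", field);
        params.put("collapse.type", "adjacent");
        params.put("collapse.includeCollapsedDocs.fl", "*");
        // collapse.info.doc and collapse.info.count are left out here because,
        // according to the reply in this thread, both default to true.
        return selectUrl + "?" + buildQuery(params);
    }

    public static void main(String[] args) {
        System.out.println(
            adjacentCollapseUrl("http://localhost:8983/solr/select/", "aaa", "col"));
    }
}
```

Keeping the parameters in a LinkedHashMap makes the generated URL easy to compare by eye with the one typed by hand.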
>>>
>>> Martijn v Groningen wrote:
>>>>
>>>> Yes, it should look similar to that. What is the exact request you send
>>>> to Solr?
>>>> Also, to check if the patch works correctly, can you run: ant clean test
>>>> There are a number of tests that test the field collapse functionality.
>>>>
>>>> Martijn
>>>>
>>>>
>>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>>
>>>>>><lst name="collapse_counts">
>>>>>>   <str name="field">cat</str>
>>>>>>    <lst name="results">
>>>>>>        <lst name="009">
>>>>>>            <str name="fieldValue">hard</str>
>>>>>>           <int name="collapseCount">1</int>
>>>>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>>>>                 <doc>
>>>>>>                    <long name="id">008</long>
>>>>>>                    <str name="content">aaa aaa</str>
>>>>>>                    <str name="col">ccc</str>
>>>>>>                 </doc>
>>>>>>            </result>
>>>>>>        </lst>
>>>>>>        ...
>>>>>>    </lst>
>>>>>></lst>
>>>>> I see, it looks like I am applying the patch wrongly somehow.
>>>>> This is the complete collapse_counts response I am getting:
>>>>> <lst name="collapse_counts">
>>>>>  <str name="field">col</str>
>>>>>  <lst name="results">
>>>>>    <lst>
>>>>>      <int name="collapseCount">1</int>
>>>>>      <int name="collapseCount">1</int>
>>>>>      <int name="collapseCount">1</int>
>>>>>      <str name="fieldValue">bbb</str>
>>>>>      <str name="fieldValue">ccc</str>
>>>>>      <str name="fieldValue">xxx</str>
>>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>>        <doc>
>>>>>          <long name="id">2</long>
>>>>>          <str name="content">aaa aaa</str>
>>>>>          <str name="col">bbb</str>
>>>>>        </doc>
>>>>>      </result>
>>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>>        <doc>
>>>>>          <long name="id">8</long>
>>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>>          <str name="col">ccc</str>
>>>>>       </doc>
>>>>>      </result>
>>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>>        <doc>
>>>>>          <long name="id">12</long>
>>>>>          <str name="content">aaa aaa aaa v</str>
>>>>>          <str name="col">xxx</str>
>>>>>        </doc>
>>>>>      </result>
>>>>>    </lst>
>>>>>  </lst>
>>>>> </lst>
>>>>>
>>>>> As you can see, I am getting a <lst> tag with no name. As I understood
>>>>> what you told me, I should be getting as many lst tags as collapsed
>>>>> groups, and the name attribute of each lst should be the unique field
>>>>> value. So, if the patch was applied correctly, the response should
>>>>> look like:
>>>>>
>>>>> <lst name="collapse_counts">
>>>>>  <str name="field">col</str>
>>>>>  <lst name="results">
>>>>>    <lst name="354"> (the head value of the collapsed group)
>>>>>      <int name="collapseCount">1</int>
>>>>>      <str name="fieldValue">bbb</str>
>>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>>        <doc>
>>>>>          <long name="id">2</long>
>>>>>          <str name="content">aaa aaa</str>
>>>>>          <str name="col">bbb</str>
>>>>>        </doc>
>>>>>      </result>
>>>>>    </lst>
>>>>>    <lst name="654">
>>>>>      <int name="collapseCount">1</int>
>>>>>      <str name="fieldValue">ccc</str>
>>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>>        <doc>
>>>>>          <long name="id">8</long>
>>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>>          <str name="col">ccc</str>
>>>>>       </doc>
>>>>>      </result>
>>>>>    </lst>
>>>>>    <lst name="654">
>>>>>      <int name="collapseCount">1</int>
>>>>>      <str name="fieldValue">xxx</str>
>>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>>        <doc>
>>>>>          <long name="id">12</long>
>>>>>          <str name="content">aaa aaa aaa v</str>
>>>>>          <str name="col">xxx</str>
>>>>>        </doc>
>>>>>      </result>
>>>>>    </lst>
>>>>>  </lst>
>>>>> </lst>
>>>>>
>>>>> Is this what the response looks like when you use the patch?
>>>>> Thanks in advance
>>>>>
>>>>>
>>>>> Martijn v Groningen wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> I'm not sure if I follow you completely, but the example you gave is
>>>>>> not complete. I'm missing a few tags in your example. Let's assume the
>>>>>> following response, which the latest patches produce.
>>>>>>
>>>>>> <lst name="collapse_counts">
>>>>>>     <str name="field">cat</str>
>>>>>>     <lst name="results">
>>>>>>         <lst name="009">
>>>>>>             <str name="fieldValue">hard</str>
>>>>>>             <int name="collapseCount">1</int>
>>>>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>>>>                  <doc>
>>>>>>                     <long name="id">008</long>
>>>>>>                     <str name="content">aaa aaa</str>
>>>>>>                     <str name="col">ccc</str>
>>>>>>                  </doc>
>>>>>>             </result>
>>>>>>         </lst>
>>>>>>         ...
>>>>>>     </lst>
>>>>>> </lst>
>>>>>>
>>>>>> The result list contains collapse groups. The names of the child
>>>>>> elements are the collapse head ids. Everything that falls under a
>>>>>> collapse head belongs to that collapse group, so adding the document
>>>>>> head id to the field value is unnecessary. In the above example, the
>>>>>> document with id 009 is the document head of the document with id 008.
>>>>>> The document with id 009 should be displayed in the search result.
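The structure described above (each child of "results" is named after its collapse head id) can be read on the client side with the JDK's DOM parser. The following is a minimal sketch, assuming the response shape shown in this message; the class and method names (CollapseCountsReader, headToCollapsedIds) are hypothetical and not part of Solr or SolrJ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical client-side helper, not part of Solr or SolrJ.
class CollapseCountsReader {
    /** Maps each collapse head id to the ids of the documents collapsed under it. */
    static Map<String, List<String>> headToCollapsedIds(String xml, String idField) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response", e);
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        NodeList lists = dom.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element group = (Element) lists.item(i);
            Node parent = group.getParentNode();
            // A collapse group is a <lst> whose parent is <lst name="results">.
            if (!(parent instanceof Element)
                    || !"results".equals(((Element) parent).getAttribute("name"))) {
                continue;
            }
            List<String> ids = new ArrayList<>();
            NodeList docs = group.getElementsByTagName("doc");
            for (int d = 0; d < docs.getLength(); d++) {
                // Pick the field of each collapsed doc whose name matches idField.
                for (Node f = docs.item(d).getFirstChild(); f != null; f = f.getNextSibling()) {
                    if (f instanceof Element
                            && idField.equals(((Element) f).getAttribute("name"))) {
                        ids.add(f.getTextContent().trim());
                    }
                }
            }
            // The name attribute of the group is the collapse head id.
            groups.put(group.getAttribute("name"), ids);
        }
        return groups;
    }
}
```

Applied to the example response above with idField "id", this would map head id 009 to a list containing 008. If the name attribute is missing, as in the response reported earlier in this thread, the key comes back as an empty string, which is an easy way to spot the problem.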
>>>>>>
>>>>>> From what you have said, it seems that you properly configured the
>>>>>> patch.
>>>>>>
>>>>>> Martijn
>>>>>>
>>>>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>>>>
>>>>>>> Hey there, I have been testing the last patch and I think either I am
>>>>>>> missing something or the way the collapsed documents are shown with
>>>>>>> adjacent collapsing can sometimes be confusing.
>>>>>>> I am using the patch replacing queryComponent with collapseComponent
>>>>>>> (not using both at the same time):
>>>>>>>  <searchComponent name="query"
>>>>>>> class="org.apache.solr.handler.component.CollapseComponent">
>>>>>>> What I have noticed is this: imagine you get these results in the search:
>>>>>>> doc1:
>>>>>>>   id:001
>>>>>>>   collapseField:ccc
>>>>>>> doc2:
>>>>>>>   id:002
>>>>>>>   collapseField:aaa
>>>>>>> doc3:
>>>>>>>   id:003
>>>>>>>   collapseField:ccc
>>>>>>> doc4:
>>>>>>>   id:004
>>>>>>>   collapseField:bbb
>>>>>>>
>>>>>>> And in the collapse_counts you get:
>>>>>>> <int name="collapseCount">1</int>
>>>>>>> <str name="fieldValue">ccc</str>
>>>>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>>>>> <doc>
>>>>>>> <long name="id">008</long>
>>>>>>> <str name="content">aaa aaa</str>
>>>>>>> <str name="col">ccc</str>
>>>>>>> </doc>
>>>>>>> </result>
>>>>>>>
>>>>>>> Now, how can I know the head document of doc 008? Both 001 and 003
>>>>>>> could be... wouldn't it make sense to connect the uniqueField with
>>>>>>> the collapsed documents in some way?
>>>>>>>
>>>>>>> Adding something to collapse_counts like:
>>>>>>> <int name="collapseCount">1</int>
>>>>>>> <str name="fieldValue">ccc</str>
>>>>>>> <str name="uniqueFieldId">003</str>
>>>>>>>
>>>>>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>>>>>> return:
>>>>>>> <str name="fieldValue">ccc#003</str>
>>>>>>> but this response looks dirty...
>>>>>>>
>>>>>>> As I said, maybe I am misunderstanding something and this can already
>>>>>>> be known in some way. In that case, can someone tell me how?
>>>>>>> Thanks in advance
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> JIRA jira@apache.org wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>     [
>>>>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>>>>> ]
>>>>>>>>
>>>>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> I have attached a new patch that has the following changes:
>>>>>>>> # Added caching for the field collapse functionality. Check the
>>>>>>>> [solr wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to
>>>>>>>> configure field-collapsing with caching.
>>>>>>>> # Removed the collapse.max parameter (collapse.threshold must be
>>>>>>>> used instead). It was deprecated for a long time.
>>>>>>>>
>>>>>>>>       was (Author: martijn):
>>>>>>>>     I have attached a new patch that has the following changes:
>>>>>>>> # Added caching for the field collapse functionality. Check the
>>>>>>>> [solr wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to
>>>>>>>> configure the field-collapsing with caching.
>>>>>>>> # Removed the collapse.max parameter (collapse.threshold must be
>>>>>>>> used instead). It was deprecated for a long time.
>>>>>>>>
>>>>>>>>> Field collapsing
>>>>>>>>> ----------------
>>>>>>>>>
>>>>>>>>>                 Key: SOLR-236
>>>>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>>>>             Project: Solr
>>>>>>>>>          Issue Type: New Feature
>>>>>>>>>          Components: search
>>>>>>>>>    Affects Versions: 1.3
>>>>>>>>>            Reporter: Emmanuel Keller
>>>>>>>>>             Fix For: 1.5
>>>>>>>>>
>>>>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>>> field-collapse-5.patch,
>>>>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>>>>> field-collapsing-extended-592129.patch,
>>>>>>>>> field_collapsing_1.1.0.patch,
>>>>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>>> solr-236.patch, SOLR-236_collapsing.patch,
>>>>>>>>> SOLR-236_collapsing.patch
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This patch includes a new feature called "Field collapsing".
>>>>>>>>> "Used in order to collapse a group of results with similar value for
>>>>>>>>> a given field to a single entry in the result set. Site collapsing is
>>>>>>>>> a special case of this, where all results for a given web site is
>>>>>>>>> collapsed into one or two entries in the result set, typically with
>>>>>>>>> an associated "more documents from this site" link. See also
>>>>>>>>> Duplicate detection."
>>>>>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>>>>>> "collapse.field" to choose the field used to group results
>>>>>>>>> "collapse.type" normal (default value) or adjacent
>>>>>>>>> "collapse.max" to select how many continuous results are allowed
>>>>>>>>> before collapsing
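To make the difference between the two collapse types concrete, here is a minimal, self-contained sketch of one plausible reading of the parameters above. It is not the patch's implementation: the class CollapseSketch and its collapse method are hypothetical, and the semantics are assumed (normal collapsing counts a field value across the whole result list, adjacent collapsing only counts it within a run of consecutive results, and a result is kept while its count does not exceed the threshold).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; assumed semantics, not the code from the SOLR-236 patch.
class CollapseSketch {
    /**
     * Returns the positions of the results that stay visible after collapsing.
     * fieldValues holds the collapse field value of each result, in rank order.
     */
    static List<Integer> collapse(List<String> fieldValues, int threshold, boolean adjacent) {
        List<Integer> kept = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        String previous = null;
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            if (adjacent && !value.equals(previous)) {
                // A new run of consecutive values starts, so forget earlier counts.
                seen.clear();
            }
            int count = seen.merge(value, 1, Integer::sum);
            if (count <= threshold) {
                kept.add(i); // still within the allowed number of results
            }
            previous = value;
        }
        return kept;
    }
}
```

Under these assumptions, the four results discussed earlier in this thread (collapse field values ccc, aaa, ccc, bbb) with a threshold of 1 all stay visible with adjacent collapsing, because the two ccc results are not consecutive, while normal collapsing would hide the third result.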
>>>>>>>>> TODO (in progress):
>>>>>>>>> - More documentation (on source code)
>>>>>>>>> - Test cases
>>>>>>>>> Two patches:
>>>>>>>>> - "field_collapsing.patch" for current development version
>>>>>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>>>>>> P.S.: Feedback and misspelling corrections are welcome ;-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> This message is automatically generated by JIRA.
>>>>>>>> -
>>>>>>>> You can reply to this email to add a comment to the issue online.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Met vriendelijke groet,
>>>>>>
>>>>>> Martijn van Groningen
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Met vriendelijke groet,
>>>>
>>>> Martijn van Groningen
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679037.html
>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Met vriendelijke groet,
>>
>> Martijn van Groningen
>>
>>
>
> --
> View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679520.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>



-- 
Met vriendelijke groet,

Martijn van Groningen
