lucene-solr-dev mailing list archives

From Marc Sturlese <marc.sturl...@gmail.com>
Subject Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing
Date Mon, 07 Dec 2009 16:09:57 GMT

Hey! Got it working!
The problem was that my uniqueField is indexed as long, which is not supported
by the patch.
The value is obtained in the getCollapseGroupResult function in
AbstractCollapseCollector.java as:

String schemaId = searcher.doc(docId).get(uniqueIdFieldname);

To support long, int, slong, sint, float, sfloat...
it should be obtained by doing something like:

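// Look up the schema type of the unique key so it can decode the stored value.
FieldType idFieldType =
    searcher.getSchema().getFieldType(uniqueIdFieldname);
String schemaId = "";
Fieldable name_field = null;
try {
      name_field = searcher.doc(docId).getFieldable(uniqueIdFieldname);
} catch (IOException ex) {
      //deal with exception
}
// Convert the stored form to its readable form; stays "" if the field is missing.
if (name_field != null) {
   schemaId = idFieldType.storedToReadable(name_field);
}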
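
A toy model of why the plain string lookup can come back empty while the type-aware one works. It assumes (this is not stated in the thread) that a numeric id is stored in a form other than readable text. StoredValue, IdType, naiveId and readableId are hypothetical stand-ins, not Solr or Lucene classes; only the shape of the fix (ask the field type to decode the stored value, and guard against a missing field) follows the snippet above.

```java
import java.nio.ByteBuffer;

public class ReadableIdSketch {
    /** Hypothetical stand-in for a stored field: either plain text or raw bytes. */
    static final class StoredValue {
        final String text;   // null when the value is stored as bytes
        final byte[] bytes;  // null when the value is stored as text
        StoredValue(String text, byte[] bytes) {
            this.text = text;
            this.bytes = bytes;
        }
    }

    /** Hypothetical stand-in for a field type that can decode its own stored form. */
    interface IdType {
        String storedToReadable(StoredValue v);
    }

    /** Text ids are already readable. */
    static final IdType STRING_ID = v -> v.text;

    /** Assumption for this sketch: a long id is stored as 8 big-endian bytes. */
    static final IdType LONG_ID = v -> Long.toString(ByteBuffer.wrap(v.bytes).getLong());

    /** Naive lookup: only useful when the id happens to be stored as text. */
    static String naiveId(StoredValue v) {
        return v == null ? null : v.text;
    }

    /** Type-aware lookup: returns "" when the document has no id field. */
    static String readableId(IdType type, StoredValue v) {
        return v == null ? "" : type.storedToReadable(v);
    }

    public static void main(String[] args) {
        StoredValue longId =
            new StoredValue(null, ByteBuffer.allocate(8).putLong(8L).array());
        System.out.println(naiveId(longId));             // prints "null"
        System.out.println(readableId(LONG_ID, longId)); // prints "8"
    }
}
```

In this model the naive lookup yields no usable name for a group, while the type-aware lookup gives one readable id per head document whatever the id's type.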


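For context on the quoted thread below, here is a rough sketch of how adjacent collapsing can report its groups keyed by the head document's id, which is the layout Martijn describes. It is an illustration only and not the patch's code: Doc, Group and collapseAdjacent are hypothetical names, and it assumes one visible head per run of equal field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AdjacentCollapseSketch {
    /** A ranked search hit: unique id plus the value of the collapse field. */
    static final class Doc {
        final long id;
        final String fieldValue;
        Doc(long id, String fieldValue) {
            this.id = id;
            this.fieldValue = fieldValue;
        }
    }

    /** One collapse group: the shared field value and the ids hidden behind the head. */
    static final class Group {
        final String fieldValue;
        final List<Long> collapsedIds = new ArrayList<>();
        Group(String fieldValue) {
            this.fieldValue = fieldValue;
        }
        int collapseCount() {
            return collapsedIds.size();
        }
    }

    /**
     * Walks the ranked hits once. A hit whose field value equals that of the
     * current head is collapsed under that head; any other hit becomes the new
     * head. Only heads that hide at least one document get an entry, keyed by
     * the head's id rendered as a string.
     */
    static Map<String, Group> collapseAdjacent(List<Doc> ranked) {
        Map<String, Group> groups = new LinkedHashMap<>();
        Doc head = null;
        for (Doc doc : ranked) {
            if (head != null && head.fieldValue.equals(doc.fieldValue)) {
                String key = Long.toString(head.id);
                Group group = groups.get(key);
                if (group == null) {
                    group = new Group(head.fieldValue);
                    groups.put(key, group);
                }
                group.collapsedIds.add(doc.id);
            } else {
                head = doc;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two separate "ccc" heads (1 and 3); doc 8 directly follows doc 3.
        List<Doc> ranked = Arrays.asList(
            new Doc(1, "ccc"), new Doc(2, "aaa"),
            new Doc(3, "ccc"), new Doc(8, "ccc"), new Doc(4, "bbb"));
        for (Map.Entry<String, Group> e : collapseAdjacent(ranked).entrySet()) {
            Group g = e.getValue();
            System.out.println(e.getKey() + " -> " + g.fieldValue
                + ", collapseCount=" + g.collapseCount()
                + ", collapsedDocs=" + g.collapsedIds);
        }
        // prints "3 -> ccc, collapseCount=1, collapsedDocs=[8]"
    }
}
```

Because the group is keyed by the head id, the field value "ccc" alone no longer has to tell doc 1 and doc 3 apart.
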
Martijn v Groningen wrote:
> 
> The last two parameters are not necessary, since they both default to
> true. Could you run the field collapse tests successfully?
> 
> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>
>> The request I am sending is:
>> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>>
>> I search for 'aaa' in the content field. All the documents in the result
>> contain that string in the field content
>>
>> Martijn v Groningen wrote:
>>>
>>> Yes it should look similar to that. What is the exact request you send
>>> to
>>> Solr?
>>> Also to check if the patch works correctly can you run: ant clean test
>>> There are a number of tests that test the Field collapse functionality.
>>>
>>> Martijn
>>>
>>>
>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>
>>>>><lst name="collapse_counts">
>>>>>   <str name="field">cat</str>
>>>>>    <lst name="results">
>>>>>        <lst name="009">
>>>>>            <str name="fieldValue">hard</str>
>>>>>           <int name="collapseCount">1</int>
>>>>>            <result name="collapsedDocs" numFound="1" start="0">
>>>>>                 <doc>
>>>>>                    <long name="id">008</long>
>>>>>                    <str name="content">aaa aaa</str>
>>>>>                    <str name="col">ccc</str>
>>>>>                 </doc>
>>>>>            </result>
>>>>>        </lst>
>>>>>        ...
>>>>>    </lst>
>>>>></lst>
>>>> I see, looks like I am applying the patch wrongly somehow.
>>>> This is the complete collapse_counts response I am getting:
>>>> <lst name="collapse_counts">
>>>>  <str name="field">col</str>
>>>>  <lst name="results">
>>>>    <lst>
>>>>      <int name="collapseCount">1</int>
>>>>      <int name="collapseCount">1</int>
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">bbb</str>
>>>>      <str name="fieldValue">ccc</str>
>>>>      <str name="fieldValue">xxx</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">2</long>
>>>>          <str name="content">aaa aaa</str>
>>>>          <str name="col">bbb</str>
>>>>        </doc>
>>>>      </result>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">8</long>
>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>          <str name="col">ccc</str>
>>>>       </doc>
>>>>      </result>
>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>        <doc>
>>>>          <long name="id">12</long>
>>>>          <str name="content">aaa aaa aaa v</str>
>>>>          <str name="col">xxx</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>  </lst>
>>>> </lst>
>>>>
>>>> As you can see, I am getting a <lst> tag with no name. As I understood
>>>> what you told me, I should be getting as many lst tags as collapsed
>>>> groups, and the name attribute of each lst should be the unique field
>>>> value. So, if the patch was applied correctly, the response should look
>>>> like:
>>>>
>>>> <lst name="collapse_counts">
>>>>  <str name="field">col</str>
>>>>  <lst name="results">
>>>>    <lst name="354"> (the head value of the collapsed group)
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">bbb</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">2</long>
>>>>          <str name="content">aaa aaa</str>
>>>>          <str name="col">bbb</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>    <lst name="654">
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">ccc</str>
>>>>      <result name="collapsedDocs" numFound="1" start="0">
>>>>        <doc>
>>>>          <long name="id">8</long>
>>>>          <str name="content">aaa aaa aaa sd</str>
>>>>          <str name="col">ccc</str>
>>>>       </doc>
>>>>      </result>
>>>>    </lst>
>>>>    <lst name="654">
>>>>      <int name="collapseCount">1</int>
>>>>      <str name="fieldValue">xxx</str>
>>>>      <result name="collapsedDocs" numFound="4" start="0">
>>>>        <doc>
>>>>          <long name="id">12</long>
>>>>          <str name="content">aaa aaa aaa v</str>
>>>>          <str name="col">xxx</str>
>>>>        </doc>
>>>>      </result>
>>>>    </lst>
>>>>  </lst>
>>>> </lst>
>>>>
>>>> Is this the way the response looks when you use the patch?
>>>> Thanks in advance
>>>>
>>>>
>>>> Martijn v Groningen wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> I'm not sure if I follow you completely, but the example you gave is
>>>>> not complete. I'm missing a few tags in your example. Let's assume the
>>>>> following response, which the latest patches produce.
>>>>>
>>>>> <lst name="collapse_counts">
>>>>>     <str name="field">cat</str>
>>>>>     <lst name="results">
>>>>>         <lst name="009">
>>>>>             <str name="fieldValue">hard</str>
>>>>>             <int name="collapseCount">1</int>
>>>>>             <result name="collapsedDocs" numFound="1" start="0">
>>>>>                  <doc>
>>>>>                     <long name="id">008</long>
>>>>>                     <str name="content">aaa aaa</str>
>>>>>                     <str name="col">ccc</str>
>>>>>                  </doc>
>>>>>             </result>
>>>>>         </lst>
>>>>>         ...
>>>>>     </lst>
>>>>> </lst>
>>>>>
>>>>> The result list contains collapse groups. The names of the child
>>>>> elements are the collapse head ids. Everything that falls under the
>>>>> collapse head belongs to that collapse group, and thus adding the document
>>>>> head id to the field value is unnecessary.  In the above example
>>>>> document with id 009 is the document head of document with id 008.
>>>>> Document with id 009 should be displayed in the search result.
>>>>>
>>>>> From what you have said, it seems that you properly configured the
>>>>> patch.
>>>>>
>>>>> Martijn
>>>>>
>>>>> 2009/12/7 Marc Sturlese <marc.sturlese@gmail.com>:
>>>>>>
>>>>>> Hey there, I have been testing the latest patch and I think either I am
>>>>>> missing something or the way the collapsed documents are shown with
>>>>>> adjacent collapsing can sometimes be confusing.
>>>>>> I am using the patch with collapseComponent replacing queryComponent
>>>>>> (not using both at the same time):
>>>>>>  <searchComponent name="query"
>>>>>> class="org.apache.solr.handler.component.CollapseComponent">
>>>>>> What I have noticed is this. Imagine you get these results in the search:
>>>>>> doc1:
>>>>>>   id:001
>>>>>>   collapseField:ccc
>>>>>> doc2:
>>>>>>   id:002
>>>>>>   collapseField:aaa
>>>>>> doc3:
>>>>>>   id:003
>>>>>>   collapseField:ccc
>>>>>> doc4:
>>>>>>   id:004
>>>>>>   collapseField:bbb
>>>>>>
>>>>>> And in the collapse_counts you get:
>>>>>> <int name="collapseCount">1</int>
>>>>>> <str name="fieldValue">ccc</str>
>>>>>> <result name="collapsedDocs" numFound="1" start="0">
>>>>>> <doc>
>>>>>> <long name="id">008</long>
>>>>>> <str name="content">aaa aaa</str>
>>>>>> <str name="col">ccc</str>
>>>>>> </doc>
>>>>>> </result>
>>>>>>
>>>>>> Now, how can I know the head document of doc 008? Both 001 and 003
>>>>>> could be... Wouldn't it make sense to connect the uniqueField in some
>>>>>> way with the collapsed documents?
>>>>>>
>>>>>> Adding something to collapse_counts like:
>>>>>> <int name="collapseCount">1</int>
>>>>>> <str name="fieldValue">ccc</str>
>>>>>> <str name="uniqueFieldId">003</str>
>>>>>>
>>>>>> I have currently hacked FieldValueCountCollapseCollectorFactory to
>>>>>> return:
>>>>>> <str name="fieldValue">ccc#003</str>
>>>>>> but this response looks dirty...
>>>>>>
>>>>>> As I said, maybe I am misunderstanding something and this can be
>>>>>> known in some way. In that case, can someone tell me how?
>>>>>> Thanks in advance
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> JIRA jira@apache.org wrote:
>>>>>>>
>>>>>>>
>>>>>>>     [
>>>>>>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>>>>>>> ]
>>>>>>>
>>>>>>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56
>>>>>>> PM:
>>>>>>> ----------------------------------------------------------------------
>>>>>>>
>>>>>>> I have attached a new patch that has the following changes:
>>>>>>> # Added caching for the field collapse functionality. Check the
>>>>>>> [solr
>>>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to
>>>>>>> configure
>>>>>>> field-collapsing with caching.
>>>>>>> # Removed the collapse.max parameter (collapse.threshold must be
>>>>>>> used instead). It was deprecated for a long time.
>>>>>>>
>>>>>>>       was (Author: martijn):
>>>>>>>     I have attached a new patch that has the following changes:
>>>>>>> # Added caching for the field collapse functionality. Check the
>>>>>>> [solr
>>>>>>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to
>>>>>>> configure
>>>>>>> the
>>>>>>> field-collapsing with caching.
>>>>>>> # Removed the collapse.max parameter (collapse.threshold must be
>>>>>>> used instead). It was deprecated for a long time.
>>>>>>>
>>>>>>>> Field collapsing
>>>>>>>> ----------------
>>>>>>>>
>>>>>>>>                 Key: SOLR-236
>>>>>>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>>>>>>             Project: Solr
>>>>>>>>          Issue Type: New Feature
>>>>>>>>          Components: search
>>>>>>>>    Affects Versions: 1.3
>>>>>>>>            Reporter: Emmanuel Keller
>>>>>>>>             Fix For: 1.5
>>>>>>>>
>>>>>>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>>>>>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>>>>>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch, field-collapse-5.patch,
>>>>>>>> field-collapse-5.patch,
>>>>>>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>>>>>>> field-collapsing-extended-592129.patch,
>>>>>>>> field_collapsing_1.1.0.patch,
>>>>>>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>> field_collapsing_dsteigerwald.diff,
>>>>>>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>>>>>>> solr-236.patch, SOLR-236_collapsing.patch,
>>>>>>>> SOLR-236_collapsing.patch
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch includes a new feature called "Field collapsing".
>>>>>>>> "Used in order to collapse a group of results with similar value
>>>>>>>> for a given field to a single entry in the result set. Site collapsing
>>>>>>>> is a special case of this, where all results for a given web site is
>>>>>>>> collapsed into one or two entries in the result set, typically with
>>>>>>>> an associated "more documents from this site" link. See also
>>>>>>>> Duplicate detection."
>>>>>>>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>>>>>>>> The implementation adds 3 new query parameters (SolrParams):
>>>>>>>> "collapse.field" to choose the field used to group results
>>>>>>>> "collapse.type" normal (default value) or adjacent
>>>>>>>> "collapse.max" to select how many continuous results are allowed
>>>>>>>> before collapsing
>>>>>>>> TODO (in progress):
>>>>>>>> - More documentation (on source code)
>>>>>>>> - Test cases
>>>>>>>> Two patches:
>>>>>>>> - "field_collapsing.patch" for current development version
>>>>>>>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>>>>>>>> P.S.: Feedback and misspelling correction are welcome ;-)
>>>>>>>
>>>>>>> --
>>>>>>> This message is automatically generated by JIRA.
>>>>>>> -
>>>>>>> You can reply to this email to add a comment to the issue online.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26674651.html
>>>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Met vriendelijke groet,
>>>>>
>>>>> Martijn van Groningen
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26678606.html
>>>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Met vriendelijke groet,
>>>
>>> Martijn van Groningen
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679037.html
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Martijn van Groningen
> 
> 

-- 
View this message in context: http://old.nabble.com/-jira--Created%3A-%28SOLR-236%29-Field-collapsing-tp10440315p26679520.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

