lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: RE : Shards don't return documents in same order
Date Fri, 02 May 2014 16:07:08 GMT
Francois:

Yes, there are several ways to examine the raw terms in the index:
- The admin/schema-browser page
- TermsComponent: https://cwiki.apache.org/confluence/display/solr/The+Terms+Component
- Luke

The schema-browser is already set up for you, so it's the easiest. The
TermsComponent should be directly usable too; I believe it's
configured by default in solrconfig.xml. Luke takes a bit of setup but
is a great tool.
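
As a rough illustration of the TermsComponent option, the sketch below builds one request URL per replica so the indexed terms of the sort field can be compared side by side. It only constructs URLs and makes no network calls. The host names, port, and core names are hypothetical, the field name event_name_sort comes from the schema quoted further down, and the /terms handler is assumed to be configured as in the example solrconfig.xml. terms.fl, terms.prefix, and terms.limit are standard TermsComponent parameters.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix="", limit=50):
    """Build a TermsComponent request URL for a single core."""
    params = {
        "terms": "true",
        "terms.fl": field,      # field whose indexed terms we want to list
        "terms.limit": limit,   # maximum number of terms to return
        "wt": "json",
    }
    if prefix:
        params["terms.prefix"] = prefix  # only list terms starting with this prefix
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical core URLs, one per replica; replace with the real ones.
REPLICAS = [
    "http://server1:8983/solr/collection1_shard1_replica1",
    "http://server2:8983/solr/collection1_shard1_replica2",
    "http://server3:8983/solr/collection1_shard1_replica3",
]

for base in REPLICAS:
    print(terms_url(base, "event_name_sort", prefix="mb2014"))
```

Fetching each printed URL and comparing the term lists would show whether one replica holds different indexed terms than the others. Whether a request sent to a core URL stays local to that core depends on the Solr version and handler configuration, so that is worth checking first.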

Did you re-index from scratch on all shards? I presume your ordering
is still not the same on all shards... the order I'd expect would be:
mb20140410a
mb20140410anew
mb20140411a

Best,
Erick


On Thu, May 1, 2014 at 8:27 AM, Francois Perron
<Francois.Perron@ticketmaster.com> wrote:
> Hi Erick,
>
>   Thank you for your response.  You are right, I changed alphaOnlySort to keep letters
> and numbers and to remove some articles (a, an, the).
>
> This is the fieldType definition:
>
>     <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>       <analyzer>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.PatternReplaceFilterFactory" replace="all" replacement="" pattern="(\b(a|an|the)\b|[^a-z,0-9])"/>
>       </analyzer>
>     </fieldType>
>
>
> Then I tested each name with the admin UI on each server, and these are the results:
>
> server1
>
> MB20140410A = mb20140410a
> MB20140411A = mb20140411a
> MB20140410A-New = mb20140410anew
>
> server2
>
> MB20140410A = mb20140410a
> MB20140411A = mb20140411a
> MB20140410A-New = mb20140410anew
>
> server3
>
> MB20140410A = mb20140410a
> MB20140411A = mb20140411a
> MB20140410A-New = mb20140410anew
>
> "Unfortunately", all results are identical, so is there a way to view the data that was
> actually indexed for these documents?  Could it be a problem with one particular server?
> All configs are in ZooKeeper, so all cores should have the same config, right?  Is there
> any way to force a replica to resynchronize?
>
> Regards,
>
> Francois.
>
> ________________________________________
> From: Erick Erickson [erickerickson@gmail.com]
> Sent: April 30, 2014 16:36
> To: solr-user@lucene.apache.org
> Subject: Re: Shards don't return documents in same order
>
> Hmmm, take a look at the admin/analysis page for these inputs for
> alphaOnlySort. If you're using the stock Solr distro, you're probably
> not considering the effect of PatternReplaceFilterFactory, which
> removes all non-letters. So these three terms reduce to
>
> mba
> mba
> mbanew
>
> You can look at the actual indexed terms via the admin/schema-browser page as well.
>
> That said, unless you transposed the order because you were
> concentrating on the numeric part, the doc with MB20140410A-New should
> always be sorting last.
>
> All of which is irrelevant if you're doing something else with
> "alphaOnlySort", so please paste in the fieldType definition if you've
> changed it.
>
> What gets returned in the doc for _stored_ data is a verbatim copy,
> NOT the output of the analysis chain, which can be confusing.
>
> Oh, and Solr uses the internal Lucene doc ID to break ties, and docs
> on different replicas can have different internal Lucene doc IDs
> relative to each other as a result of merging, so that's something else
> to watch out for.
>
> Best,
> Erick
>
> On Wed, Apr 30, 2014 at 1:06 PM, Francois Perron
> <Francois.Perron@ticketmaster.com> wrote:
>> Hi guys,
>>
>>   I have a small SolrCloud setup (3 servers, 1 collection with 1 shard and 3 replicas).
>> In my schema, I have an alphaOnlySort field populated by a copyField.
>>
>> This is part of my managed-schema:
>>
>>     <field name="_root_" type="string" indexed="true" stored="false"/>
>>     <field name="_uid" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
>>     <field name="_version_" type="long" indexed="true" stored="true"/>
>>     <field name="event_id" type="string" indexed="true" stored="true"/>
>>     <field name="event_name" type="text_general" indexed="true" stored="true"/>
>>     <field name="event_name_sort" type="alphaOnlySort"/>
>>
>> with the copyField:
>>
>>   <copyField source="event_name" dest="event_name_sort"/>
>>
>>
>> The problem is: I query my collection with a sort on my alphaOnlySort field, but on one
>> of my servers the sort order is not the same.
>>
>> On servers 1 and 2, I get this result:
>>
>> <doc>
>> <str name="event_name">MB20140410A</str>
>> </doc>
>> <doc>
>> <str name="event_name">MB20140410A-New</str>
>> </doc>
>> <doc>
>> <str name="event_name">MB20140411A</str>
>> </doc>
>>
>>
>>
>> and on the third one, this:
>>
>> <doc>
>> <str name="event_name">MB20140410A</str>
>> </doc>
>> <doc>
>> <str name="event_name">MB20140411A</str>
>> </doc>
>> <doc>
>> <str name="event_name">MB20140410A-New</str>
>> </doc>
>>
>>
>> The doc named "MB20140411A" should be at the end ...
>>
>> Any ideas?
>>
>> Regards
