lucene-solr-user mailing list archives

From Webster Homer <webster.ho...@sial.com>
Subject Re: Consecutive calls to a query give different results
Date Thu, 07 Sep 2017 15:08:57 GMT
The scores are not the same:

Doc       Score
305340    432.44238
C2646     428.24185
12837     430.61722

One other thing: I just ran optimize, and now document 305340 consistently
has the top score.
So apparently it IS essential to run optimize after a data load.
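
For reference, here is a minimal sketch of how the optimize mentioned above
could be requested through Solr's update handler. It only builds the request
URLs and sends nothing. The host and collection name are taken from the
query URL quoted later in this thread; the helper name update_url is
hypothetical. The second URL uses commit with expungeDeletes, a lighter
alternative that may not remove every deleted document. Optimize rewrites
the whole index, so treat this as an illustration rather than a
recommendation.

```python
from urllib.parse import urlencode

def update_url(base_url, collection, **params):
    """Build a Solr update-handler URL (hypothetical helper; sends nothing)."""
    # Solr expects lowercase booleans ("true"/"false") in request parameters.
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{base_url}/solr/{collection}/update?{query}"

# Host and collection as they appear in the query URL quoted in this thread.
BASE = "http://ae1c-ecomdev-msc01.sial.com:8983"
COLLECTION = "sial-catalog-product"

# Full optimize: merges segments and physically drops deleted documents.
optimize_url = update_url(BASE, COLLECTION, optimize=True)

# Lighter option: commit and ask Solr to expunge deletes where it can.
expunge_url = update_url(BASE, COLLECTION, commit=True, expungeDeletes=True)

print(optimize_url)
print(expunge_url)
```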

Note that we see this behavior fairly commonly on our SolrCloud instances;
this was not the first time. This particular case was on a development
system.
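
To illustrate Yonik's explanation quoted below (documents marked as deleted
still count in term statistics until they are merged away), here is a rough
sketch of how two replicas of the same shard can compute different IDF
values for the same term. It assumes the BM25 IDF formula, which is the
default similarity in Solr 6 unless the schema configures another one, and
it uses maxDoc as a stand-in for the document count. The maxDocs and numDocs
values are the shard1 figures quoted below; the document frequencies are
made up for illustration.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """BM25-style inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# maxDocs for the two shard1 replicas, as reported in this thread.
max_docs = {"core_node1": 611592, "core_node3": 571737}
NUM_DOCS = 383817  # live documents, identical on both replicas

# Hypothetical: 1000 live documents contain the term on both replicas, but
# each replica still holds a different number of deleted documents with it.
live_doc_freq = 1000
deleted_with_term = {"core_node1": 600, "core_node3": 300}  # assumed values

# Before optimize: deleted documents still inflate docFreq and maxDoc.
idf = {
    name: bm25_idf(live_doc_freq + deleted_with_term[name], max_doc)
    for name, max_doc in max_docs.items()
}

# After optimize: only live documents remain, so the statistics agree.
idf_optimized = {name: bm25_idf(live_doc_freq, NUM_DOCS) for name in max_docs}

for name in max_docs:
    print(name, round(idf[name], 4), round(idf_optimized[name], 4))
```

Under these assumptions the two replicas disagree on IDF before the optimize
and agree afterwards, which would match the top document changing depending
on which replica answers the request.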

On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer <webster.homer@sial.com>
wrote:

> the scores are not the same
> Doc
> 305340 432.44238
>
> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
> hastings.recursive@gmail.com> wrote:
>
>> "I am concerned that the same search gives different results after each
>> search. The top document seems to cycle between 3 different documents"
>>
>>
>> if you do debug query on the search, are the scores for the top 3
>> documents the same or not?  you can easily have three documents with the
>> same score, so when you have a result set that is ranked 1-1-1-2-3-4....
>> you can expect 1-1-1 to rotate based on whatever.  add a second element
>> like id to your ranking perhaps.
>>
>>
>>
>>
>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.homer@sial.com>
>> wrote:
>>
>> > I am not concerned about deleted documents. I am concerned that the same
>> > search gives different results after each search. The top document
>> > seems to cycle between 3 different documents.
>> >
>> > I have an enhanced collections info API call that calls the core admin
>> > API to get the index information for the replica.
>> > When I said the numDocs were the same I meant exactly that. maxDocs and
>> > deletedDocs are not the same for the replicas, but numDocs is.
>> >
>> > Or are you saying that the search is looking at deleted documents?
>> > Wouldn't that be a very significant bug?
>> >
>> > The four replicas:
>> > shard1
>> > core_node1
>> > "numDocs": 383817,
>> > "maxDocs": 611592,
>> > "deletedDocs": 227775,
>> > "size": "2.49 GB",
>> > "lastModified": "2017-09-07T08:18:03.639Z",
>> > "current": true,
>> > "version": 35644,
>> > "segmentCount": 28
>> >
>> > core_node3
>> > "numDocs": 383817,
>> > "maxDocs": 571737,
>> > "deletedDocs": 187920,
>> > "size": "2.85 GB",
>> > "lastModified": "2017-09-07T08:18:03.634Z",
>> > "current": false,
>> > "version": 35562,
>> > "segmentCount": 36
>> >
>> > shard2
>> > core_node2
>> > "numDocs": 385326,
>> > "maxDocs": 529214,
>> > "deletedDocs": 143888,
>> > "size": "2.13 GB",
>> > "lastModified": "2017-09-07T08:18:03.632Z",
>> > "current": true,
>> > "version": 34783,
>> > "segmentCount": 24
>> >
>> > core_node4
>> > "numDocs": 385326,
>> > "maxDocs": 488201,
>> > "deletedDocs": 102875,
>> > "size": "1.96 GB",
>> > "lastModified": "2017-09-07T08:18:03.633Z",
>> > "current": true,
>> > "version": 34932,
>> > "segmentCount": 21
>> >
>> >
>> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <yseeley@gmail.com> wrote:
>> >
>> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>> > > > bq: and deleted documents are irrelevant to term statistics...
>> > > >
>> > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
>> > >
>> > > One can make it work either way ;-)
>> > > Whether a document is marked as deleted or not has no effect on term
>> > > statistics (i.e. irrelevant)
>> > > OR documents marked for deletion still count in term statistics (i.e.
>> > > relevant)
>> > >
>> > > I guess I used the former because we don't go out of our way to still
>> > > include deleted documents... it's just a side effect of the index
>> > > structure that we don't (and can't easily) update statistics when a
>> > > document is marked as deleted.
>> > >
>> > > -Yonik
>> > >
>> > >
>> > > > Erick
>> > > >
>> > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <yseeley@gmail.com> wrote:
>> > > >> Different replicas of the same shard can have different numbers of
>> > > >> deleted documents (really just marked as deleted), and deleted
>> > > >> documents are irrelevant to term statistics (like the number of
>> > > >> documents a term appears in).  Documents marked for deletion stop
>> > > >> contributing to corpus statistics when they are actually removed (via
>> > > >> expunge deletes, merges, optimizes).
>> > > >> -Yonik
>> > > >>
>> > > >>
>> > > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <webster.homer@sial.com> wrote:
>> > > >>> I am using Solr 6.2.0 configured as a SolrCloud with 2 shards and 4
>> > > >>> replicas (total of 4 nodes).
>> > > >>>
>> > > >>> If I run the query multiple times I see three different top-scoring
>> > > >>> results.
>> > > >>> No data load is running; all data has been committed.
>> > > >>>
>> > > >>> I get these three different hits with their scores:
>> > > >>> copperiinitratehemipentahydrate2325919004194        430.61722
>> > > >>> copperiinitrateoncelite1234598765                   432.44238
>> > > >>> copperiinitratehydrate18756anhydrousbasis13778319   428.24185
>> > > >>>
>> > > >>> How is it that the same search against the same data can give
>> > > >>> different responses?
>> > > >>> I looked at the specific cores and they look OK; the numDocs for the
>> > > >>> replicas in a shard match.
>> > > >>>
>> > > >>> This is the query:
>> > > >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
>> > > >>>
>> > > >>> --
>> > > >>>
>> > > >>>
>> > > >>> This message and any attachment are confidential and may be privileged or
>> > > >>> otherwise protected from disclosure. If you are not the intended recipient,
>> > > >>> you must not copy this message or attachment or disclose the contents to
>> > > >>> any other person. If you have received this transmission in error, please
>> > > >>> notify the sender immediately and delete the message and any attachment
>> > > >>> from your system. Merck KGaA, Darmstadt, Germany and any of its
>> > > >>> subsidiaries do not accept liability for any omissions or errors in this
>> > > >>> message which may arise as a result of E-Mail-transmission or for damages
>> > > >>> resulting from any unauthorized changes of the content of this message and
>> > > >>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
>> > > >>> subsidiaries do not guarantee that this message is free of viruses and does
>> > > >>> not accept liability for any damages caused by any virus transmitted
>> > > >>> therewith.
>> > > >>>
>> > > >>> Click http://www.emdgroup.com/disclaimer to access the German, French,
>> > > >>> Spanish and Portuguese versions of this disclaimer.
>> > >
>> >
>> >
>>
>
>

