lucene-solr-user mailing list archives

From David Hastings <hastings.recurs...@gmail.com>
Subject Re: Consecutive calls to a query give different results
Date Thu, 07 Sep 2017 15:02:20 GMT
"I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents"


If you run a debug query on the search, are the scores for the top 3 documents
the same or not? You can easily have three documents with the same score,
so when you have a result set that is ranked 1-1-1-2-3-4..., you can expect
the 1-1-1 entries to rotate arbitrarily. Perhaps add a second sort element,
such as id, to your ranking.
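
To illustrate the tie-breaking idea above, here is a minimal sketch in plain Python (not Solr code). The document ids and scores are made up for illustration; the two input lists stand in for the same hits arriving in a different order on two consecutive requests.

```python
# HYPOTHETICAL (id, score) hits; the three-way tie mirrors the 1-1-1-2-3-4 case.
hits_first_call = [("doc3", 1.0), ("doc1", 1.0), ("doc2", 1.0), ("doc4", 0.8)]
hits_second_call = [("doc2", 1.0), ("doc3", 1.0), ("doc1", 1.0), ("doc4", 0.8)]


def rank_by_score(hits):
    # Score descending only; tied documents keep whatever order they arrived in.
    return [doc_id for doc_id, _ in sorted(hits, key=lambda h: -h[1])]


def rank_with_tiebreak(hits):
    # Score descending, then id ascending, so ties always resolve the same way.
    return [doc_id for doc_id, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]


print(rank_by_score(hits_first_call))
print(rank_by_score(hits_second_call))
print(rank_with_tiebreak(hits_first_call))
print(rank_with_tiebreak(hits_second_call))
```

In Solr terms this would correspond to something like sort=score desc,id asc, where "id" is assumed to be the unique key field. Note that a secondary sort only stabilizes documents whose scores are exactly equal.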




On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer <webster.homer@sial.com>
wrote:

> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents
>
> I have an enhanced collections info api call that calls the core admin api
> to get the index information for the replica.
> When I said the numDocs were the same, I meant exactly that. maxDocs and
> deletedDocs are not the same for the replicas, but numDocs is.
>
> Or are you saying that the search is looking at deleted documents? Wouldn't
> that be a very significant bug?
>
> The four replicas:
> shard1
> core_node1
> "numDocs": 383817,
> "maxDocs": 611592,
> "deletedDocs": 227775,
> "size": "2.49 GB",
> "lastModified": "2017-09-07T08:18:03.639Z",
> "current": true,
> "version": 35644,
> "segmentCount": 28
>
> core_node3
> "numDocs": 383817,
> "maxDocs": 571737,
> "deletedDocs": 187920,
> "size": "2.85 GB",
> "lastModified": "2017-09-07T08:18:03.634Z",
> "current": false,
> "version": 35562,
> "segmentCount": 36
> shard2
> core_node2
> "numDocs": 385326,
> "maxDocs": 529214,
> "deletedDocs": 143888,
> "size": "2.13 GB",
> "lastModified": "2017-09-07T08:18:03.632Z",
> "current": true,
> "version": 34783,
> "segmentCount": 24
> core_node4
> "numDocs": 385326,
> "maxDocs": 488201,
> "deletedDocs": 102875,
> "size": "1.96 GB",
> "lastModified": "2017-09-07T08:18:03.633Z",
> "current": true,
> "version": 34932,
> "segmentCount": 21
>
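
The replica figures listed above can be cross-checked with a short script. This is only a sketch: the numbers are copied from the message, while the dictionary layout and function names are chosen here for illustration.

```python
# Figures copied from the four replicas listed in the message.
replicas = {
    "core_node1": {"shard": "shard1", "numDocs": 383817, "maxDocs": 611592, "deletedDocs": 227775},
    "core_node3": {"shard": "shard1", "numDocs": 383817, "maxDocs": 571737, "deletedDocs": 187920},
    "core_node2": {"shard": "shard2", "numDocs": 385326, "maxDocs": 529214, "deletedDocs": 143888},
    "core_node4": {"shard": "shard2", "numDocs": 385326, "maxDocs": 488201, "deletedDocs": 102875},
}


def is_consistent(stats):
    # Live documents are whatever remains after subtracting the deleted ones.
    return stats["maxDocs"] - stats["deletedDocs"] == stats["numDocs"]


def values_by_shard(replicas, key):
    # Collect the distinct values of one statistic for each shard.
    result = {}
    for stats in replicas.values():
        result.setdefault(stats["shard"], set()).add(stats[key])
    return result


for name, stats in replicas.items():
    print(name, "consistent:", is_consistent(stats))
print("numDocs per shard:", values_by_shard(replicas, "numDocs"))
print("maxDocs per shard:", values_by_shard(replicas, "maxDocs"))
```

Within each shard the live document count agrees across replicas, while maxDocs (live plus deleted-but-not-yet-merged documents) does not.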
>
> On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley <yseeley@gmail.com> wrote:
>
> > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> > > bq: and deleted documents are irrelevant to term statistics...
> > >
> > > Did you mean "relevant"? Or do I have to adjust my thinking _again_?
> >
> > One can make it work either way ;-)
> > Whether a document is marked as deleted or not has no effect on term
> > statistics (i.e. irrelevant)
> > OR documents marked for deletion still count in term statistics (i.e.
> > relevant)
> >
> > I guess I used the former because we don't go out of our way to still
> > include deleted documents... it's just a side effect of the index
> > structure that we don't (and can't easily) update statistics when a
> > document is marked as deleted.
> >
> > -Yonik
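
A rough sketch of why this matters for scores, assuming a BM25-style idf (the formula below is the one used by Lucene's BM25Similarity). The maxDocs and numDocs values come from the shard1 replicas in this thread; the term counts are HYPOTHETICAL, and using maxDocs as the document count is a simplification, since Lucene actually counts documents that have the field.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # idf = ln(1 + (N - n + 0.5) / (n + 0.5)), N = doc count, n = docs containing the term
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# From the thread: shard1 replicas, with deleted documents still counted in maxDocs.
MAX_DOCS_NODE1 = 611592
MAX_DOCS_NODE3 = 571737
NUM_DOCS = 383817

# HYPOTHETICAL counts of documents containing some query term.
LIVE_WITH_TERM = 1000
DELETED_WITH_TERM_NODE1 = 600
DELETED_WITH_TERM_NODE3 = 450

# Deleted documents still contribute to both the doc frequency and the doc count.
idf_node1 = bm25_idf(LIVE_WITH_TERM + DELETED_WITH_TERM_NODE1, MAX_DOCS_NODE1)
idf_node3 = bm25_idf(LIVE_WITH_TERM + DELETED_WITH_TERM_NODE3, MAX_DOCS_NODE3)

# Once deletes are physically removed, both replicas see the same statistics.
idf_after_cleanup = bm25_idf(LIVE_WITH_TERM, NUM_DOCS)

print("node1 idf:", idf_node1)
print("node3 idf:", idf_node3)
print("idf after cleanup:", idf_after_cleanup)
```

Under these assumptions the two replicas compute slightly different idf values for the same term, which is enough to shift scores and reorder closely ranked documents depending on which replica serves the request.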
> >
> >
> > > Erick
> > >
> > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley <yseeley@gmail.com> wrote:
> > >> Different replicas of the same shard can have different numbers of
> > >> deleted documents (really just marked as deleted), and deleted
> > >> documents are irrelevant to term statistics (like the number of
> > >> documents a term appears in).  Documents marked for deletion stop
> > >> contributing to corpus statistics when they are actually removed (via
> > >> expunge deletes, merges, optimizes).
> > >> -Yonik
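
As a hedged sketch of the "expunge deletes" option mentioned above: Solr's update handler accepts commit=true together with expungeDeletes=true. The snippet only builds the request URL; the base URL and collection name are placeholders, not the poster's actual setup, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, collection):
    # Build a commit request that asks Solr to merge away segments with deletions.
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url}/solr/{collection}/update?{params}"


# HYPOTHETICAL host and collection name, for illustration only.
url = expunge_deletes_url("http://localhost:8983", "my-collection")
print(url)
```

Expunging deletes rewrites segments and can be expensive on a large index, so it is usually something to try on a test system first.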
> > >>
> > >>
> > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer <webster.homer@sial.com> wrote:
> > >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
> > >>> replicas (total of 4 nodes).
> > >>>
> > >>> If I run the query multiple times, I see three different top-scoring
> > >>> results.
> > >>> No data load is running; all data has been committed.
> > >>>
> > >>> I get these three different hits with their scores:
> > >>> copperiinitratehemipentahydrate2325919004194       430.61722
> > >>> copperiinitrateoncelite1234598765                  432.44238
> > >>> copperiinitratehydrate18756anhydrousbasis13778319  428.24185
> > >>>
> > >>> How is it that the same search against the same data can give different
> > >>> responses?
> > >>> I looked at the specific cores and they look OK; the numDocs for the
> > >>> replicas in a shard match.
> > >>>
> > >>> This is the query:
> > >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json
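
The query above is easier to read once decoded. The sketch below uses Python's standard urllib.parse on an abbreviated copy of the query string: only a handful of the original parameters, and only the first two qf fields, are kept for brevity. The helper name parse_boosts is made up here.

```python
from urllib.parse import parse_qs

# Abbreviated copy of the query string from the message (most qf fields omitted).
query_string = (
    "defType=edismax&group=true&group.field=id_s&mm=2%3C-25%25&q.op=OR"
    "&q=copper%20nitrate&qf=search_pid^500%20search_concat_pno^400&rows=30"
    "&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json"
)

# parse_qs returns a list per key; each parameter appears once here.
params = {key: values[0] for key, values in parse_qs(query_string).items()}


def parse_boosts(qf):
    # Split "field^boost field2^boost2" into a dict; fields without a boost get 1.0.
    boosts = {}
    for entry in qf.split():
        field, _, boost = entry.partition("^")
        boosts[field] = float(boost) if boost else 1.0
    return boosts


print("q    =", params["q"])
print("mm   =", params["mm"])
print("sort =", params["sort"])
print("qf   =", parse_boosts(params["qf"]))
```

Decoded this way, it is easier to see that the request already sorts on several fields after score and also uses result grouping (group=true, group.field=id_s).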
> > >>>
> > >>> --
> > >>>
> > >>>
> > >>> This message and any attachment are confidential and may be privileged or
> > >>> otherwise protected from disclosure. If you are not the intended recipient,
> > >>> you must not copy this message or attachment or disclose the contents to
> > >>> any other person. If you have received this transmission in error, please
> > >>> notify the sender immediately and delete the message and any attachment
> > >>> from your system. Merck KGaA, Darmstadt, Germany and any of its
> > >>> subsidiaries do not accept liability for any omissions or errors in this
> > >>> message which may arise as a result of E-Mail-transmission or for damages
> > >>> resulting from any unauthorized changes of the content of this message and
> > >>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> > >>> subsidiaries do not guarantee that this message is free of viruses and does
> > >>> not accept liability for any damages caused by any virus transmitted
> > >>> therewith.
> > >>>
> > >>> Click http://www.emdgroup.com/disclaimer to access the German, French,
> > >>> Spanish and Portuguese versions of this disclaimer.
> >
>
