lucene-solr-user mailing list archives

From Günter Hipler <vogese...@gmail.com>
Subject Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0
Date Mon, 03 Sep 2012 06:58:11 GMT
Over the weekend I ran more tests with the Lucene/Solr 4.0 version we
deployed in March and with the latest Lucene/Solr 4.0 beta.


My findings:

- The version deployed in March does not show the error I now come across
in Beta 4.0 (the document count reported in the facet counts differs
from the actual number of documents returned by a subsequent drill-down
request using a filter query).
This holds even after a lot of updates have been applied to the index.
At the moment this can be seen at
http://sb-tp1.swissbib.unibas.ch/ (e.g. with the term 'mitbestimmung'
and the facet value 'nebis', which I used for all my tests).
As a note: because we have to migrate the OS of our servers, the host might
be down for one or two days in the course of the current week.

- With the latest Lucene/Solr beta version, the error occurs once updates
have been committed to the index, as I described in my earlier messages.
When the index is freshly built, the error does not occur (I ran these
tests on a host that is not accessible to the public).
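
The consistency check behind these findings can be sketched in Python. This is only a minimal sketch, not the setup used for the tests in this thread. The function names are hypothetical, and it assumes the responses are requested with wt=json and Solr's default flat facet layout (value, count, value, count, ...). The sample numbers are the ones reported for 'nebis' in the quoted message below.

```python
from urllib.parse import urlencode


def base_params(term, facet_field):
    """Parameters for the base faceted query, without any filter query."""
    return [
        ("q", "+" + term),
        ("rows", "0"),
        ("facet", "on"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),
        ("facet.limit", "100"),
        ("wt", "json"),  # assumption: JSON responses are used for parsing
    ]


def drilldown_params(term, facet_field, value):
    """Same query plus an fq that uses the term query parser."""
    fq = "{!term f=%s}%s" % (facet_field, value)
    return base_params(term, facet_field) + [("fq", fq)]


def facet_counts(response, facet_field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


def find_mismatches(base_response, drilldown_responses, facet_field):
    """Compare each facet count with numFound of its drill-down request."""
    expected = facet_counts(base_response, facet_field)
    mismatches = {}
    for value, response in drilldown_responses.items():
        found = response["response"]["numFound"]
        if expected.get(value) != found:
            mismatches[value] = (expected.get(value), found)
    return mismatches


# Hand-written sample responses using the counts reported in this thread.
base = {"facet_counts": {"facet_fields": {"navNetwork": ["nebis", 2734]}}}
drill = {"nebis": {"response": {"numFound": 2871, "docs": []}}}

print(urlencode(drilldown_params("mitbestimmung", "navNetwork", "nebis")))
print(find_mismatches(base, drill, "navNetwork"))
```

With these sample responses the check flags 'nebis', because the facet count (2734) and the drill-down numFound (2871) differ. Against a real server, the parameter lists would be sent to the /select handler and the parsed JSON passed to find_mismatches.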

From my point of view this is a severe bug in Lucene/Solr Beta 4.0, because
filter queries are used very, very often!

I would be very happy if someone from the Solr core team could comment on it.

Thanks a lot for your support!

Günter Hipler

2012/8/31 Günter Hipler <vogesen61@gmail.com>

>
> Hi,
>
> thanks for your responses!
>
> I made a simpler query with only one facet and without any boosting,
> so it should be easier to focus on the problem
>
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true
> ->
> facet=on&
> facet.mincount=1&
> facet.limit=100&
> rows=0&
> start=0&
> q=+(+%2Bmitbestimmung++)+&
> facet.field=navNetwork&
> qt=only_queryfields_edismax&
> debugQuery=true
>
> facet counts say 2734 documents for nebis
> parsedQuery
> (+(+DisjunctionMaxQuery((title_series:mitbestimmung |
> title_uniform:mitbestimmung | authorfull:mitbestimmung |
> callnum:mitbestimmung | sfulltext:mitbestimmung | title_short:mitbestimmung
> | sbranchlib:mitbestimmung | bibid:mitbestimmung |
> sfullTextRemoteData:mitbestimmung | title_long:mitbestimmung |
> autnum:mitbestimmung | subfull:mitbestimmung |
> publplace:mitbestimmung))))/no_coord
> parsedQuery_toString
> +(+(title_series:mitbestimmung | title_uniform:mitbestimmung |
> authorfull:mitbestimmung | callnum:mitbestimmung | sfulltext:mitbestimmung
> | title_short:mitbestimmung | sbranchlib:mitbestimmung |
> bibid:mitbestimmung | sfullTextRemoteData:mitbestimmung |
> title_long:mitbestimmung | autnum:mitbestimmung | subfull:mitbestimmung |
> publplace:mitbestimmung))
>
>
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!term+f%3DnavNetwork}nebis
> ->
> facet=on&facet.mincount=1&
> facet.limit=100&
> rows=0&
> start=0&
> q=+(+%2Bmitbestimmung++)+&
> facet.field=navNetwork&
> qt=only_queryfields_edismax&
> debugQuery=true&
> fq={!term+f%3DnavNetwork}nebis
>
> delivers 2871 (not the same as the number indicated in the base query)
> What is interesting:
> the facetcount of the second query itself shows the 'correct' number
> indicated in the base query (2734)
>
> parsedQuery and parsedQuery_toString are the same as in the base query
> @Jack: the result is exactly the same for a filter query with
> fq=navNetwork:nebis. We are using the term query parser to avoid problems
> with escaping special characters (as is also described in the
> Solr Enterprise Search Server book on page 189)
>
>
> Using the alternatives suggested by Hoss
>
> http://sb-s7.swissbib.unibas.ch:8080/solr/collection1/select?facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+%28+%2Bmitbestimmung++%29+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!raw%20f=navNetwork}nebis
> and
>
> facet=on&facet.mincount=1&facet.limit=100&rows=0&start=0&q=+(+%2Bmitbestimmung++)+&facet.field=navNetwork&qt=only_queryfields_edismax&debugQuery=true&fq={!lucene}navNetwork:nebis
> do not change the result. The number of returned documents is higher than
> the count shown for this facet value in the facet counts of the base
> query
>
>
> the type we are using for navNetwork:
>  <field name="navNetwork" type="stdID" multiValued="true" stored="false" />
> <!-- text field type for IDs of all sorts and colors, generic usage
> (20.03.2012/osc) -->
>    <fieldType name="stdID" class="solr.TextField" sortMissingLast="true"
> omitNorms="true">
>       <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory" />
>          <filter class="solr.LowerCaseFilterFactory" />
>          <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(\([a-z]+\))vtls0"
>             replacement="$10"
>             replace="all"
>          />
>          <filter class="solr.PatternReplaceFilterFactory"
>             pattern="[^\w]+"
>             replacement=""
>             replace="all"
>          />
>          <filter class="solr.TrimFilterFactory" />
>          <filter class="solr.LengthFilterFactory" min="2" max="100" />
>       </analyzer>
>    </fieldType>
>
>
> which in my opinion should be a common treatment for facet types
>
> the new requestHandler I'm using is quite simple (without any boosting and
> other stuff as it is done in the original one):
>    <requestHandler default="true" name="only_queryfields_edismax"
> class="solr.SearchHandler">
>       <lst name="defaults">
>         <!-- use the extended dismax query parser -->
>         <str name="defType">edismax</str>
>         <str name="echoParams">explicit</str>
>         <str name="qf">
>           title_long title_short title_uniform title_series authorfull
>           publplace subfull sfulltext sfullTextRemoteData syear bibid
>           sbranchlib callnum autnum
>         </str>
>       </lst>
>     </requestHandler>
>
>
> What I will try next, as soon as possible:
> - Set up a new index with the Lucene 4.0 version from March
> (to be more exact: it's version 4.0-2012-03-09_11-29-20)
> to see what the results are even in case of frequent updates
>
> - Set up a 'new' index with Lucene beta4 (without any updates) and test
> more thoroughly whether I get the same inconsistent results (as is
> currently the case after updating the index)
>
>
> Thanks a lot for your support!
>
> Günter
>
>
>
>
>
> 2012/8/30 Chris Hostetter <hossman_lucene@fucit.org>
>
>>
>> The "q" and "bq" params have changed slightly between your first query and
>> the query where you add the "fq" param ... because of how "bq" is
>> additively added to the main query, it's possible this difference may
>> account for the behavior you are seeing -- double check the debugQuery
>> output for your main query between the two requests to see if they match
>> up.  Heck: you can try the second query w/o the "fq" and sanity check that
>> it still matches the same number of docs as the first query.
>>
>> If that's working fine, can you please give us more info about your
>> "navNetwork" field, how is it configured?
>>
>> if you could show us the debugQuery output and numFound for these simple
>> queries (no special requestHandler settings please) that would also be
>> helpful..
>>
>>         /select?q={!raw f=navNetwork}nebis
>>         /select?q={!term f=navNetwork}nebis
>>         /select?q={!lucene}navNetwork:nebis
>>
>>
>> : My query against an index is (I left out some of the facet fields)
>> : f.navBranchlib.facet.limit=1000&
>> : facet=on&facet.mincount=1&
>> : facet.limit=100&
>> : bq=navBranchlib:A100^1000&
>> : bq=navBranchlib:UFSW^1000&
>> : start=0&q=+(+%2Bmitbestimmung++)+&
>> : facet.field=navNetwork&
>> : qt=sb-bbfull-01
>> : -> qt refers to an edismax query-parser
>> :
>> : I get a result for the navNetwork facets which looks like
>> :
>> : <lst name="navNetwork">
>> : <int name="ids">3810</int>
>> : <int name="nebis">2732</int>
>> : <int name="idsbb">1945</int>
>> : </lst>
>> :
>> : using a fq Parameter to drill down against the navNetwork facets
>> : facet=on&facet.mincount=1&
>> : facet.limit=100&
>> : q=(+(+%2Bmitbestimmung++)+)&
>> : facet.field=navNetwork&
>> : qt=sb-bbfull-01&
>> : fq={!term+f%3DnavNetwork}nebis
>> : delivers 2806 Documents - instead of the expected 2732
>> :
>> :
>> : A boolean query instead of the fq is providing the correct result of
>> : 2732 documents
>> : facet=on&facet.mincount=1&
>> : facet.limit=100&
>> : %2Bmitbestimmung+%2BnavNetwork:nebis&
>> : facet.field=navNetwork&
>> : qt=sb-bbfull-01&
>> :
>> :
>> :
>> : The behaviour is not consistent. Some of the facets provide the correct
>> : result, some not.
>> : What I can't say for sure: The behaviour was correct (if I'm not wrong)
>> : once the whole index was newly created. After running
>> : some updates I got these results.
>> : The application reflecting this behaviour is available under:
>> : http://sb-tp1.swissbib.unibas.ch
>> :
>> : We have been using Lucene/SOLR since the end of last year and have
>> : regularly deployed the various nightly builds.
>> : The last version in which this error(?) didn't appear is from March
>> : 2012. The application using it is available under
>> : http://baselbern.swissbib.ch
>> : The target "books and more" is using the Lucene 4.0 march version. The
>> : index is being updated several times a day and uses the same
>> : filter queries as for Lucene/SOLR 4.0 beta and alpha.
>> :
>> : My question:
>> : - has something changed in the last versions or is this a bug?
>> :
>> : Günter Hipler
>> :
>>
>> -Hoss
>
>
>
