lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr gives the same fieldnorm for two different-size fields
Date Thu, 31 Jul 2014 17:06:43 GMT
And it won't be <G>. Basically, the norms are a lossy approximation of field
length (they used to be stored in just a single byte), so
fields of "close" lengths will end up with the same value here.
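
To make the rounding concrete, here is a rough Python sketch of the one-byte
encoding. It assumes the classic length norm of 1/sqrt(number of terms) and is
modeled on my recollection of Lucene's SmallFloat.floatToByte315 and
byte315ToFloat; the function names below are my own, so treat it as an
illustration rather than the exact code your Solr version runs.

```python
import math
import struct

# Assumption: mirrors Lucene's SmallFloat 3.15 encoding as recalled from the
# 4.x sources. ZERO_POINT is (63 - 15) << 3 in the original Java.
ZERO_POINT = (63 - 15) << 3


def float_to_byte315(f: float) -> int:
    """Squeeze a float32 into one byte, truncating the low mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw signed 32-bit pattern
    small = bits >> 21
    if small <= ZERO_POINT:
        # Underflow: zero/negative map to 0, tiny positives map to 1.
        return 0 if bits <= 0 else 1
    if small >= ZERO_POINT + 0x100:
        return 255  # Overflow: clamp to the largest byte value.
    return small - ZERO_POINT


def byte315_to_float(b: int) -> float:
    """Expand the stored byte back into the float seen at query time."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def quantized_norm(num_terms: int) -> float:
    """Length norm 1/sqrt(num_terms) after a round trip through one byte."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(1.0 / math.sqrt(n), 4), quantized_norm(n))
```

Under those assumptions, fields with 3 and 4 indexed terms both come back as
0.5, while 5 terms gives 0.4375. If the StopFilter is dropping "of" (I believe
it falls back to a default English stop list when no words file is given),
your two names index as 3 and 4 terms, which would explain the identical
fieldnorm.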

Why is this an issue? If you back up a second, is a word appearing in a
4-word field really "enough" more
important than one appearing in a 5 word field to require a distinction?

Lately you can specify field norms that are longer than a byte, but the
overall problem still remains.

Frankly, though, I think this is something that's a distraction and that
users won't notice.

FWIW,
Erick


On Thu, Jul 31, 2014 at 9:56 AM, gorjida <ali@sciencescape.net> wrote:

> I use solr for searching over a collection of institution names... My solr
> DB
> contains multiple field names such as name, country, city, .... A sample
> document looks like this:
>
> {
>         "solr_id": 130950,
>         "rg_id": 140239,
>         "rg_parent_id": 1438,
>         "name": "University of California Berkeley Research",
>         "ext_name": "",
>         "city": "Berkeley",
>         "country": "US",
>         "state": "CA",
>         "type": "academic/gen",
>         "ext_city": "",
>         "zip": "94720-5100",
>         "_version_": 1474909528315134000
>       },
>
> I need to search over this database... My query looks like this:
>
> name: (university of california berkeley)
>
> After running this query, top-2 matches are as follows:
>
> {
>         "solr_id": 130950,
>         "rg_id": 140239,
>         "rg_parent_id": 1438,
>         "name": "University of California Berkeley Research",
>         "ext_name": "",
>         "city": "Berkeley",
>         "country": "US",
>         "state": "CA",
>         "type": "academic/gen",
>         "ext_city": "",
>         "zip": "94720-5100",
>         "_version_": 1474909528315134000,
>         "score": 1.8849033
>       },
>       {
>         "solr_id": 350,
>         "rg_id": 1438,
>         "rg_parent_id": 1439,
>         "name": "University of California Berkeley",
>         "ext_name": "",
>         "city": "Berkeley",
>         "country": "US",
>         "state": "CA",
>         "type": "academic",
>         "ext_city": "",
>         "zip": "94720",
>         "_version_": 1474909520371122200,
>         "score": 1.8849033
>       },
>
> Indeed, both "University of California Berkeley Research" and "University
> of
> California Berkeley" get the same score (1.8849033)... FYI, my schema looks
> like this:
>
> <fieldType name="text_general" class="solr.TextField" omitNorms="false"
> autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="false"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="false"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> I also checked the debugger and noticed that both documents return the same
> fieldnorm (.5)... The bizarre thing is that solr works fine for these
> queries:
> --- name: (university of toronto)
> --- name: (university of california los angeles)
>
> Indeed, it seems that solr fails once the number of tokens in the documents
> is equal to "4"... For above queries, the first one (university of toronto)
> has three tokens and the second one has 5 tokens... I am totally stuck at
> this point why solr cannot provide different fieldnorms for (University of
> California Berkeley) and (University of California Berkeley Research)...
> Also, I do not understand why it only happens when I have 4 tokens in the
> field. I would appreciate it if anyone could share some feedback...
>
> PS. I have also tested "solr.StopFilterFactory" ignoreCase="true" and the
> problem is still not resolved...
>
> Regards,
>
> Ali
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
