lucene-solr-user mailing list archives

From gorjida <...@sciencescape.net>
Subject Solr gives the same fieldnorm for two different-size fields
Date Thu, 31 Jul 2014 16:56:31 GMT
I use Solr to search over a collection of institution names. My Solr database
contains multiple fields such as name, country, and city. A sample
document looks like this:

{
        "solr_id": 130950,
        "rg_id": 140239,
        "rg_parent_id": 1438,
        "name": "University of California Berkeley Research",
        "ext_name": "",
        "city": "Berkeley",
        "country": "US",
        "state": "CA",
        "type": "academic/gen",
        "ext_city": "",
        "zip": "94720-5100",
        "_version_": 1474909528315134000
      },

I need to search over this database. My query looks like this:

name: (university of california berkeley)

After running this query, the top two matches are as follows:

{
        "solr_id": 130950,
        "rg_id": 140239,
        "rg_parent_id": 1438,
        "name": "University of California Berkeley Research",
        "ext_name": "",
        "city": "Berkeley",
        "country": "US",
        "state": "CA",
        "type": "academic/gen",
        "ext_city": "",
        "zip": "94720-5100",
        "_version_": 1474909528315134000,
        "score": 1.8849033
      },
      {
        "solr_id": 350,
        "rg_id": 1438,
        "rg_parent_id": 1439,
        "name": "University of California Berkeley",
        "ext_name": "",
        "city": "Berkeley",
        "country": "US",
        "state": "CA",
        "type": "academic",
        "ext_city": "",
        "zip": "94720",
        "_version_": 1474909520371122200,
        "score": 1.8849033
      },

Both "University of California Berkeley Research" and "University of
California Berkeley" get the same score (1.8849033). FYI, my schema looks
like this:

<fieldType name="text_general" class="solr.TextField" omitNorms="false"
autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

I also checked the debugger and noticed that both documents return the same
fieldnorm (0.5). The bizarre thing is that Solr works fine for these
queries:
--- name: (university of toronto)
--- name: (university of california los angeles)

It seems that Solr fails once the number of tokens in the document field
is equal to 4. Of the queries above, the first one (university of toronto)
has three tokens and the second one has five. I am totally stuck on why
Solr cannot provide different fieldnorms for "University of California
Berkeley" and "University of California Berkeley Research". I also do not
understand why this happens only when the field has four tokens. I would
appreciate it if anyone could share feedback.

PS. I have also tested "solr.StopFilterFactory" with ignoreCase="true" and
the problem is still not resolved.
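
One possible explanation can be sketched in code, under assumptions that
are not confirmed anywhere in this message: (1) the index uses the default
Lucene 4.x similarity, which computes the length norm as 1/sqrt(number of
indexed terms) and stores it in a single lossy byte; (2)
solr.StopFilterFactory without a words attribute falls back to a default
English stop list that drops "of", so the two Berkeley names would index as
3 and 4 terms rather than 4 and 5. The Python functions below are a
hypothetical re-implementation of that one-byte encoding for illustration
only, not Lucene's own code.

```python
import math
import struct

# Assumed layout of the one-byte norm: the float's raw bits are shifted
# right by 21 and rebased so that the result fits in 0..255.
_FZERO = (63 - 15) << 3


def float_to_byte315(f):
    """Lossily pack a float into one byte (illustrative re-implementation)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw IEEE-754 bits
    small = bits >> 21
    if small <= _FZERO:
        return 0 if bits <= 0 else 1   # zero/negative, or underflow
    if small >= _FZERO + 0x100:
        return 255                     # overflow: clamp to largest byte
    return small - _FZERO


def byte315_to_float(b):
    """Unpack the byte back into the (coarser) float it represents."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(num_terms):
    """Length norm 1/sqrt(n) after a round trip through the one-byte form."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in range(1, 7):
        exact = 1.0 / math.sqrt(n)
        print(n, round(exact, 4), stored_norm(n))
```

If these assumptions hold, fields with 3 and 4 indexed terms both round
down to a stored norm of 0.5, while 2 versus 3 terms (0.625 versus 0.5) and
4 versus 5 terms (0.5 versus 0.4375) stay distinct. That would be
consistent with the Toronto and Los Angeles queries ranking as expected
while the Berkeley query ties.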

Regards,

Ali



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
Sent from the Solr - User mailing list archive at Nabble.com.
