lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: no search results for specific search in solr 6.6.0
Date Wed, 20 Sep 2017 17:09:39 GMT
Just go to the admin/analysis page and enter the terms in the "index"
box (I usually uncheck the "verbose" checkbox). You will see exactly
which element in your analysis chain is doing this. You'll see light
gray two-letter codes on the side, e.g. "ST". Hover over one with your
mouse, and you should see the class name, which identifies the
corresponding element of the fieldType for the field in question. For
instance:

solr.StandardTokenizerFactory
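
The same check can probably be scripted instead of clicked through. The
sketch below only builds the request URL; it assumes the field analysis
request handler is reachable at /analysis/field on the core (the default
in recent Solr versions, but verify it on your installation). The helper
name field_analysis_url is made up for illustration, and the host and
core name are taken from the links in the original question.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_name, index_text, query_text=None):
    """Build a URL for Solr's field analysis handler (assumed at /analysis/field)."""
    params = {
        "analysis.fieldname": field_name,   # field whose analysis chain to run
        "analysis.fieldvalue": index_text,  # text analyzed with the index-time chain
        "wt": "json",
    }
    if query_text is not None:
        # Optional: text analyzed with the query-time chain
        params["analysis.query"] = query_text
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


url = field_analysis_url(
    "http://localhost:8983/solr", "test_core", "f_1197829839_txt_fr", "FRaoo"
)
print(url)
```

Fetching that URL (browser, curl, or urllib.request) should return the
tokens after each step of the chain, which is the same information the
admin/analysis page displays.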

text_general may have fixed _this_ problem, but it's not a great
solution. The French analysis chain is tuned to give better
results for, well, French. Likely solr.FrenchLightStemFilterFactory
is removing the last "o", but that's a guess.

In general, stemming is incompatible with wildcards. E.g. "running"
stems to "run", but there is no real algorithm that can stem "runni*",
so the wildcard pattern is matched essentially as-is against the
stemmed terms in the index.
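
A toy illustration of that mismatch, in plain Python rather than Solr.
The lookup table standing in for the stemmer is purely hypothetical (it
just hard-codes the "FRaoo" -> "frao" behaviour reported in this thread),
and real wildcard handling in Solr is more involved; this only shows why
an analyzed term query can match while a wildcard query does not.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for a stemming analysis chain (not the real French stemmer)
TOY_STEMS = {"fraoo": "frao", "running": "run"}


def analyze(text):
    """Lowercase a single token and 'stem' it with the toy lookup table."""
    token = text.lower()
    return TOY_STEMS.get(token, token)


def term_query(index_terms, text):
    """A plain term query is analyzed first, then compared with indexed terms."""
    return analyze(text) in index_terms


def wildcard_query(index_terms, pattern):
    """A wildcard pattern is only lowercased here, never stemmed."""
    return sorted(t for t in index_terms if fnmatchcase(t, pattern.lower()))


index_terms = {analyze("FRaoo")}  # the index holds "frao", not "fraoo"

print(term_query(index_terms, "FRaoo"))        # True
print(wildcard_query(index_terms, "*FRao*"))   # ['frao']
print(wildcard_query(index_terms, "*FRaoo*"))  # []
```

The three results line up with the three queries in the original
question: the exact term and "*FRao*" match, "*FRaoo*" does not.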

Best,
Erick

On Wed, Sep 20, 2017 at 5:18 AM, Sascha Tuschinski
<stuschinski@canto.com> wrote:
> Hello Erik and Josh,
>
> Thanks for your hints and comments.
>
> I found out that the “text_fr” field type didn’t store “fraoo” as a term. It stored “frao” only, maybe because of the French field type. This field had been created automatically. I’m new to Solr, so this may be correct.
>
> I now use “text_general” as the field type, which works fine and solves our problem.
>
> I can deliver the output of the debug query from admin/analysis for the text_fr field type if required.
>
> Thanks again!
> Sascha
>
>
> Am 19.09.17, 20:12 schrieb "Erick Erickson" <erickerickson@gmail.com>:
>
>     Unfortunately the link you provided goes to "localhost", which isn't accessible.
>
>     The very first thing I'd do is go to the admin/analysis page and put
>     the terms in both the "index" and "query" boxes for the field in
>     question.
>     Next, attach &debug=query to the query to see how the query is actually parsed.
>
>     My bet: You are using a different stemmer for the two cases and the
>     actual token in the index is FRao in the problem field, but that's
>     just a guess.
>
>     It often fools people that the field returned in the document (i.e. in
>     the fl list) is the _stored_ value, not the actual token in the index.
>     You can also use the TermsComponent to see the actual terms in the
>     index as well as the admin/schema_browser link.
>
>     Best,
>     Erick
>
>
>     On Tue, Sep 19, 2017 at 9:01 AM, Sascha Tuschinski
>     <stuschinski@canto.com> wrote:
>     > Hello Community,
>     >
>     > We are using a Solr Core with Solr 6.6.0 on Windows 10 (latest updates) with field names defined like "f_1179014266_txt". The number in the middle of the name differs for each field we use. For language-specific fields we add a language-specific extension, e.g. "f_1179014267_txt_fr", "f_1179014268_txt_de", "f_1179014269_txt_en" and so on.
>     > We are having the following odd issue within the french "_fr" field only:
>     > Field
>     > f_1197829835_txt_fr<http://localhost:8983/solr/#/test_core/schema?field=f_1197829835_txt_fr>
>     > Dynamic Field /
>     > *_txt_fr<http://localhost:8983/solr/#/test_core/schema?dynamic-field=*_txt_fr>
>     > Type
>     > text_fr<http://localhost:8983/solr/#/test_core/schema?type=text_fr>
>     >
>     >   *   The saved value which had been added with no problem to the Solr index is "FRaoo".
>     >   *   When searching within the Solr query tool for "f_1197829839_txt_fr:*FRao*" it returns the items matching the term as seen below - OK.
>     > {
>     >   "responseHeader":{
>     >     "status":0,
>     >     "QTime":1,
>     >     "params":{
>     >       "q":"f_1197829839_txt_fr:*FRao*",
>     >       "indent":"on",
>     >       "wt":"json",
>     >       "_":"1505808887827"}},
>     >   "response":{"numFound":1,"start":0,"docs":[
>     >       {
>     >         "id":"129",
>     >         "f_1197829834_txt_en":"EnAir",
>     >         "f_1197829822_txt_de":"Lufti",
>     >         "f_1197829835_txt_fr":"FRaoi",
>     >         "f_1197829836_txt_it":"ITAir",
>     >         "f_1197829799_txt":["Lufti"],
>     >         "f_1197829838_txt_en":"EnAir",
>     >         "f_1197829839_txt_fr":"FRaoo",
>     >         "f_1197829840_txt_it":"ITAir",
>     >         "_version_":1578520424165146624}]
>     >   }}
>     >
>     >   *   When searching for "f_1197829839_txt_fr:*FRaoo*" NO item is found - Wrong!
>     > {
>     >   "responseHeader":{
>     >     "status":0,
>     >     "QTime":1,
>     >     "params":{
>     >       "q":"f_1197829839_txt_fr:*FRaoo*",
>     >       "indent":"on",
>     >       "wt":"json",
>     >       "_":"1505808887827"}},
>     >   "response":{"numFound":0,"start":0,"docs":[]
>     >   }}
>     > When searching for "f_1197829839_txt_fr:FRaoo" (no wildcards) the matching items are found - OK
>     >
>     > {
>     >   "responseHeader":{
>     >     "status":0,
>     >     "QTime":1,
>     >     "params":{
>     >       "q":"f_1197829839_txt_fr:FRaoo",
>     >       "indent":"on",
>     >       "wt":"json",
>     >       "_":"1505808887827"}},
>     >   "response":{"numFound":1,"start":0,"docs":[
>     >       {
>     >         "id":"129",
>     >         "f_1197829834_txt_en":"EnAir",
>     >         "f_1197829822_txt_de":"Lufti",
>     >         "f_1197829835_txt_fr":"FRaoi",
>     >         "f_1197829836_txt_it":"ITAir",
>     >         "f_1197829799_txt":["Lufti"],
>     >         "f_1197829838_txt_en":"EnAir",
>     >         "f_1197829839_txt_fr":"FRaoo",
>     >         "f_1197829840_txt_it":"ITAir",
>     >         "_version_":1578520424165146624}]
>     >   }}
>     > If we save exactly the same value into a different language field, e.g. one ending in "_en" such as "f_1197829834_txt_en", then the search "f_1197829834_txt_en:*FRaoo*" finds all items correctly!
>     > We have no idea what's wrong here; we even recreated the index and can reproduce this problem every time. I can only see that the value starts with "FR" and the field extension ends with "fr", but this is not a problem for "en", "de" and so on. All fields are used in the same way and have the same field properties.
>     > Any help or ideas are highly appreciated. I filed a bug for this, https://issues.apache.org/jira/browse/SOLR-11367, but was asked to post my question here. Thanks for reading.
>     > Greetings,
>     > _______________________________________________________________
>     > Sascha Tuschinski
>     > Manager Quality Assurance // Canto GmbH
>     > Phone: +49 (0) 30 390 485 - 41
>     > E-mail: stuschinski@canto.com
>     > Web: canto.com <http://www.canto.com/>
>     >
>     > Canto GmbH
>     > Lietzenburger Str. 46
>     > 10789 Berlin
>     > Phone: +49 (0)30 390485-0
>     > Fax: +49 (0)30 390485-55
>     > Amtsgericht Berlin-Charlottenburg HRB 88566
>     > Geschäftsführer: Jack McGannon, Thomas Mockenhaupt
>     >
>
>
