lucene-solr-user mailing list archives

From Barbet Alain <alian123sol...@gmail.com>
Subject Re: Custom analyzer & frequency
Date Tue, 21 Nov 2017 15:21:14 GMT
Thank you very much for your answer.

It was a copy/paste error in my mail, sorry about that!
The field was already a text field, so omitTermFreqAndPositions was
already “false”.

So I set my custom analyzer aside, tested with an already defined
field type (text_fr), and saw the same behaviour in Luke.
So I looked more closely.
In Luke, when I look at the terms one by one in the "Document" tab, I see
a frequency of 2.
But in Luke's first panel, "Overview", "Show top terms" still shows
Freq = 1 for every term.
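
One plausible reading, an assumption not confirmed in this thread: the "Document" tab shows how many times a term occurs in that one document, while the "top terms" list ranks terms by document frequency, i.e. how many documents contain the term at least once. In a core holding a single document, every term's document frequency is 1 whatever its in-document count. The shell sketch below illustrates the difference; tokens, term_freq and doc_freq are hypothetical helpers, and tokens is a crude lowercase-and-split stand-in, not what StandardAnalyzer or text_fr actually do.

```shell
#!/bin/sh
# Hypothetical stand-in for an analyzer: lowercase, one token per line.
# (Real Lucene analyzers do much more; this is for illustration only.)
tokens() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' '\n' | sed '/^$/d'
}

# term_freq DOC TERM: how many times TERM occurs inside one document.
term_freq() {
  tokens "$1" | grep -c -x -F -- "$2"
}

# doc_freq TERM DOC...: how many of the given documents contain TERM
# at least once, however many times it occurs inside each of them.
doc_freq() {
  df_term=$1
  shift
  df_count=0
  for df_doc in "$@"; do
    if tokens "$df_doc" | grep -q -x -F -- "$df_term"; then
      df_count=$((df_count + 1))
    fi
  done
  echo "$df_count"
}
```

With the test document from the original message, `term_freq 'toto titi tata toto tutu titi' toto` prints 2, while `doc_freq toto 'toto titi tata toto tutu titi'` prints 1, because there is only one document.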

I use Solr 6.5 and Luke 7.1. I don't see this behavior when I open a
Lucene index I built outside Solr: there, the top-terms frequencies are
the same in both panels.
Do you know why?
Does this affect Solr search? Does the wrong frequency in "top terms"
come from Luke or from Solr?
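
To see what Solr itself reports, independently of Luke, Solr's function queries termfreq(), docfreq() and totaltermfreq() can be requested as aliased pseudo-fields in fl, and the Schema API endpoint /schema/fields/<name>?showDefaults=true lists a field's effective properties, including omitTermFreqAndPositions. The helpers below are hypothetical and only print request URLs; the core alian_test, field test_text, term toto and docid 666 come from the thread, and SOLR_BASE is an assumed default. If frequencies are indexed, one would expect tf=2, df=1 and ttf=2 for toto in a single-document core.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Solr): they only print request URLs,
# to be passed to curl against a running Solr instance.
SOLR_BASE=${SOLR_BASE:-http://localhost:8983/solr}

# field_props_url CORE FIELD: Schema API request for the field's properties,
# including defaults inherited from its field type.
field_props_url() {
  printf '%s/%s/schema/fields/%s?showDefaults=true\n' "$SOLR_BASE" "$1" "$2"
}

# freq_url CORE FIELD TERM DOCID: select request returning, for one document,
# the term's count in that document (termfreq), the number of documents
# containing it (docfreq) and its total count in the index (totaltermfreq).
freq_url() {
  core=$1 field=$2 term=$3 docid=$4
  printf "%s/%s/select?q=docid:%s&fl=docid,tf:termfreq(%s,'%s'),df:docfreq(%s,'%s'),ttf:totaltermfreq(%s,'%s')\n" \
    "$SOLR_BASE" "$core" "$docid" "$field" "$term" "$field" "$term" "$field" "$term"
}
```

Usage, assuming Solr runs locally: `curl -s "$(freq_url alian_test test_text toto 666)"` and `curl -s "$(field_props_url alian_test test_text)"`. Quoting the command substitution keeps the shell from interpreting the `&` in the query string.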


2017-11-21 12:08 GMT+01:00 Emir Arnautović <emir.arnautovic@sematext.com>:
> Hi Alain,
> You did not provide the definition of the field type you use: your field uses the “nametext” type,
> but you pasted the “text_ami” field type. It is possible that you have omitTermFreqAndPositions=“true”
> on the nametext field type. The default value for text fields should be false.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 21 Nov 2017, at 11:43, Barbet Alain <alian123soleil@gmail.com> wrote:
>>
>> Hi,
>>
>> I built a custom analyzer and set it up in Solr, but it doesn't work as I expect.
>> I always get a frequency of 1 for each word, even if it occurs
>> multiple times in the text.
>>
>> So I tried the default analyzer and found the same behavior.
>> My schema:
>>
>>  <fieldType name="text_ami" class="solr.TextField">
>>    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>  </fieldType>
>>  <field name="docid" type="string" indexed="true" required="true" stored="true"/>
>>  <field name="test_text" type="nametext"/>
>>
>> alian@yoda:~/solr> cat add_test.sh
>> DATA='
>> <add>
>>  <doc>
>>    <field name="docid">666</field>
>>    <field name="test_text">toto titi tata toto tutu titi</field>
>>  </doc>
>> </add>
>> '
>> curl -X POST -H 'Content-Type: text/xml' \
>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>   --data-binary "$DATA"
>>
>> When I test in the Solr admin interface (Analysis screen), I get the right
>> behavior (titi and toto are each found 2 times).
>> But when I look at the Solr index with Luke or in the Solr admin interface
>> (Schema screen), the top terms always have a frequency of 1. Can someone
>> tell me what I'm missing?
>>
>> (Solr 6.5)
>>
>> Thank you !
>
