lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Count total frequency of a word in a SOLR index
Date Fri, 23 Jan 2015 08:23:13 GMT
https://cwiki.apache.org/confluence/display/solr/Function+Queries
totaltermfreq()

Or do you need to sum term freq on docs from resultset?
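
For illustration, here is a rough sketch of both options from a Python client. It assumes a local Solr at http://localhost:8983 and a core named "collection1" (both made up for the example, as are the helper names build_params and sum_term_freq). The request uses termfreq() and totaltermfreq() as pseudo-fields in fl: ttf is the index-wide count, while tf is the per-document count that the client can add up over the result set.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance and a core named "collection1".
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_params(field, term, rows=100):
    """Build query params that ask Solr for per-document and index-wide term counts."""
    return {
        "q": f"{field}:{term}",
        # tf  = occurrences of the term in this document's field
        # ttf = occurrences of the term in the field across the whole index
        "fl": f"id,tf:termfreq({field},'{term}'),ttf:totaltermfreq({field},'{term}')",
        "rows": rows,
        "wt": "json",
    }


def sum_term_freq(response):
    """Sum the per-document tf pseudo-field over the docs in a parsed JSON response."""
    return sum(doc.get("tf", 0) for doc in response["response"]["docs"])


# The URL a client would request; nothing is sent here.
url = SOLR_SELECT + "?" + urlencode(build_params("text_file", "what"))
print(url)
```

Note that sum_term_freq only covers the docs actually returned, so rows has to be at least numFound (or the client has to page through the results) for the sum to cover the whole result set.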


On Fri, Jan 23, 2015 at 10:56 AM, Nitin Solanki <nitinmlvya@gmail.com>
wrote:

> I indexed some text files in Solr as they are, applying
> "*StandardTokenizerFactory*" and "*ShingleFilterFactory*" to the text_file field.
>
> *Configuration of Schema.xml structure below :*
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> <field name="text_file" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
>
> <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>
> *Stored Documents like:*
> *[{"id":"1", "text_file": "text": "text of document"}, {"id":"2",
> "text_file": "text": "text of document"} and so on ]*
>
> *Problem*: If I search for a word in a SOLR index, I get a count of the
> documents that contain the word, but if the word occurs several times in a
> document, it is still counted only once for that document. I need each
> returned document to be counted by the number of times the searched word
> occurs in the field. *Example*: I see a "numFound" value of 12, but the word
> "what" occurs 20 times across all 12 documents. Could you help me find
> where I'm going wrong, please?
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhludnev@griddynamics.com>
