lucene-solr-user mailing list archives

From "Pouliot, Scott" <Scott.Poul...@peoplefluent.com>
Subject RE: Getting an error: <field> was indexed without position data; cannot run PhraseQuery
Date Tue, 07 Mar 2017 01:26:50 GMT
Hmm.  We haven't changed the data or the definition in YEARS now.  I'll have to do some more
digging, I guess.  I'm not sure re-indexing is a great option, though, since this is a production
setup and the database for this user is around 50GB.  It would take quite a long time to reindex
all that data from scratch.  Hmmmm

Thanks for the quick reply Erick!

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, March 6, 2017 5:33 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Getting an error: <field> was indexed without position data; cannot run PhraseQuery

Usually an _s field is a "string" type, so be sure you didn't change the definition without
completely re-indexing. In fact I generally either index to a new collection or remove the
data directory entirely.
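
For reference, the example schema that ships with Solr maps *_s to the untokenized "string" type, roughly as sketched below. This illustrates that convention only; it is not the poster's actual schema, and the attribute values are assumptions based on the stock example.

```xml
<!-- Stock-style definitions (assumed; not taken from the poster's schema.xml) -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
```

With a StrField, each value is indexed as a single token, so a clause such as preferredlocations_s:(3799H) should be parsed as a plain term query rather than a phrase query.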

Right, the field isn't indexed with position information. That, combined with (probably) the
WordDelimiterFilterFactory in text_en_splitting, is generating multiple tokens for inputs like
3799H.
See the admin/analysis page for how that gets broken up. Term positions are usually enabled
by default, so I'm not quite sure why they're gone unless you disabled them.

But you're on the right track regardless. You have to either
1> include term positions for anything that generates phrase queries
or
2> make sure you don't generate phrase queries. edismax can do this if
you have it configured to, and there's also the autoGeneratePhraseQueries setting, which you may find useful.

And do reindex completely from scratch if you change the definitions.
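
The two options above might look roughly like this in schema.xml. This is a sketch under assumptions, not a tested fix: the positionIncrementGap value is borrowed from the stock example schema, and the analyzer chain of text_en_splitting is deliberately left out because the thread does not show it.

```xml
<!-- Sketch only: adapt to the real schema.xml before use. -->

<!-- Option 1: keep the analyzed type and make sure positions are indexed.
     Text fields index positions by default, so these attributes only matter
     if positions were switched off somewhere. Needs a full reindex. -->
<dynamicField name="*_s" type="text_en_splitting" indexed="true" stored="true"
              omitTermFreqAndPositions="false" omitPositions="false" />

<!-- Option 2: stop the query parser from turning multi-token input such as
     3799H into a phrase query for this field type. -->
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- existing index and query analyzers stay unchanged (not shown in the thread) -->
</fieldType>
```

Only one of the two is needed. Option 2 changes how every field of that type is queried, so it is worth checking the other *_s fields on the admin/analysis page before applying it.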

Best,
Erick

On Mon, Mar 6, 2017 at 1:41 PM, Pouliot, Scott <Scott.Pouliot@peoplefluent.com> wrote:
> We keep getting this in our Tomcat/SOLR Logs and I was wondering if a simple schema change
> will alleviate this issue:
>
> INFO  - 2017-03-06 07:26:58.751; org.apache.solr.core.SolrCore; [Client_AdvanceAutoParts] webapp=/solr path=/select params={fl=candprofileid,+candid&start=0&q=*:*&wt=json&fq=issearchable:1+AND+cpentitymodifiedon:[2017-01-20T00:00:00.000Z+TO+*]+AND+clientreqid:17672+AND+folderid:132+AND+(engagedid_s:(0)+AND+atleast21_s:(1))+AND+(preferredlocations_s:(3799H))&rows=1000} status=500 QTime=1480
> ERROR - 2017-03-06 07:26:58.766; org.apache.solr.common.SolrException; null:java.lang.IllegalStateException: field "preferredlocations_s" was indexed without position data; cannot run PhraseQuery (term=3799)
>                 at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:277)
>                 at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:351)
>                 at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>                 at org.apache.lucene.search.BooleanQuery$BooleanWeight.bulkScorer(BooleanQuery.java:313)
>                 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>                 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>                 at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1158)
>                 at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:846)
>                 at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1004)
>                 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1517)
>                 at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1397)
>                 at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
>                 at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
>                 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
>                 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>                 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
>                 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>                 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
>                 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>                 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
>                 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
>                 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
>                 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
>                 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>                 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
>                 at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
>                 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>                 at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
>                 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>                 at java.lang.Thread.run(Unknown Source)
>
>
> The field in question "preferredlocations_s" is not defined in schema.xml explicitly,
> but we have a dynamicField schema entry that covers it.
>
> <dynamicField name="*_s" type="text_en_splitting" indexed="true" stored="true" />
>
> Would adding omitTermFreqAndPositions="false" to this schema line help out here?  Should
> I explicitly define this "preferredlocations_s" field in the schema instead and add it there?
> We do have a handful of dynamic fields that all get covered by this rule, but it seems the
> "preferredlocations_s" field is the only one throwing errors.  All it stores is a CSV string
> with location IDs in it.
>