lucene-solr-user mailing list archives

From Solr User <solr...@gmail.com>
Subject Re: Work-around for "indexed without position data"
Date Mon, 05 Jun 2017 18:18:25 GMT
Sorry for the delay.  I was able to reproduce this easily with my setup,
but reproducing this on a Solr example proved challenging.  Hopefully the
work that I did to find the situation in which this is produced will help
in resolving the problem.  The driving factor appears to be how updates
are sent to Solr.  When each batch of updates is sent with its own commit,
the problem is reproduced.  If the commit is held until after all updates
are sent, the problem does not occur.  This leads me to believe that the
issue has something to do with overlapping commits or index merges.  It
was reproducible with both the classic and the managed schema, and both
with a standalone Solr core and with SolrCloud.

There are not many steps to reproduce this, but you will need a way to send
these updates.  I have included inline create.sh and create.pl scripts to
generate the data and send the updates.  You can index a lastModified field
or something to convince yourself that everything has been re-indexed.  I
left that out to keep the steps lean.  Also, this test is using commit
statements from the client sending the updates for simplicity even though
it is not a good practice.  My normal setup uses SolrJ with
commitWithin so that Solr manages when the commits take place, but the
same error is produced either way.
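
For readers without a SolrJ client at hand, the commitWithin approach can be approximated with curl.  This is only a sketch: the SOLR_URL variable, the commit_within_url helper, and the 10000 ms value are illustrative assumptions and are not taken from the setup described above.

```shell
#!/bin/bash
# Base URL of the example core; override via the environment if needed.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/techproducts}"

# Build an update URL asking Solr to commit within the given milliseconds,
# instead of committing immediately with commit=true.
commit_within_url() {
  echo "$SOLR_URL/update?commitWithin=$1"
}

# Example (assumes a batch file such as ./create.xml1 already exists):
#   curl "$(commit_within_url 10000)" -H "Content-Type: text/xml" \
#     --data-binary @./create.xml1
```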


*STEPS TO REPRODUCE*

   1. Install Solr 5.5.3 and change to that working directory
   2. bin/solr -e techproducts
   3. bin/solr stop     [Why steps 3-5?  They start the index completely
   fresh, without the 32 example documents, instead of relying on a delete
   query.  The example documents are not posted again when the existing
   core is detected on the second start.]
   4. rm -rf ./example/techproducts/solr/techproducts/data/
   5. bin/solr -e techproducts
   6. ./create.sh
   7. curl -X POST -H 'Content-type:application/json' \
        --data-binary '{ "replace-field": { "name":"cat",
        "type":"text_en_splitting", "indexed":true, "multiValued":true,
        "stored":true } }' \
        http://localhost:8983/solr/techproducts/schema
   8. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
   [error]
   9. ./create.sh
   10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22
   [error even though all documents have been re-indexed]
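
The phrase query in steps 8 and 10 can also be run from the command line rather than a browser.  The following is a minimal sketch, assuming the techproducts core on localhost; SOLR_URL, phrase_query_url, and check_phrase_query are hypothetical names introduced here, and treating any non-200 HTTP status as the error case is an assumption.

```shell
#!/bin/bash
# Base URL of the example core; override via the environment if needed.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/techproducts}"

# Build a select URL for a quoted phrase query, e.g. cat:"hard drive".
# Only spaces are percent-encoded; other special characters are not handled.
phrase_query_url() {
  local field="$1" phrase="$2"
  echo "$SOLR_URL/select?q=${field}:%22${phrase// /%20}%22"
}

# Run the query and report the HTTP status (200 is treated as success).
check_phrase_query() {
  local status
  status=$(curl -s -o /dev/null -w '%{http_code}' "$(phrase_query_url "$1" "$2")")
  if [ "$status" = "200" ]; then
    echo "ok ($status)"
  else
    echo "error ($status)"
    return 1
  fi
}
```

Running check_phrase_query cat "hard drive" after step 7 and again after step 9 gives a quick pass/fail check for each attempt.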

*create.sh*
#!/bin/bash
# Generate 100 batches and post each one with its own commit.
for i in {1..100}; do
  echo "$i"
  ./create.pl "$i" > "./create.xml$i"
  curl "http://localhost:8983/solr/techproducts/update?commit=true" \
    -H "Content-Type: text/xml" --data-binary "@./create.xml$i"
done

*create.pl*
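#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./create.pl BATCH  -- prints one <add> batch of 100 documents.
my $S = $ARGV[0];      # batch number
my $I = 100;           # documents per batch
my $N = $S*$I + $I;    # first id after this batch
my $i;
print "<add>\n";
for($i=$S*$I; $i<$N; $i++) {
   print "<doc><field name=\"id\">SP${i}</field>"
       . "<field name=\"cat\">cat hard drive ${i}</field></doc>\n";
}
print "</add>\n";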
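
For comparison, the case that does not trigger the problem (commit held until all updates are sent) might look like the sketch below.  It reuses create.pl as is; SOLR_URL, update_url, and send_all_then_commit are hypothetical names introduced for illustration, and Solr's own autoCommit settings may still cause commits while the batches are being posted.

```shell
#!/bin/bash
# Base URL of the example core; override via the environment if needed.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/techproducts}"

# Build the update URL; pass "true" to request a commit with this request.
update_url() {
  if [ "$1" = "true" ]; then
    echo "$SOLR_URL/update?commit=true"
  else
    echo "$SOLR_URL/update"
  fi
}

# Post every batch without committing, then issue a single commit at the end.
send_all_then_commit() {
  local i
  for i in {1..100}; do
    echo "$i"
    ./create.pl "$i" > "./create.xml$i"
    curl "$(update_url false)" -H "Content-Type: text/xml" \
      --data-binary "@./create.xml$i"
  done
  # One commit after all batches have been sent.
  curl "$(update_url true)"
}
```

Calling send_all_then_commit in place of ./create.sh in steps 6 and 9 exercises the single-commit path.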


On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rleir@leirtech.com> wrote:

> Can you reproduce this error? What are the steps you take to reproduce it?
> ( simple is better).
>
> cheers -- Rick
>
>
>
> On 2017-05-25 05:46 PM, Solr User wrote:
>
>> This is in regards to changing a field type from string to
>> text_en_splitting, re-indexing all documents, even optimizing to give the
>> index a chance to merge segments and rewrite itself entirely, and then
>> getting this error when running a phrase query:
>> java.lang.IllegalStateException: field "blah" was indexed without
>> position data; cannot run PhraseQuery
>>
>> I have encountered this issue before and have always done one of the
>> following as a work-around:
>> 1.  Instead of changing the field type on an existing field just create a
>> new field and retire the old one.
>> 2.  Delete the index directory and start from scratch.
>>
>> These work-arounds are not always ideal.  Does anyone know what is holding
>> onto that old field type definition?  What thinks it is still a string?
>> Every document has been re-indexed and I am sure of this because I have a
>> time stamp indexed.  Is there any other way to get this to work?
>>
>> For what it is worth, I am running this in SolrCloud mode but I remember
>> seeing this issue before SolrCloud was released as well.
>>
>>
>
