lucene-java-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: Solr 1.4.1: Weird query results
Date Wed, 20 Apr 2011 08:17:11 GMT
  Thank you very much for your answers :-) First of all, I just noticed 
that I unintentionally sent the question to the Lucene list although it 
is more of a Solr issue. I will answer here all the same to avoid 
confusion. My apologies ;)

First, Erick's suggestions. The default field has been "text" for a 
long time, so I did not change it yesterday; it was already set to 
this field before.
By "not created by Solr" I mean exactly that it has been created using 
Lucene directly. This could indeed be an issue, as Lucene 2.3.1 was 
used to create the index, whereas Solr 1.4.1 uses Lucene 2.9.3. But it 
seemed to work fine so far (perhaps that impression is just wrong, I 
don't know yet).

I tried your hint of appending "&debugQuery=on". Guess what: with that 
appended I get my hits. No kidding, appending the debug option gives me 
my 30 document hits; deleting it from my browser's address bar leaves 
me with 0 hits (?!).
The address bar strings are:
No hits:
http://localhost:8983/solr/select/?q=marine&version=2.2&start=0&rows=10&indent=on
30 hits:
http://localhost:8983/solr/select/?q=marine&version=2.2&start=0&rows=10&indent=on&debugQuery=on
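
To rule out a typo in the address bar, the same comparison can be scripted. The following is only a minimal sketch in Python, assuming the XML response carries the hit count in the numFound attribute of the <result name="response"> element; the helper names build_url, num_found and fetch_num_found are made up for illustration and are not part of Solr.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Select handler URL taken from the address bar strings above.
BASE = "http://localhost:8983/solr/select/"


def build_url(query, debug=False):
    """Build the select URL, optionally with debugQuery=on appended."""
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", "0"),
        ("rows", "10"),
        ("indent", "on"),
    ]
    if debug:
        params.append(("debugQuery", "on"))
    return BASE + "?" + urlencode(params)


def num_found(xml_text):
    """Extract the hit count from a Solr XML response (assumed layout)."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in response")
    return int(result.get("numFound"))


def fetch_num_found(query, debug=False):
    """Run the query against the local Solr instance and return the hit count."""
    with urlopen(build_url(query, debug)) as resp:
        return num_found(resp.read())
```

With Solr running locally, calling fetch_num_found("marine") and fetch_num_found("marine", debug=True) should reproduce the two hit counts seen in the browser, if the difference really comes from the debug parameter alone.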

Here's the debug output concerning the query:

<str name="rawquerystring">marine</str>
<str name="querystring">marine</str>
<str name="parsedquery">text:marine</str>
<str name="parsedquery_toString">text:marine</str>

Seems fine. This is expected, because I had already used the analysis 
interface to check that the correct terms are searched for.
Here are my schema snippets:

FieldType "text_ws":
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
(Solr 1.4.1 default)

Field "text":
<field name="text" type="text_ws" indexed="true" stored="true" 
termVectors="true" termPositions="true" />

Default search field:
<defaultSearchField>text</defaultSearchField>
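
To illustrate what whitespace-only analysis implies for matching, here is a rough Python sketch. It is an approximation and not Solr's actual WhitespaceTokenizerFactory: it assumes that splitting on whitespace with str.split() is close enough, and the function names are hypothetical. The point is that terms are neither lowercased nor stemmed, so a query term only matches a token that is exactly identical.

```python
def whitespace_tokenize(text):
    """Approximate whitespace-only analysis: split on whitespace, nothing else."""
    return text.split()


def term_matches(term, text):
    """A single term matches only if it equals an indexed token exactly."""
    return term in whitespace_tokenize(text)


def phrase_matches(phrase, text):
    """A phrase matches if its tokens appear consecutively in the text."""
    wanted = whitespace_tokenize(phrase)
    tokens = whitespace_tokenize(text)
    if not wanted:
        return False
    return any(
        tokens[i:i + len(wanted)] == wanted
        for i in range(len(tokens) - len(wanted) + 1)
    )


snippet = "In marine mussels of the genus"
print(term_matches("marine", snippet))            # exact token
print(term_matches("Marine", snippet))            # differs in case
print(term_matches("marin", snippet))             # stemmed form
print(phrase_matches("marine mussels", snippet))  # consecutive tokens
```

Under these assumptions only the exact term and the exact phrase match; a capitalised or stemmed variant does not, which is why the field type used at index time and at query time has to agree.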

I guess this also answers the hints given by Lance. Writing this down, I 
get the feeling that the first thing I should do is update my index to 
match the Lucene version used by Solr. This seems to be the most obvious 
step (but as Luke can handle all versions, I thought using this index 
with Solr should be fine, too). Still, it's really quite strange that 
appending the debug option changes my search results. Oh my, I probably 
just missed something basic about how to use Solr ;)
What is your opinion? Converting the index to another Lucene version 
isn't exactly the fastest and easiest thing, so I'd like to rule out 
all other possibilities first :)

Best regards,

     Erik


On 20.04.2011 01:07, Lance Norskog wrote:
> Look at the "text" definition stack. Does it have the same analyzer
> and filter that you used to make the index, and in the same order?
>
> The specific problem is that the "text" field includes a stemmer, and
> your code probably did not. And so "marine" is stored as, maybe
> 'marin'.  To check this out, look at the 'schema browser' page off the
> admin page. This will show you all of the indexed terms in each field.
> Also look at the Analysis page: this lets you see how text is parsed
> and changed in the analysis stack.
>
> On Tue, Apr 19, 2011 at 2:56 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> Hmmmm, I don't see the problem either. It *sounds* like you don't really
>> have the default search field defined the way you think you do. Did you restart
>> Solr after making that change?
>>
>> I'm assuming that when you say "not created by Solr" you mean that it's created
>> by Lucene. What version of Lucene and Solr are you using if that's true?
>>
>> You can test this by appending "&debugQuery=on" to your query or checking
>> the "debug enable" checkbox in the full query interface from the admin page.
>> That should show you exactly what is being searched. You might also want
>> to look at the analysis page for your field and see how your query
>> is tokenized.
>>
>> But, like I said, this looks like it should work. If you can post the results of
>> adding &debugQuery=on and your actual <fieldType> definition for "text_ws", your
>> <field> declaration for "text" and the <defaultSearchField> from your schema,
>> that would help. I can't tell you how many times something that's eluded me
>> for hours is obvious to someone else :)
>>
>> Best
>> Erick
>>
>>
>>
>> On Tue, Apr 19, 2011 at 3:59 PM, Erik Fäßler <erik.faessler@uni-jena.de> wrote:
>>>   Hello there,
>>>
>>> my issue qualifies as newbie question I guess, but I'm really a bit
>>> confused. I have an index which has not been created by Solr. Perhaps that's
>>> already the point although I fail to see why this should be an issue with my
>>> problem.
>>>
>>> I use the admin interface to check which results particular queries bring
>>> in. My index documents have a field "text" which holds the document text.
>>> This text has only been white space tokenized. So in my schema, the type for
>>> this field is "text_ws". My schema says
>>> "<defaultSearchField>text</defaultSearchField>".
>>>
>>> When I now search for, say, 'marine' (without quotes), I don't get any
>>> search results. But when I search '"marine"' (that is, embraced by double
>>> quotes) I get my document hits. Alternatively, I can prepend the field name:
>>> 'text:marine' and will also get my results.
>>>
>>> Similar with this phrase query: "marine mussels", where "In marine mussels
>>> of the genus" is a text snippet of a document. The phrase "marine mussels"
>>> won't give any hits. Searching for 'text:"marine mussels"' will give me the
>>> exact document containing this text snippet.
>>>
>>> I'm sure this has quite a simple explanation but I'm unable to find it right
>>> now ;-) Perhaps you can help with that.
>>>
>>> Thanks a lot!
>>>
>>> Best regards,
>>>
>>>     Erik
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>

