lucene-java-user mailing list archives

From Mck <m...@semb.wever.org>
Subject RE: Re: Replacing FAST functionality at sesam.no - ShingleFilter+exact matching
Date Thu, 11 Sep 2008 07:51:33 GMT
On Wed, 2008-09-10 at 14:47 -0700, Chris Hostetter wrote:
> The FieldQParserPlugin in particular passes the entire querystring to
> the Analyzer for the field specified by an "f" param as a single chunk...
> 
>          {!field f=yourfieldName}Some input that can have spaces
> 
> http://localhost:8983/solr/select/?debugQuery=true&rows=0&q=%7B%21field+f%3Dname%7DFoo+Bar

But at the end of the day will
{!field f:list_entry_shingle}abcd efgh ijkl
still end up as
list_entry_shingle:"abcd efgh ijkl"
?

I was unable to get a URL like
http://localhost:8080/solr/select/?debugQuery=true&rows=0&q={!field%20f:list_entry_shingle}abcd%20efgh%20ijkl
to work. I got:
> org.apache.lucene.queryParser.ParseException: Expected identifier at
> pos 9 str='{!field f:list_entry_shingle}abcd efgh ijkl'
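
For comparison, the quoted example writes the local param as f=yourfieldName (not f:...) and percent-encodes the braces and the "!". A minimal sketch of building a URL in that form, assuming the f= syntax from the quoted example is what the parser expects; the class and method names are hypothetical, and the host and port are taken from the failing URL above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FieldQueryUrl {
    // Builds a select URL whose q parameter uses the {!field f=...} form
    // shown in the quoted example. Hypothetical helper, for illustration only.
    static String build(String base, String field, String input)
            throws UnsupportedEncodingException {
        String q = "{!field f=" + field + "}" + input;
        // URLEncoder turns "{" into %7B, "!" into %21, "=" into %3D and spaces into "+"
        return base + "?debugQuery=true&rows=0&q=" + URLEncoder.encode(q, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("http://localhost:8080/solr/select/",
                "list_entry_shingle", "abcd efgh ijkl"));
    }
}
```

The encoded q parameter has the same shape as the one in the quoted URL (%7B%21field+f%3D...%7D followed by the input).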

I ask because the javadoc indicates this, and from what I can see in
FieldQParserPlugin you still end up with one of the same three return
types: BooleanQuery, MultiPhraseQuery, or PhraseQuery.

So as long as we have shingles with positionIncrement=0 and unigrams (or
the first shingles) with positionIncrement=1, you'll end up with a
MultiPhraseQuery.
If we instead treat the phrase as a single term (e.g. by using the
KeywordTokenizer), then no shingles will be generated (as one term is
just a unigram).
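
The reasoning above can be sketched as a small stand-alone model. It does not call Lucene or Solr: it only takes the position increment of each token the analyzer would emit and names the query type that would result. The branch conditions are an assumption modeled on the usual Lucene query-parser pattern, not copied from FieldQParserPlugin, and the single-token TermQuery case is likewise an assumption; the class and method names are hypothetical.

```java
public class QueryShapeSketch {
    // Simplified model: map a token stream's position increments to the
    // name of the query type that would be built from it.
    // ASSUMPTION: these branches mirror the usual Lucene pattern; they are
    // not taken from the FieldQParserPlugin source.
    static String classify(int[] positionIncrements) {
        int positionCount = 0;
        boolean severalTokensAtSamePosition = false;
        for (int inc : positionIncrements) {
            if (inc != 0) {
                positionCount += inc;
            } else {
                severalTokensAtSamePosition = true;
            }
        }
        if (positionIncrements.length == 0) return "none";
        if (positionIncrements.length == 1) return "TermQuery"; // assumed single-token case
        if (severalTokensAtSamePosition) {
            // all tokens stacked on one position vs. spread over several positions
            return positionCount == 1 ? "BooleanQuery" : "MultiPhraseQuery";
        }
        return "PhraseQuery";
    }

    public static void main(String[] args) {
        // "abcd efgh ijkl": unigrams (increment 1) plus bigram shingles (increment 0)
        System.out.println(classify(new int[] {1, 0, 1, 0, 1}));
        // whole phrase kept as one token, e.g. via KeywordTokenizer
        System.out.println(classify(new int[] {1}));
        // plain unigrams, no shingles
        System.out.println(classify(new int[] {1, 1, 1}));
    }
}
```

Under these assumptions, the shingled stream lands in the MultiPhraseQuery branch, while the single-token stream never reaches the phrase branches at all.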

Does this make sense?

~mck

-- 
"Living on Earth is expensive, but it does include a free trip around
the sun every year." Unknown 
| semb.wever.org | sesat.no | sesam.no |
