lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Analyzer on query question
Date Fri, 03 Aug 2012 17:12:08 GMT
Bill


You're getting the snowball stemming either way, which I guess is good,
and if you get the same results either way maybe it doesn't matter which
technique you use.  I'd be a bit worried about parsing the result of
query.toString(), though, because you aren't guaranteed to get back, as
text, what you put in.
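
To make that worry concrete, here is a toy sketch in plain Java.  It does
not use Lucene at all: ToyTerm and parse() are hypothetical stand-ins I
made up for illustration, and the parser is deliberately naive.  The only
point is that once a term is rendered as unescaped text, a parser can
split it differently from how it was built.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these are NOT Lucene classes. */
public class RoundTripSketch {

    /** Hypothetical stand-in for a single-term query: one field, one indexed term. */
    public static final class ToyTerm {
        final String field;
        final String text;

        public ToyTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        /** Renders field:text with no quoting or escaping. */
        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /**
     * Hypothetical, deliberately naive parser: whitespace separates clauses,
     * and the first ':' in a clause separates the field from the text.
     */
    public static List<ToyTerm> parse(String queryText, String defaultField) {
        List<ToyTerm> clauses = new ArrayList<>();
        for (String piece : queryText.trim().split("\\s+")) {
            if (piece.isEmpty()) {
                continue;
            }
            int colon = piece.indexOf(':');
            if (colon < 0) {
                clauses.add(new ToyTerm(defaultField, piece));
            } else {
                clauses.add(new ToyTerm(piece.substring(0, colon), piece.substring(colon + 1)));
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // One term whose text happens to contain a space.
        ToyTerm original = new ToyTerm("description", "cell combin");
        String rendered = original.toString();
        List<ToyTerm> reparsed = parse(rendered, "title");
        System.out.println(rendered);  // description:cell combin
        System.out.println(reparsed);  // [description:cell, title:combin]
    }
}
```

One term went in and two clauses came out, with the second landing on the
default field.  Whether a real query survives the trip depends on the
query types and the parser involved, so treat this only as an
illustration of the risk.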

My way seems better to me, but then it would.  If you prefer your way
I won't argue with you.
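
If you do go the parser route, building the per-field phrase string you
describe in your message quoted below can be wrapped in a small helper.
This is only a sketch in plain Java: buildMultiFieldPhrase() and
escapePhrase() are hypothetical names, and the escaping is a simplified
assumption that covers only backslashes and double quotes inside the
phrase.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseQueryStringSketch {

    /** Simplified escaping: keeps backslashes and double quotes inside the phrase. */
    public static String escapePhrase(String phrase) {
        return phrase.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds field:"phrase" once per field, separated by single spaces. */
    public static String buildMultiFieldPhrase(List<String> fields, String phrase) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(":\"").append(escapePhrase(phrase)).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String queryText = buildMultiFieldPhrase(
                Arrays.asList("title", "description", "text"), "cells combine");
        // title:"cells combine" description:"cells combine" text:"cells combine"
        System.out.println(queryText);
    }
}
```

The resulting string is what you would hand to queryParser.parse(), so
the analyzer sees the raw phrase text rather than an already-rendered
query.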


--
Ian.


On Fri, Aug 3, 2012 at 5:57 PM, Bill Chesky <Bill.Chesky@learninga-z.com> wrote:
> Ian,
>
> I gave this method a try, at least the way I understood your
> suggestion. E.g. to search for the phrase "cells combine" I built up a
> string like:
>
> title:"cells combine" description:"cells combine" text:"cells combine"
>
> then I passed that to the queryParser.parse() method (where queryParser
> is an instance of QueryParser constructed using SnowballAnalyzer) and
> added the result as a MUST clause in my final BooleanQuery.
>
> When I print the resulting query out as a string I get:
>
> +(title:"cell combin" description:"cell combin" keywords:"cell combin")
>
> So it looks like the SnowballAnalyzer is doing some stemming for me.
> But this is the exact same result I'd get doing it the way I described
> in my original email.  I just built the unanalyzed string on my own
> rather than using the various query classes like PhraseQuery, etc.
>
> So I don't see the advantage to doing it this way over the original
> method.  I just don't know if the original way I described is wrong or
> will give me bad results.
>
> thanks for the help,
>
> Bill
>
> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Friday, August 03, 2012 9:32 AM
> To: java-user@lucene.apache.org
> Subject: Re: Analyzer on query question
>
> You can add parsed queries to a BooleanQuery.  Would that help in this case?
>
> SnowballAnalyzer sba = whatever();
> QueryParser qp = new QueryParser(..., sba);
> Query q1 = qp.parse("some snowball string");
> Query q2 = qp.parse("some other snowball string");
>
> BooleanQuery bq = new BooleanQuery();
> bq.add(q1, ...);
> bq.add(q2, ...);
> bq.add(loads of other stuff);
>
>
> --
> ian.
>
>
> On Fri, Aug 3, 2012 at 2:19 PM, Bill Chesky <Bill.Chesky@learninga-z.com> wrote:
>> Thanks Simon,
>>
>> Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't
>> seem to have been introduced until 3.1.0.  Similarly my version of
>> Lucene does not have a BooleanQuery.addClause(BooleanClause) method.
>> Maybe you meant BooleanQuery.add(BooleanClause).
>>
>> In any case, most of what you're doing there, I'm just not familiar
>> with.  Seems very low level.  I've never had to use TokenStreams to
>> build a query before and I'm not really sure what is going on there.
>> Also, I don't know what PositionIncrementAttribute is or how it would
>> be used to create a PhraseQuery.  The way I'm currently creating
>> PhraseQuerys is very straightforward and intuitive.  E.g. to search for
>> the term "foo bar" I'd build the query like this:
>>
>>     PhraseQuery phraseQuery = new PhraseQuery();
>>     phraseQuery.add(new Term("title", "foo"));
>>     phraseQuery.add(new Term("title", "bar"));
>>
>> Is there really no easier way to associate the correct analyzer with
>> these types of queries?
>>
>> Bill
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willnauer@gmail.com]
>> Sent: Friday, August 03, 2012 3:43 AM
>> To: java-user@lucene.apache.org; Bill Chesky
>> Subject: Re: Analyzer on query question
>>
>> On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
>> <Bill.Chesky@learninga-z.com> wrote:
>>> Hi,
>>>
>>> I understand that generally speaking you should use the same analyzer
>>> on querying as was used on indexing.  In my code I am using the
>>> SnowballAnalyzer on index creation.  However, on the query side I am
>>> building up a complex BooleanQuery from other BooleanQuerys and/or
>>> PhraseQuerys on several fields.  None of these require specifying an
>>> analyzer anywhere.  This is causing some odd results, I think, because
>>> a different analyzer (or no analyzer?) is being used for the query.
>>>
>>> Question: how do I build my boolean and phrase queries using the SnowballAnalyzer?
>>>
>>> One thing I did that seemed to kind of work was to build my complex
>>> query normally then build a snowball-analyzed query using a
>>> QueryParser instantiated with a SnowballAnalyzer.  To do this, I
>>> simply pass the string value of the complex query to the
>>> QueryParser.parse() method to get the new query.  Something like this:
>>>
>>>     // build a complex query from other BooleanQuerys and PhraseQuerys
>>>     BooleanQuery fullQuery = buildComplexQuery();
>>>     QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
>>>         new SnowballAnalyzer(Version.LUCENE_30, "English"));
>>>     Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
>>>
>>>     TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
>>>     indexSearcher.search(snowballAnalyzedQuery, collector);
>>
>> you can just use the analyzer directly like this:
>> Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
>>
>> TokenStream stream = analyzer.tokenStream("title", new
>> StringReader(fullQuery.toString()));
>> CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
>> stream.reset();
>> BooleanQuery q = new BooleanQuery();
>> while(stream.incrementToken()) {
>>   q.addClause(new BooleanClause(Occur.MUST, new Term("title",
>> termAttr.toString())));
>> }
>>
>> you also have access to the token positions if you want to create
>> phrase queries etc. just add a PositionIncrementAttribute like this:
>> PositionIncrementAttribute posAttr =
>> stream.addAttribute(PositionIncrementAttribute.class);
>>
>> pls. doublecheck the code it's straight from the top of my head.
>>
>> simon
>>
>>>
>>> Like I said, this seems to kind of work but it doesn't feel right.
>>> Does this make sense?  Is there a better way?
>>>
>>> thanks in advance,
>>>
>>> Bill
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org