lucene-java-user mailing list archives

From Bill Chesky <Bill.Che...@learninga-z.com>
Subject RE: Analyzer on query question
Date Fri, 03 Aug 2012 16:53:22 GMT
Jack,

Thanks.  Yeah, I don't know what you mean by term analysis.  I googled it but didn't come
up with much.  So if that is the preferred way of doing this, a wiki document would be greatly
appreciated.

I notice you did say I should be doing the term analysis first.  But is it wrong to do it
the way I described in my original email?  Will it give me incorrect results?

Bill


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Friday, August 03, 2012 9:33 AM
To: java-user@lucene.apache.org
Subject: Re: Analyzer on query question

Bill, the simple answer to your original question is that in general you 
should apply the same or similar analysis for your query terms as you do 
with your indexed data. In your specific case the Query.toString is 
generating your unanalyzed terms and then the query parser is performing the 
needed analysis. The real point is that you should be doing the term analysis 
before invoking "new Term". Alas, term analysis has changed dramatically 
over the past couple of years, so the solution to doing analysis before 
generating a Term/TermQuery will vary from Lucene release to release.
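
As a rough illustration of why the query terms need the same analysis as the indexed text, here is a minimal sketch that does not use Lucene at all. Everything in it is an assumption made for illustration: the class SameAnalysisSketch, its methods, and the toy "analyzer" (lowercase, split on non-alphanumerics, strip one trailing "s") are hypothetical stand-ins, not SnowballAnalyzer and not any Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, Lucene-free sketch: the same analysis must run at index and query time.
final class SameAnalysisSketch {

    // Toy stand-in for an analyzer (NOT Snowball): lowercase, split on
    // non-alphanumerics, and strip one trailing "s" from longer words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            boolean plural = raw.length() > 3 && raw.endsWith("s");
            terms.add(plural ? raw.substring(0, raw.length() - 1) : raw);
        }
        return terms;
    }

    // Build a tiny inverted index: analyzed term -> ids of documents containing it.
    static Map<String, Set<Integer>> index(String... docs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String term : analyze(docs[id])) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Exact-match lookup, analogous to querying with a term as given.
    static Set<Integer> lookup(Map<String, Set<Integer>> postings, String term) {
        Set<Integer> hits = postings.get(term);
        return hits == null ? new TreeSet<Integer>() : hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = index("Running Dogs", "A dog barks");
        // Raw, unanalyzed term: the index only contains "dog", so this misses.
        System.out.println(lookup(postings, "Dogs"));                 // prints []
        // Analyze first, then look up: matches both documents.
        System.out.println(lookup(postings, analyze("Dogs").get(0))); // prints [0, 1]
    }
}
```

The raw term "Dogs" finds nothing because the index only holds the analyzed form "dog". That is the same kind of mismatch you get when a Term is built from unanalyzed text against a stemmed index.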

We really do need a wiki page for Lucene term analysis.

-- Jack Krupansky

-----Original Message----- 
From: Bill Chesky
Sent: Friday, August 03, 2012 9:19 AM
To: simon.willnauer@gmail.com ; java-user@lucene.apache.org
Subject: RE: Analyzer on query question

Thanks Simon,

Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem to 
have been introduced until 3.1.0.  Similarly my version of Lucene does not 
have a BooleanQuery.addClause(BooleanClause) method.  Maybe you meant 
BooleanQuery.add(BooleanClause).

In any case, most of what you're doing there, I'm just not familiar with. 
Seems very low level.  I've never had to use TokenStreams to build a query 
before and I'm not really sure what is going on there.  Also, I don't know 
what PositionIncrementAttribute is or how it would be used to create a 
PhraseQuery.   The way I'm currently creating PhraseQuerys is very 
straightforward and intuitive.  E.g. to search for the term "foo bar" I'd 
build the query like this:

PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.add(new Term("title", "foo"));
phraseQuery.add(new Term("title", "bar"));

Is there really no easier way to associate the correct analyzer with these 
types of queries?

Bill

-----Original Message-----
From: Simon Willnauer [mailto:simon.willnauer@gmail.com]
Sent: Friday, August 03, 2012 3:43 AM
To: java-user@lucene.apache.org; Bill Chesky
Subject: Re: Analyzer on query question

On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
<Bill.Chesky@learninga-z.com> wrote:
> Hi,
>
> I understand that generally speaking you should use the same analyzer on 
> querying as was used on indexing.  In my code I am using the 
> SnowballAnalyzer on index creation.  However, on the query side I am 
> building up a complex BooleanQuery from other BooleanQuerys and/or 
> PhraseQuerys on several fields.  None of these require specifying an 
> analyzer anywhere.  This is causing some odd results, I think, because a 
> different analyzer (or no analyzer?) is being used for the query.
>
> Question: how do I build my boolean and phrase queries using the 
> SnowballAnalyzer?
>
> One thing I did that seemed to kind of work was to build my complex query 
> normally then build a snowball-analyzed query using a QueryParser 
> instantiated with a SnowballAnalyzer.  To do this, I simply pass the 
> string value of the complex query to the QueryParser.parse() method to get 
> the new query.  Something like this:
>
>     // build a complex query from other BooleanQuerys and PhraseQuerys
>     BooleanQuery fullQuery = buildComplexQuery();
>     QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
>         new SnowballAnalyzer(Version.LUCENE_30, "English"));
>     Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
>
>     TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
>     indexSearcher.search(snowballAnalyzedQuery, collector);

you can just use the analyzer directly like this:
Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");

TokenStream stream = analyzer.tokenStream("title",
    new StringReader(fullQuery.toString()));
CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
BooleanQuery q = new BooleanQuery();
while (stream.incrementToken()) {
  q.addClause(new BooleanClause(
      new TermQuery(new Term("title", termAttr.toString())), Occur.MUST));
}

you also have access to the token positions if you want to create
phrase queries etc. just add a PositionIncrementAttribute like this:
PositionIncrementAttribute posAttr =
stream.addAttribute(PositionIncrementAttribute.class);

pls. doublecheck the code it's straight from the top of my head.
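
To make the role of position increments a little more concrete, here is a small Lucene-free sketch. It assumes the usual convention that each token carries an increment relative to the previous token and that the first token with increment 1 lands at position 0. The Token class and absolutePositions method are hypothetical names for illustration, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: turn per-token position increments into absolute positions.
final class PhrasePositionsSketch {

    // Illustrative stand-in for what a token stream reports per token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Accumulate increments, starting from -1 so an increment of 1 yields position 0.
    static int[] absolutePositions(List<Token> tokens) {
        int[] positions = new int[tokens.size()];
        int position = -1;
        for (int i = 0; i < tokens.size(); i++) {
            position += tokens.get(i).positionIncrement;
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Assumed input: "foo the bar" where an analyzer dropped the stop word "the",
        // so "bar" reports an increment of 2 to record the gap.
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("foo", 1));
        tokens.add(new Token("bar", 2));
        System.out.println(Arrays.toString(absolutePositions(tokens))); // prints [0, 2]
    }
}
```

With positions like these, a phrase query can keep the gap left by a removed word instead of treating "foo" and "bar" as adjacent.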

simon

>
> Like I said, this seems to kind of work but it doesn't feel right.  Does 
> this make sense?  Is there a better way?
>
> thanks in advance,
>
> Bill


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
