lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Raymond Balmès <raymond.bal...@gmail.com>
Subject Re: N-grams with numbers and Shinglefilters
Date Mon, 02 Mar 2009 17:26:18 GMT
Yes, I don't need a ShingleFilter I understand it by now.

Yes I will have many of these phrases in the documents... this is why I
thought I shouldn't use Lucene fields.

I will investigate further your keyword approach sounds like possible, thx
for the tip.
However I presume I may need to normalize the phrases for the search phase,
so it may not work.

Keep in touch,

-RB-




On Mon, Mar 2, 2009 at 5:23 PM, Steven A Rowe <sarowe@syr.edu> wrote:

> Hi Raymond,
>
> On 3/2/2009 at 10:09 AM, Raymond Balmès wrote:
> > suppose I have a tri-gram, what I want to do is index the tri-gram
> > "string digit1 digit2" as one indexing phrase, and not index each token
> > separately.
>
> As long as you don't want any transformation performed on the phrase or its
> components, you can add your phrase as a "keyword", i.e. a non-analyzed
> string that will be indexed as-is.
>
> Unless your phrase field will be the only field on this document (pretty
> unlikely), you'll want to use PerFieldAnalyzerWrapper[1] over
> KeywordAnalyzer[2] for the phrase field, and whatever other analyzer you
> like for the other document field(s).
>
> AFAICT, you don't need ShingleFilter.
>
> Steve
>
> [1] PerFieldAnalyzerWrapper:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html
> [2] KeywordAnalyzer:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message