lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven A Rowe <sar...@syr.edu>
Subject RE: N-grams with numbers and Shinglefilters
Date Mon, 02 Mar 2009 13:37:07 GMT
Hi Raymond,

On 3/1/2009, Raymond Balm├Ęs wrote:
> I'm trying to index (& search later) documents that contain tri-grams
> however they have the following form:
> 
> <string> <2 digit> <2 digit>
> 
> Does the ShingleFilter work with numbers in the match ?

Yes, though it is the tokenizer and previous filters in the chain that will be the (potential)
source of difficulties, not ShingleFilter.

> Another complication, in future features I'd like to add optional
> digits like
> 
> [<1 digit>] <string> <2 digit> <2 digit>
> 
> I suppose the ShingleFilter won't do it ?

ShingleFilter just pastes together the tokens produced by the previous component in the analysis
chain, in a sliding window.  As currently written, it doesn't provide the sort of functionality
you seem to be asking for.

> Any better advice ?

What do your documents look like?  What do you hope to accomplish using ShingleFilter?  It's
tough to give advice without knowing what you want to do.

Steve


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message