lucene-java-user mailing list archives

From "Mile Rosu" <mile.r...@level7.ro>
Subject RE: Removing brackets before indexing
Date Thu, 01 Jun 2006 13:18:12 GMT
Hello Otis,

Thank you for the hint. I have made a custom analyzer which uses a
custom tokenizer similar to CharTokenizer: it treats brackets as token
characters, but removes them in the next() method. This is because I do
not want to split the word when adding it to the index. It seems to work
OK, but still needs more testing. With SimpleAnalyzer alone, the words
were split at the brackets.
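
A minimal sketch of the idea described above, written in plain Java without any Lucene classes so that it stands alone. It is not the actual analyzer from this message: the class and method names (BracketAwareTokenizerSketch, isTokenChar, tokenize) are hypothetical, and treating only letters plus '[' and ']' as token characters is an assumption. Brackets are kept while a token is being collected, so the word is not split, and are stripped only when the token is emitted.

```java
import java.util.ArrayList;
import java.util.List;

class BracketAwareTokenizerSketch {

    // Assumption: letters and square brackets belong to a token;
    // every other character acts as a separator.
    static boolean isTokenChar(char c) {
        return Character.isLetter(c) || c == '[' || c == ']';
    }

    // Collects raw tokens (brackets included), then strips the brackets
    // at the moment each token is emitted.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder raw = new StringBuilder();
        // Loop one step past the end so the last token is flushed too.
        for (int i = 0; i <= text.length(); i++) {
            boolean inToken = i < text.length() && isTokenChar(text.charAt(i));
            if (inToken) {
                raw.append(text.charAt(i));
            } else if (raw.length() > 0) {
                String cleaned = raw.toString().replace("[", "").replace("]", "");
                if (!cleaned.isEmpty()) {
                    // Skip tokens that consisted of brackets only.
                    tokens.add(cleaned);
                }
                raw.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("gratia [divinit]atis, am[en]"));
    }
}
```

In a real analyzer the same two steps would presumably live in the tokenizer's token-character test and in next(), as described above; the sketch only shows the order of operations.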

Mile

  

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Wednesday, May 31, 2006 7:36 PM
To: java-user@lucene.apache.org
Subject: Re: Removing brackets before indexing

Mile,

Any Analyzer that uses a Tokenizer that throws out non-letter characters
will do.
For example, take a look at SimpleAnalyzer. It uses LowerCaseTokenizer.
If you read the javadoc for LowerCaseTokenizer, I think you will see it
suits you.
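
To make the behaviour concrete, here is a standalone sketch in plain Java (no Lucene classes) that imitates what the LowerCaseTokenizer javadoc describes: dividing text at non-letters and lowercasing the result. The names LetterOnlyTokenizerSketch and tokenize are hypothetical. Note that the brackets do disappear, but a word with a bracket in the middle comes out as two tokens.

```java
import java.util.ArrayList;
import java.util.List;

class LetterOnlyTokenizerSketch {

    // Splits the text at every non-letter character and lowercases each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter (including '[' or ']') ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[divinit]atis")); // prints [divinit, atis]
    }
}
```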

Otis

----- Original Message ----
From: Mile Rosu <mile.rosu@level7.ro>
To: java-user@lucene.apache.org
Sent: Wednesday, May 31, 2006 11:47:12 AM
Subject: Removing brackets before indexing

Hello!

I am currently trying to index Latin-language documents, in which
missing letters are added to words inside square brackets, like
this: "[divinit]atis".

Could you please tell me the best practice for removing the brackets
before adding the text to the Lucene index? (In the example, the word
stored should be "divinitatis".)

Thank you very much,
Mile Rosu

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




