lucene-java-user mailing list archives

From Sudha Verma <>
Subject Keep URLs intact and not tokenized by the StandardTokenizer
Date Thu, 19 Nov 2009 05:58:11 GMT

I am using Lucene 2.9.1.

I am reading in free-text documents, which I currently index with Lucene
using the StandardAnalyzer.

The StandardAnalyzer keeps email addresses intact and does not tokenize
them. Is there something similar for URLs? This seems like a common need,
so I thought I'd check whether anything out there already does it.
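
To make the request concrete, here is a rough sketch in plain Java of the
token output I am after. It uses only java.util.regex and no Lucene classes;
the class name UrlAwareTokenizerSketch and the URL pattern (http, https and
ftp only) are assumptions made up for illustration, not an existing analyzer
or tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it only illustrates the desired tokens.
final class UrlAwareTokenizerSketch {

    // Alternation order matters: try a URL first, then fall back to a plain word.
    // The URL pattern is a deliberately simple assumption, not a full URL grammar.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:https?|ftp)://[^\\s<>\"]+|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String tok = m.group();
            if (tok.contains("://")) {
                // Keep the URL as one token, minus sentence punctuation at its end.
                tokens.add(tok.replaceAll("[.,;:!?)]+$", ""));
            } else {
                // Ordinary words are lowercased, roughly as an analyzer would do.
                tokens.add(tok.toLowerCase());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("See http://lucene.apache.org/java/docs/ for details."));
    }
}
```

For the sample sentence, the URL should come out as a single token next to
the lowercased words "see", "for" and "details", instead of being split at
the colon, slashes and dots.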

I'd appreciate any help.

