lucene-java-user mailing list archives

From Sudha Verma <verma.su...@gmail.com>
Subject Keep URLs intact and not tokenized by the StandardTokenizer
Date Thu, 19 Nov 2009 05:58:11 GMT
Hi,

I am using Lucene 2.9.1.

I am reading in free-text documents, which I currently index using Lucene
with the StandardAnalyzer.

The StandardAnalyzer keeps email addresses intact and does not tokenize
them. Is there something similar for URLs? This seems like a common need,
so I thought I'd check whether anything out there already does it.

I'd appreciate any help.

Thanks,
sudha
