lucene-java-user mailing list archives

From "Uwe Schindler"
Subject RE: Are there any tokenizers that ignore HTML tags but keep the offsets so they can be used for highlighting in the original document?
Date Tue, 08 Jun 2010 11:57:13 GMT
> Hi Ahmet,
> I am using Lucene.NET with C# so I can't test this quickly.
> Will HTMLStripCharFilter maintain the character offsets or does it just
> return the plain text?

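Yes, the CharFilter does this! It strips the markup but records offset
corrections, so the tokens still report start and end offsets into the
original HTML document, which is what you need for highlighting.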
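
To illustrate what "maintaining the offsets" means, here is a minimal plain-Java sketch of the idea. It is not Lucene's HTMLStripCharFilter code and does not use the Lucene API; the class OffsetPreservingStripper, its Token type, and the tokenize method are hypothetical names made up for this example. It also assumes very simple markup (every '<' opens a tag that ends at the next '>') and whitespace-separated words.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; NOT Lucene's HTMLStripCharFilter. */
final class OffsetPreservingStripper {

    /** A token plus its offsets in the ORIGINAL (unstripped) text. */
    static final class Token {
        final String text;
        final int start; // inclusive offset into the original text
        final int end;   // exclusive offset into the original text

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Strips naive <...> tags, splits the remaining text on whitespace,
     * and maps every token back to its position in the original input.
     */
    static List<Token> tokenize(String html) {
        StringBuilder stripped = new StringBuilder();
        // originalOffsets.get(j) = index in html of the j-th stripped char
        List<Integer> originalOffsets = new ArrayList<>();

        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // assumption: '<' always opens a tag
            } else if (c == '>' && inTag) {
                inTag = false;           // tag closed, resume copying text
            } else if (!inTag) {
                stripped.append(c);
                originalOffsets.add(i);  // remember where this char came from
            }
        }

        List<Token> tokens = new ArrayList<>();
        int tokenStart = -1;
        for (int j = 0; j <= stripped.length(); j++) {
            boolean boundary = j == stripped.length()
                    || Character.isWhitespace(stripped.charAt(j));
            if (!boundary && tokenStart < 0) {
                tokenStart = j;          // first character of a new token
            } else if (boundary && tokenStart >= 0) {
                // Translate stripped-text positions to original offsets.
                int start = originalOffsets.get(tokenStart);
                int end = originalOffsets.get(j - 1) + 1;
                tokens.add(new Token(stripped.substring(tokenStart, j), start, end));
                tokenStart = -1;
            }
        }
        return tokens;
    }
}
```

For the input "<p>Hello <b>world</b></p>" this yields the tokens "Hello" at offsets 3 to 8 and "world" at offsets 12 to 17, so a highlighter can wrap exactly those character ranges in the original HTML without touching the tags.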

