lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Are there any tokenizers that ignore HTML tags but keep the offsets so they can be used for highlighting in the original document?
Date Tue, 08 Jun 2010 11:57:13 GMT
> Hi Ahmet,
> 
> I am using Lucene.NET with C# so I can't test this quickly.
> Will HTMLStripCharFilter maintain the character offsets or does it just
> extract the plain text?

Yes, the CharFilter does this: it maintains the character offsets!

Uwe
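
To illustrate what "maintaining the offsets" means, here is a minimal, self-contained sketch of the idea. It is not Lucene's HTMLStripCharFilter and uses no Lucene classes; the class `OffsetTrackingStripper` and its methods are hypothetical names invented for this example. It assumes a very naive notion of a tag (anything between `<` and `>`) and ignores entities, comments, and scripts. While stripping, it records where each kept character came from, so a token found in the stripped text can be mapped back to a span in the original HTML for highlighting.

```java
import java.util.Arrays;

/** Illustration only: a naive tag stripper, not Lucene's HTMLStripCharFilter. */
final class OffsetTrackingStripper {
    final String text;          // input with <...> tags removed
    private final int[] origin; // origin[i] = index in the original input of text.charAt(i)

    OffsetTrackingStripper(String html) {
        StringBuilder kept = new StringBuilder();
        int[] map = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;                  // start skipping a tag
            } else if (inTag) {
                if (c == '>') inTag = false;   // tag ends; its characters are dropped
            } else {
                map[kept.length()] = i;        // remember where this character came from
                kept.append(c);
            }
        }
        text = kept.toString();
        origin = Arrays.copyOf(map, kept.length());
    }

    /** Maps a token start offset in the stripped text to the original input. */
    int originalStart(int start) { return origin[start]; }

    /** Maps an exclusive token end offset (end > 0) to the original input. */
    int originalEnd(int end) { return origin[end - 1] + 1; }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>";
        OffsetTrackingStripper s = new OffsetTrackingStripper(html);
        int start = s.text.indexOf("world");
        int end = start + "world".length();
        System.out.println(s.text);
        // The mapped span selects the same word inside the original markup.
        System.out.println(html.substring(s.originalStart(start), s.originalEnd(end)));
    }
}
```

The mapped offsets point into the original HTML, so a highlighter can wrap the matched span there without re-parsing the markup. A real char filter would likely store the offset corrections more compactly than one array entry per character; the per-character array is chosen here only for clarity.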

