lucene-java-user mailing list archives

From Christoph Hermann <herm...@informatik.uni-freiburg.de>
Subject Tokenizing XML
Date Fri, 15 Oct 2010 16:15:33 GMT
Hi,

Is there a Tokenizer in Lucene that tokenizes XML correctly?

That is, given the following XML:
<span>this is <span attr="foo">example</span>text.</span>

it should produce these tokens (or something similar):
<span> | this | is | <span attr="foo"> | example | </span> | text. | </span>

Or would I need to write such a Tokenizer myself?
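
To make the intended splitting concrete, here is a minimal sketch in plain Java. It is not a Lucene Tokenizer and uses no Lucene API; the class name XmlTokenSketch and the method tokenize are hypothetical names chosen for illustration. It assumes a tag is everything from '<' to the next '>', and a word is any run of characters containing neither whitespace nor '<'. That assumption does not hold for comments, CDATA sections, or attribute values containing '>'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: regex-based splitting, not a real XML parser.
public class XmlTokenSketch {

    // Either a whole tag ("<" up to the next ">"),
    // or a run of characters that are neither whitespace nor "<".
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^\\s<]+");

    public static List<String> tokenize(String xml) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(xml);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String xml = "<span>this is <span attr=\"foo\">example</span>text.</span>";
        // Join the tokens with " | " to compare against the list in this message.
        System.out.println(String.join(" | ", tokenize(xml)));
    }
}
```

Inside Lucene the same splitting would presumably have to live in a custom Tokenizer subclass that also reports character offsets, which this sketch leaves out.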

Regards,
Christoph Hermann

-- 
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171 Fax: +49 761-203-8162
e-mail: hermann@informatik.uni-freiburg.de
