lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Langmead <>
Subject Document contents split among different Fields
Date Thu, 23 Sep 2004 21:11:16 GMT
I am working on extending Lucene to support documents with special islands
of an XML language, and I want to index the islands differently from the
text.  My current plan is to break the document's contents into two Fields,
one with all the text and one with all the special islands, and use a
different Analyzer on each Field.

In heading down this road, I realized that this approach breaks the whole
model of Token as it supports highlighting.  Token seems designed to store
offsets within a given Field, so if you break a document up into pieces, the
offsets are meaningless in terms of the original source document.

Am I right in saying that the design of Token's support for highlighting
really only supports having the entire document stored as one monolithic
"contents" Field?  Has anyone tackled indexing multiple content Fields
before that could shed some light?

Greg Langmead
Design Science, Inc., "How Science Communicates"

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message