lucene-dev mailing list archives

From jian chen
Subject Re: Adding generic payloads to a Term's posting list
Date Mon, 10 Oct 2005 20:36:22 GMT

I have been studying the Lucene indexing code for a bit. I am not sure I
understand the problem scope completely, but storing extra information
using TermInfosWriter might not solve the problem.

For the example of XML document tag depth, could that be a separate field?
A Lucene term is a combination of (field, termText), so depth could
be a field: even if two XML tags are the same, if their depths are
different, they are still treated as separate terms.
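
One way to read this suggestion, as a minimal sketch only: the toy index
below keys its postings on a (field, text) pair and encodes the tag depth in
the field name. It does not use Lucene's API. SketchTerm, DepthFieldSketch,
and the "tag_depth_N" field naming are all hypothetical names chosen for
illustration, and a real index would go through Lucene's own Document and
Field classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a (field, text) term; not Lucene's own Term class.
final class SketchTerm {
    final String field;
    final String text;

    SketchTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Two terms are equal only if both the field and the text match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SketchTerm)) return false;
        SketchTerm other = (SketchTerm) o;
        return field.equals(other.field) && text.equals(other.text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, text);
    }
}

// Toy in-memory inverted index keyed by (field, text).
final class DepthFieldSketch {
    private final Map<SketchTerm, List<Integer>> postings = new HashMap<>();

    // Assumed convention: the tag depth is encoded in the field name.
    static String fieldForDepth(int depth) {
        return "tag_depth_" + depth;
    }

    // Record that document docId contains tagName at the given depth.
    void addTag(int docId, String tagName, int depth) {
        SketchTerm term = new SketchTerm(fieldForDepth(depth), tagName);
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }

    // Look up the documents that contain tagName at exactly this depth.
    List<Integer> docsWithTagAtDepth(String tagName, int depth) {
        SketchTerm term = new SketchTerm(fieldForDepth(depth), tagName);
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        DepthFieldSketch index = new DepthFieldSketch();
        index.addTag(1, "title", 2); // <title> at depth 2 in document 1
        index.addTag(2, "title", 3); // <title> at depth 3 in document 2
        System.out.println(index.docsWithTagAtDepth("title", 2));
        System.out.println(index.docsWithTagAtDepth("title", 3));
    }
}
```

Because the field name is part of the term, the same tag text at two depths
ends up in two separate posting lists. The trade-off is that this needs one
field per depth and attaches the depth to the term as a whole rather than to
each individual occurrence in the posting list.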

This is all I could think of so far.


On 10/10/05, Grant Ingersoll <> wrote:
> See item #11 of API changes. Maybe along the lines of what you are
> interested in, although I don't know if anyone has even attempted a design
> of it. I would also like to see this, plus the ability to store info at
> higher levels in the Index, such as Field (not on a per token basis),
> Document (info about the document that spans its fields) and Index (such as
> coreference information). Alas, no time...
> -Grant
> >-----Original Message-----
> >From: Shane O'Sullivan []
> >Sent: Monday, October 10, 2005 8:38 AM
> >To:
> >Subject: Adding generic payloads to a Term's posting list
> >
> >Hi,
> >
> >To the best of my knowledge, it is not possible to add generic
> >data to a Term's posting list.
> >By this I mean info that is defined by the search engine, not
> >Lucene itself.
> >Whereas Lucene adds some data to the posting lists, such as
> >the term's position within a document, there are many other
> >useful types of information that could be attached to a term.
> >
> >Some examples would be in XML documents, to store the depth of
> >a tag in the document, or font information, such as if the
> >term appeared in a header or in the main body of text.
> >
> >Are there any plans to add such functionality to the API? If
> >not, where would be the appropriate place to implement these
> >changes? I presume the TermInfosWriter and TermInfosReader
> >would have to be altered, as well as the classes which call
> >them. Could this be done without having to modify the index in
> >such a way that standard Lucene indexes couldn't read it?
> >
> >Thanks
> >
> >Shane
> >