lucene-java-user mailing list archives

From "Terry Steichen" <>
Subject Re: Enumerating Concatenated Fields
Date Sun, 17 Nov 2002 22:05:33 GMT

Thanks for your response, but I don't think I was particularly clear in my
original message.  Here's an expanded description.

For each Lucene Document in the index there will be a 'codes' field which
will contain a comma-delimited set of codes (this is the result of my
concatenation at index-time of the individual 'code' sections from each of
the corresponding XML documents).

In other words, assume the original XML document contains something like

When I index each such XML document, I create a Lucene Document that has
a field called 'codes', which has the value: "value_of_code1,
value_of_code2, value_of_code3". (I do this so I can do boolean searches on
this field, to see which documents may have value_of_code1 AND
value_of_code2 AND NOT value_of_code3, for example.)
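
To make the index-time concatenation concrete, here is a minimal sketch in
plain Java. The names CodesField and joinCodes are hypothetical (they are not
part of Lucene), and the sketch assumes the individual 'code' values have
already been extracted from the XML document; it only shows how the string
stored in the 'codes' field might be built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Lucene class.
final class CodesField {
    // Joins the non-empty 'code' values of one XML document into the
    // comma-delimited string that would be stored in the 'codes' field.
    static String joinCodes(List<String> codes) {
        List<String> kept = new ArrayList<>();
        for (String code : codes) {
            // Skip missing or blank 'code' sections.
            if (code != null && !code.trim().isEmpty()) {
                kept.add(code.trim());
            }
        }
        return String.join(", ", kept);
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("value_of_code1", "", "value_of_code2");
        System.out.println(joinCodes(codes));
    }
}
```

A document with no non-empty 'code' sections would end up with an empty
'codes' value under this sketch.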

Consider that each 'value_of_codexx' is a keyword.  Each XML document may
have zero or more such keywords (aka code sections).  I'm trying to figure
out a way to get a list of all the keywords used by the XML documents that
have been indexed.  It seems to me that the index itself (even though I do
store this concatenated result in it) won't really know how to parse the
string of comma-delimited code values that make up each 'codes' field.
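
As a rough illustration of the result I'm after, here is a sketch in plain
Java that splits each stored 'codes' string on commas and collects the
distinct keywords. CodeEnumerator and distinctCodes are hypothetical names,
and the input list merely stands in for the 'codes' values that would have to
be read back from the index, one per document; no Lucene calls are shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not a Lucene class.
final class CodeEnumerator {
    // Splits each comma-delimited 'codes' value and returns every distinct
    // keyword in sorted order.
    static SortedSet<String> distinctCodes(List<String> codesFieldValues) {
        SortedSet<String> distinct = new TreeSet<>();
        for (String value : codesFieldValues) {
            if (value == null) {
                continue; // document had no 'codes' field
            }
            for (String code : value.split(",")) {
                String trimmed = code.trim();
                if (!trimmed.isEmpty()) {
                    distinct.add(trimmed);
                }
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList(
                "value_of_code1, value_of_code2",
                "value_of_code2, value_of_code3",
                "");
        System.out.println(distinctCodes(stored));
    }
}
```

This assumes that no keyword itself contains a comma; otherwise a different
delimiter would be needed at index time.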

Does that make more sense?



--- Original Message -----
From: "Otis Gospodnetic" <>
To: "Lucene Users List" <>
Sent: Sunday, November 17, 2002 4:24 PM
Subject: Re: Enumerating Concatenated Fields

> If I understand what you want - open an index with IndexReader, get the
> # of documents in it via IndexReader, loop through all documents,
> getting each one by its ID, and for each of them get field 'codes' out of
> it.
> Otis
> --- Terry Steichen <> wrote:
> > I have a collection of XML documents, each of which contains a
> > 'codes' section, each of which contains zero or more 'code' sections.
> >  When I index the documents, I concatenate all the non-empty 'code'
> > sections into a single 'codes' index field to facilitate boolean
> > searching.
> >
> > Given my structure, is there a way that I could get a list of all the
> > defined 'code' values in the entire set of documents?  If not (as I
> > suspect), is there a way that I could change the indexing scheme to
> > add this functionality?
> >
> > Regards,
> >
> > Terry
