lucene-java-user mailing list archives

From "Terry Steichen" <te...@net-frame.com>
Subject Re: Enumerating Concatenated Fields
Date Sun, 17 Nov 2002 22:05:33 GMT
Otis,

Thanks for your response, but I don't think I was particularly clear in my
original message.  Here's an expanded description.

For each Lucene Document in the index there will be a 'codes' field which
will contain a comma-delimited set of codes (this is the result of my
concatenation at index-time of the individual 'code' sections from each of
the corresponding XML documents).

In other words, assume the original XML document contains something like
this:
.....
<codes>
    <code>value_of_code1</code>
    <code>value_of_code2</code>
    <code>value_of_code3</code>
</codes>
....

When I index each such XML document, I create a Lucene Document that has
a field called 'codes', which has the value: "value_of_code1,
value_of_code2, value_of_code3". (I do this so I can do boolean searches on
this field, to see which documents have value_of_code1 AND
value_of_code2 AND NOT value_of_code3, for example.)
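
To make the concatenation step concrete, here is a rough sketch in plain Java
(no Lucene calls). The class and method names (CodesField, joinCodes) are
invented for illustration, and the ", " separator is simply taken from the
example value above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodesField {
    // Joins the non-empty <code> values of one XML document into the
    // comma-delimited string that would be stored in the 'codes' field.
    // (Hypothetical helper, not part of Lucene.)
    static String joinCodes(List<String> codes) {
        List<String> kept = new ArrayList<>();
        for (String code : codes) {
            // Skip missing or blank <code> sections.
            if (code != null && !code.trim().isEmpty()) {
                kept.add(code.trim());
            }
        }
        return String.join(", ", kept);
    }

    public static void main(String[] args) {
        List<String> codes = Arrays.asList("value_of_code1", "", "value_of_code2");
        // prints: value_of_code1, value_of_code2
        System.out.println(joinCodes(codes));
    }
}
```

A document with no non-empty 'code' sections ends up with an empty string
here; whether to add the field at all in that case is a separate choice.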

Consider that each 'value_of_codexx' is a keyword.  Each XML document may
have zero or more such keywords (aka code sections).  I'm trying to figure
out a way to get a list of all the keywords used by the XML documents that
have been indexed.  It seems to me that the index itself (even though I do
store this concatenated result in it) won't really know how to parse the
string of comma-delimited code values that makes up each 'codes' field
value.
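
For reference, the parsing itself could be done outside the index. The sketch
below is plain Java and assumes the stored 'codes' values have already been
read out of each document into a list; CodeEnumerator and distinctCodes are
made-up names, and the list in main is only stand-in data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CodeEnumerator {
    // Splits each stored 'codes' value on commas and collects the distinct,
    // non-empty codes in sorted order. (Hypothetical helper, not Lucene API.)
    static Set<String> distinctCodes(List<String> storedCodesValues) {
        Set<String> all = new TreeSet<>();
        for (String value : storedCodesValues) {
            if (value == null) {
                continue; // this document had no 'codes' value
            }
            for (String code : value.split(",")) {
                String trimmed = code.trim();
                if (!trimmed.isEmpty()) {
                    all.add(trimmed);
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in for the 'codes' values of three indexed documents.
        List<String> stored = Arrays.asList(
            "value_of_code1, value_of_code2, value_of_code3",
            "value_of_code2",
            "");
        // prints: [value_of_code1, value_of_code2, value_of_code3]
        System.out.println(distinctCodes(stored));
    }
}
```

This only works if the field value is stored (not just indexed), and it
touches every document, so it may be slow on a large index.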

Does that make more sense?

Regards,

Terry

----- Original Message -----
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Sunday, November 17, 2002 4:24 PM
Subject: Re: Enumerating Concatenated Fields


> If I understand what you want - open an index with IndexReader, get the
> # of documents in it via IndexReader, loop through all documents,
> getting each one by its ID, and for each of them get field 'codes' out of
> it.
>
> Otis
>
>
> --- Terry Steichen <terry@net-frame.com> wrote:
> > I have a collection of XML documents, each of which contains a
> > 'codes' section, each of which contains zero or more 'code' sections.
> >  When I index the documents, I concatenate all the non-empty 'code'
> > sections into a single 'codes' index field to facilitate boolean
> > searching.
> >
> > Given my structure, is there a way that I could get a list of all the
> > defined 'code' values in the entire set of documents?  If not (as I
> > suspect), is there a way that I could change the indexing scheme to
> > add this functionality?
> >
> > Regards,
> >
> > Terry


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>

