lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Sitsky <s...@nuix.com.au>
Subject Re: Query for the existence of a Lucene field in a document?
Date Tue, 25 May 2004 22:33:25 GMT
On Tue, 25 May 2004 17:29, Ype Kingma wrote:
> David,
>
> On Tuesday 25 May 2004 03:05, you wrote:
> > I have an application using Lucene 1.3 final.
> >
> > In this application, I am loading data where the main text for each
> > document is stored into a "body" field, a couple of other internal
> > fields, and basically some "meta-data fields" driven by the data being
> > loaded, which can created Lucene fields like M1, M2, M3, ...
> >
> > Not every document has every meta-data field present, for example, one
> > document may have M1, M5, M6, another might just have M1, M2, M3.  It
> > is also possible for the meta-data field value to be just the empty
> > string.
> >
> > The presence of a meta-data field has meaning to the application.  In
> > general, it is not known in advance what meta-data fields are present,
> > but it is generally a smallish number (< 100).
> >
> > There is a requirement for the user to be able to retrieve all
> > documents which have a particular meta-data field present.
> >
> > I can't see anyway of doing this with the query parser.  Is there a
> > way of doing this?  ie, retrieve all documents which have a specific
> > field set.
> >
> > I seems to me I need to create a new tokenized unstored field called
> > something like "meta-data-fields" for each document, which contains
> > what meta-data field names are present for that document.  In the
> > above example, one document could have the value "M1 M5 M6", the other
> > "M1 M2 M3".
> >
> > Does this seem reasonable?  Is there any way of doing this without
> > introducing a new field?
>
> Normally an empty string does not result in indexed terms, so there is
> no way to find out about the existence of an indexed field in that case,
> except by using more information.
> Using an extra field that mentions the field names of all other fields
> will solve your problem. In practice, such a field is also quite handy
> for statistics/debugging, so I'd tokenize, index and store it. Once it's
> indexed, it can be queried as any other field.

Thanks - that's what I have ended up doing.  Cheers.

-- 
Cheers,
David

This message is intended only for the named recipient.  If you are not the 
intended recipient you are notified that disclosing, copying, distributing 
or taking any action  in reliance on the contents of this information is 
strictly prohibited.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message