lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed
Date Sun, 29 Aug 2010 21:41:49 GMT
Adding to Uwe's comment, you may be operating under a false
assumption. Lucene has no capability to update fields in a document.
Period. This is one of the most frequently requested changes, but
the nature of an inverted index makes this...er...tricky. Updates
are really a document delete followed by a document add. And as
a bonus, the new document won't even have the same internal
Lucene doc id as the one it replaces.
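
To make the delete-then-add point concrete, here is a minimal sketch in
plain Java. It is a toy inverted index, not Lucene's implementation or API:
the class ToyIndex and its methods are hypothetical names invented for
illustration, and it assumes doc ids are handed out sequentially and never
reused.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: hypothetical names, NOT Lucene's API.
class ToyIndex {
    private int nextDocId = 0;
    // term -> ids of the documents that contain it (the "inverted" part)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // unique key -> current internal doc id
    private final Map<String, Integer> keyToDocId = new HashMap<>();

    // Tokenize the text and record the new doc id under every term.
    int add(String key, String text) {
        int docId = nextDocId++;
        keyToDocId.put(key, docId);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Remove the document's id from every postings list.
    void delete(String key) {
        Integer docId = keyToDocId.remove(key);
        if (docId == null) {
            return;
        }
        for (Set<Integer> ids : postings.values()) {
            ids.remove(docId);
        }
    }

    // An "update" is nothing more than delete followed by add.
    int update(String key, String text) {
        delete(key);
        return add(key, text);
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.<Integer>emptySet());
    }
}
```

In this sketch, calling update("doc-1", ...) after add("doc-1", ...) returns
a different id than the original add did, and terms that appeared only in the
old text no longer match.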

So if you build the update from a document you read back out of the
index, the non-stored fields are missing from the new document and your
results will be...uhmmmm....not what you expect...
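
A second toy sketch, again plain Java rather than Lucene's API, shows why
that happens and how the stored "raw document" field Uwe describes in the
quoted message gets around it. Everything here is an assumption made for
illustration: the class ToyFieldIndex, the field names "title", "content"
and "raw", and the simplification that the raw source is just the content
text itself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of stored vs. indexed-only fields; hypothetical names, NOT Lucene's API.
class ToyFieldIndex {
    // key -> stored field values (the only thing load() can give back)
    private final Map<String, Map<String, String>> stored = new HashMap<>();
    // "field:term" -> keys of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Replace the whole document: drop its old postings, then index the given fields.
    void replace(String key, Map<String, String> indexed, Map<String, String> storedFields) {
        for (Set<String> keys : postings.values()) {
            keys.remove(key);
        }
        for (Map.Entry<String, String> field : indexed.entrySet()) {
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term,
                            k -> new HashSet<>()).add(key);
                }
            }
        }
        stored.put(key, new HashMap<>(storedFields));
    }

    // Returns a copy of the stored fields only; indexed-only text is not recoverable.
    Map<String, String> load(String key) {
        return new HashMap<>(stored.get(key));
    }

    boolean hits(String field, String term, String key) {
        Set<String> keys = postings.get(field + ":" + term.toLowerCase());
        return keys != null && keys.contains(key);
    }

    // Naive update: re-index only what load() returned, so "content" is lost.
    static void naiveRetitle(ToyFieldIndex idx, String key, String title) {
        Map<String, String> doc = idx.load(key);
        doc.put("title", title);
        Map<String, String> indexed = new HashMap<>();
        indexed.put("title", title);
        idx.replace(key, indexed, doc);
    }

    // Rebuild "content" from the stored-only "raw" field before re-adding.
    // Assumes the document was originally added with a "raw" stored field.
    static void retitleFromRaw(ToyFieldIndex idx, String key, String title) {
        Map<String, String> doc = idx.load(key);
        doc.put("title", title);
        Map<String, String> indexed = new HashMap<>();
        indexed.put("title", title);
        indexed.put("content", doc.get("raw"));
        idx.replace(key, indexed, doc);
    }
}
```

After naiveRetitle, a search on "content" no longer finds the document;
after retitleFromRaw it does, at the cost of keeping the source text in the
stored data.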

Best
Erick

On Sun, Aug 29, 2010 at 1:48 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> You cannot retrieve non-stored fields. They are analyzed and tokenized
> during indexing and this is a one-way transformation. If you update
> documents you have to reindex the contents. If you do not have access to
> the original contents anymore, you may consider adding a stored-only "raw
> document" field that contains everything needed to rebuild the indexed
> fields. In our installation, we have a stored field containing the
> JSON/XML source document to do this.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Constantine Vetoshev [mailto:gepardcv@gmail.com]
> > Sent: Sunday, August 29, 2010 10:38 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Fields with Field.Store.NO and Field.Index.ANALYZED not being indexed
> >
> > Thanks Erick.
> >
> > I finally had time to go back and look at this problem. I discovered
> > that the analyzed fields work fine for searching until I use
> > IndexWriter.updateDocument().
> >
> > The way my application runs, it has to update documents several times to
> > update one specific field. The update code queries out Document objects
> > using a unique identifier, and updates the field. The problem is in the
> > Document objects returned by the query. The querying code runs a search,
> > and eventually calls IndexSearcher.doc(int). According to the API
> > documentation, that method only returns Document objects with stored
> > fields from the underlying index.
> >
> > I tried calling IndexSearcher.doc(int i, FieldSelector fieldSelector)
> > with fieldSelector set to null: the documentation states that this
> > returns Document objects with all fields, but that also only seems to
> > return stored fields.
> >
> > So my question becomes: how can I update a document which contains
> > non-stored analyzed fields without clobbering the analyzed-only fields?
> > Note that I do not need to update the analyzed-only fields. I have found
> > nothing helpful in the documentation.
> >
> > --
> > Regards,
> > Constantine Vetoshev
> >
> >
> > Erick Erickson <erickerickson@gmail.com> writes:
> >
> > > I would be extraordinarily surprised if this was in Lucene, this is so
> > > basic to how it works that the howls would be heard world-round <G>.
> > >
> > > So I'm guessing it's in your code. Could you show it to us? Or, better
> > > yet, create a small, self-contained test case that illustrates your
> > > problem?
> > >
> > > Also, what analyzer(s) are you using? And what do your docs look like?
> > >
> > > Best
> > > Erick
> > >
> > > On Thu, Mar 25, 2010 at 3:46 PM, Constantine Vetoshev
> > > <gepardcv@gmail.com> wrote:
> > >
> > >> I have a strange problem with Field.Store.NO and Field.Index.ANALYZED
> > >> fields with Lucene 3.0.1.
> > >>
> > >> I'm testing my app with twenty test documents. Each has about ten
> > >> fields. All fields except one, "Content", are set as Field.Store.YES.
> > >> The "Content" field is set as Field.Store.NO and
> > >> Field.Index.ANALYZED. Using Luke, I discovered that this "Content"
> > >> field is not persisted to the disk, except on one document (neither
> > >> the first nor the last in the list). This always happens for exactly
> > >> the same document. When I examine the Document object before writing
> > >> it, it has the "Content" field I expect.
> > >>
> > >> When I change the "Content" field from Field.Store.NO to
> > >> Field.Store.YES, everything starts working. Every document has the
> > >> "Content" field exactly as I expect, and searches produce the hits I
> > >> expect to see. I really don't want to save the full "Content" data in
> > >> the Lucene index, though. I'm baffled why Field.Store.NO results in
> > >> nothing being written to the index even with Field.Index.ANALYZED.
> > >>
> > >> Suggestions?
> > >>
> > >> --
> > >> Regards,
> > >> Constantine Vetoshev
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org