lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <cdor...@gmail.com>
Subject Re: IndexWriter update method
Date Tue, 21 Apr 2009 09:09:01 GMT
*IndexWriter.deleteDocuments<http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#deleteDocuments%28org.apache.lucene.search.Query%29>
*(Query<http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Query.html>
 query) may be handy too
(but note that it will delete *all* docs that match the query).
Doron

On Tue, Apr 21, 2009 at 2:28 AM, Erick Erickson <erickerickson@gmail.com>wrote:

> I don't think you *can* create a Term that spans two fields. Perhaps
> you'd be better off just doing a search, getting the doc ID back then
> adding a new version of the document.
>
> You *could* think about reindexing your corpus and indexing an
> additional field that was the concatenation of the two fields you
> want to update by if that's a better solution.
>
> Best
> Erick
>
> On Mon, Apr 20, 2009 at 11:40 AM, Newman, Billy <Billy.Newman@itt.com
> >wrote:
>
> > What if you're unique id is a composite of two field when you create the
> > document?
> >
> > I.E.
> > doc.add(new Field("partno", "123345",
> > Field.Store.whatever, Field.Index.UN_TOKENIZED);
> > doc.add(new Field("storeLoc", "Springfield",
> > Field.Store.whatever, Field.Index.UN_TOKENIZED);
> >
> > How do you create a Term for this?  Is this possible?  Is this the
> correct
> > way to create a document that has two fields?  If so I am a little lost
> on
> > how to create a term to correctly find this.
> >
> > Thanks,
> > Billy
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Friday, April 17, 2009 8:08 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: IndexWriter update method
> >
> > What you're missing is that the example has no unique ID, it wasn't
> created
> > with update in mind.
> >
> > There's no hidden magic for Lucene knowing *what* document you want
> > to have updated, you have to provide it yourself, and it should be
> unique.
> >
> > Imagine a parts catalog, or an index of a directory tree. In the parts
> > catalog,
> > you could identify the document by its part number, so you'd probably
> index
> > it something like doc.add(new Field("partno", "123345",
> > Field.Store.whatever, Field.Index.UN_TOKENIZED);
> > Indexing a directory tree you could use the complete file path similarly.
> >
> > Now, each document will have one (and only one) partno, and it'll be
> unique
> > (you really
> > don't want to tokenize this).
> >
> > To update, you'd form your term on the field "partno" and value "123345",
> > thus uniquely
> > identifying the document you want replaced, and use that term in your
> > update
> > statement.
> > Think of the Term as a unique key for the document that *you've*
> > deliberately put there.
> >
> > I'm pretty sure (but not positive) that if you update a document where
> the
> > term doesn't
> > have any matches, you'll get a simple insert, but I won't guarantee it.
> >
> > HTH
> > Erick
> >
> >
> > On Fri, Apr 17, 2009 at 9:28 PM, Newman, Billy <Billy.Newman@itt.com>
> > wrote:
> >
> > > Ok I am still confused.
> > >
> > > Looking at the examples to index a document I would do something like
> the
> > > following:
> > >        Document document = new Document();
> > >        document.add(Field.UnStored("article", article));
> > >        document.add(Field.Text("comments", comments));
> > >        Analyzer analyzer  = new StandardAnalyzer();
> > >        IndexWriter writer = new IndexWriter(indexDirectory, analyzer,
> > > false);
> > >        writer.addDocument(document);
> > >        writer.optimize();
> > >        writer.close();
> > >
> > > Now lets say that the comments can change and when they do I want to
> > update
> > > that document to contain the newly updated comments.
> > >
> > > So I would have to go back and check my index to see if that book
> already
> > > exists.
> > > Query q = new QueryParser("article", analyzer).parse(querystr);
> > > int hitsPerPage = 10;
> > > IndexSearcher searcher = new IndexSearcher(index);
> > > TopDocCollector collector = new TopDocCollector(hitsPerPage);
> > > searcher.search(q, collector);
> > > ScoreDoc[] hits = collector.topDocs().scoreDocs;
> > > if (hits!= null && hits.length > 0) {
> > >  // ?
> > >  // Then this already exists and I just want to update the comments
> > section
> > > }
> > >
> > > Does that make sense?  Am I going about this wrong?
> > >
> > > Billy
> > >
> > >
> > > ________________________________________
> > > From: Tim Williams [williamstw@gmail.com]
> > > Sent: Friday, April 17, 2009 6:05 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: IndexWriter update method
> > >
> > > On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy <Billy.Newman@itt.com>
> > > wrote:
> > > > I am looking for info on how to use the IndexWriter.update method.  A
> > > short example of how to add a document and then later update would
> > > > be very helpful.  I get lost because I can add a document with just
> the
> > > document, but I need a document and a Term.  I am not really sure
> > > > what a Term is since I did not use a Term to create the document nor
> do
> > I
> > > see it in any of the examples of searching/adding.
> > >
> > > When you index the document, add an ID field that is unique.  Then
> > > when you go to update the document the "Term" will be the ID of the
> > > document you wish to update.  For example, you might add a URL as the
> > > unique ID, then to update it might look something like:
> > >
> > > writer.update(new Term("id","http://apache.org/lucene/index.htm"),
> doc)
> > >
> > >
> > > --tim
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > > This e-mail and any files transmitted with it may be proprietary and
> are
> > > intended solely for the use of the individual or entity to whom they
> are
> > > addressed. If you have received this e-mail in error please notify the
> > > sender.
> > > Please note that any views or opinions presented in this e-mail are
> > solely
> > > those of the author and do not necessarily represent those of ITT
> > > Corporation. The recipient should check this e-mail and any attachments
> > for
> > > the presence of viruses. ITT accepts no liability for any damage caused
> > by
> > > any virus transmitted by this e-mail.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message