lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: IndexWriter update method
Date Mon, 20 Apr 2009 23:28:22 GMT
I don't think you *can* create a Term that spans two fields. Perhaps
you'd be better off just doing a search, getting the doc ID back then
adding a new version of the document.
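
To make that search-then-replace flow concrete, here is a minimal sketch, assuming a toy in-memory list of documents in place of a real index. ToyIndex and its replace method are hypothetical names invented for illustration and are not Lucene API. The field names "partno" and "storeLoc" come from the question quoted below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for an index; none of this is Lucene API.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /**
     * Removes every stored document whose "partno" and "storeLoc" both match,
     * then adds the new version. Returns how many old documents were removed.
     */
    int replace(String partNo, String storeLoc, Map<String, String> newDoc) {
        int removed = 0;
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            Map<String, String> doc = it.next();
            if (partNo.equals(doc.get("partno")) && storeLoc.equals(doc.get("storeLoc"))) {
                it.remove();
                removed++;
            }
        }
        // Store a copy so later changes to the caller's map do not leak in.
        docs.add(new HashMap<>(newDoc));
        return removed;
    }

    int size() {
        return docs.size();
    }
}
```

In this model, replacing a key that matches nothing simply adds the document, and replacing an existing key leaves exactly one copy. With Lucene itself the matching step would be a query on both fields, which is not shown here.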

You *could* think about reindexing your corpus and indexing an
additional field that was the concatenation of the two fields you
want to update by, if that's a better solution.
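
As a rough sketch of the concatenated-field idea, the helper below builds one key string from the two values. The class name CompositeKey, the '|' separator, and the backslash escaping are all assumptions made for illustration. The escaping is there so that two different pairs can never produce the same key.

```java
// Sketch only: builds one string key from two field values.
final class CompositeKey {
    private static final char SEP = '|';
    private static final char ESC = '\\';

    private CompositeKey() {}

    /** Joins partNo and storeLoc, escaping the separator so distinct pairs never collide. */
    static String of(String partNo, String storeLoc) {
        return escape(partNo) + SEP + escape(storeLoc);
    }

    /** Prefixes every separator or escape character with the escape character. */
    private static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == SEP || c == ESC) {
                sb.append(ESC);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

The resulting string would be indexed untokenized in one extra field (say "partKey", a made-up name), and the same string would go into the Term passed to IndexWriter.updateDocument(Term, Document).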

Best
Erick

On Mon, Apr 20, 2009 at 11:40 AM, Newman, Billy <Billy.Newman@itt.com> wrote:

> What if your unique ID is a composite of two fields when you create the
> document?
>
> I.E.
> doc.add(new Field("partno", "123345",
>         Field.Store.whatever, Field.Index.UN_TOKENIZED));
> doc.add(new Field("storeLoc", "Springfield",
>         Field.Store.whatever, Field.Index.UN_TOKENIZED));
>
> How do you create a Term for this?  Is this possible?  Is this the correct
> way to create a document that has two fields?  If so I am a little lost on
> how to create a term to correctly find this.
>
> Thanks,
> Billy
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Friday, April 17, 2009 8:08 PM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter update method
>
> What you're missing is that the example has no unique ID; it wasn't created
> with update in mind.
>
> There's no hidden magic for Lucene knowing *which* document you want
> to have updated; you have to provide it yourself, and it should be unique.
>
> Imagine a parts catalog, or an index of a directory tree. In the parts
> catalog, you could identify the document by its part number, so you'd
> probably index it something like
> doc.add(new Field("partno", "123345",
>         Field.Store.whatever, Field.Index.UN_TOKENIZED));
> Indexing a directory tree, you could use the complete file path similarly.
>
> Now, each document will have one (and only one) partno, and it'll be unique
> (you really don't want to tokenize this).
>
> To update, you'd form your term on the field "partno" and value "123345",
> thus uniquely identifying the document you want replaced, and use that
> term in your update statement.
> Think of the Term as a unique key for the document that *you've*
> deliberately put there.
>
> I'm pretty sure (but not positive) that if you update a document where the
> term doesn't have any matches, you'll get a simple insert, but I won't
> guarantee it.
>
> HTH
> Erick
>
>
> On Fri, Apr 17, 2009 at 9:28 PM, Newman, Billy <Billy.Newman@itt.com>
> wrote:
>
> > Ok I am still confused.
> >
> > Looking at the examples to index a document I would do something like the
> > following:
> >        Document document = new Document();
> >        document.add(Field.UnStored("article", article));
> >        document.add(Field.Text("comments", comments));
> >        Analyzer analyzer  = new StandardAnalyzer();
> >        IndexWriter writer = new IndexWriter(indexDirectory, analyzer, false);
> >        writer.addDocument(document);
> >        writer.optimize();
> >        writer.close();
> >
> > Now let's say that the comments can change, and when they do I want to
> > update that document to contain the newly updated comments.
> >
> > So I would have to go back and check my index to see if that book already
> > exists.
> > Query q = new QueryParser("article", analyzer).parse(querystr);
> > int hitsPerPage = 10;
> > IndexSearcher searcher = new IndexSearcher(index);
> > TopDocCollector collector = new TopDocCollector(hitsPerPage);
> > searcher.search(q, collector);
> > ScoreDoc[] hits = collector.topDocs().scoreDocs;
> > if (hits != null && hits.length > 0) {
> >  // ?
> >  // Then this already exists and I just want to update the comments section
> > }
> >
> > Does that make sense?  Am I going about this wrong?
> >
> > Billy
> >
> >
> > ________________________________________
> > From: Tim Williams [williamstw@gmail.com]
> > Sent: Friday, April 17, 2009 6:05 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: IndexWriter update method
> >
> > On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy <Billy.Newman@itt.com>
> > wrote:
> > > I am looking for info on how to use the IndexWriter.update method.  A
> > > short example of how to add a document and then later update would
> > > be very helpful.  I get lost because I can add a document with just the
> > > document, but I need a document and a Term.  I am not really sure
> > > what a Term is since I did not use a Term to create the document nor do I
> > > see it in any of the examples of searching/adding.
> >
> > When you index the document, add an ID field that is unique.  Then
> > when you go to update the document the "Term" will be the ID of the
> > document you wish to update.  For example, you might add a URL as the
> > unique ID, then to update it might look something like:
> >
> > writer.updateDocument(new Term("id", "http://apache.org/lucene/index.htm"), doc);
> >
> >
> > --tim
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
