lucene-java-user mailing list archives

From "Newman, Billy" <Billy.New...@itt.com>
Subject RE: IndexWriter update method
Date Sat, 18 Apr 2009 03:42:23 GMT
Perfect explanation, I think I have the idea now.  Thanks so much!  I would also like to test
the update with a term that does not have any matches, to see whether it does an insert, as
that would make the code much simpler and more efficient.  According to the documentation, an
update is a delete followed by an insert, so my guess is that you are correct.

Thanks again!

Billy
________________________________________
From: Erick Erickson [erickerickson@gmail.com]
Sent: Friday, April 17, 2009 8:07 PM
To: java-user@lucene.apache.org
Subject: Re: IndexWriter update method

What you're missing is that the example has no unique ID, it wasn't created
with update in mind.

There's no hidden magic for Lucene knowing *what* document you want
to have updated, you have to provide it yourself, and it should be unique.

Imagine a parts catalog, or an index of a directory tree. In the parts catalog,
you could identify the document by its part number, so you'd probably index
it something like
doc.add(new Field("partno", "123345", Field.Store.whatever, Field.Index.UN_TOKENIZED));
Indexing a directory tree you could use the complete file path similarly.

Now, each document will have one (and only one) partno, and it'll be unique
(you really don't want to tokenize this).
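
To illustrate why the key field should not be tokenized, here is a small
sketch in plain Java. It is not Lucene code: KeyFieldSketch, tokenized and
untokenized are made-up names, and the splitting rule (lowercase, break on
anything that is not a letter or digit) is only an assumption standing in for
whatever a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these helpers imitate indexing choices and are NOT Lucene code.
class KeyFieldSketch {

    // Imitates a tokenized field: lowercase the value, then split on
    // anything that is not a letter or digit (an assumed, simplified rule).
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Imitates an untokenized field: the whole value is kept as one term.
    static List<String> untokenized(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    public static void main(String[] args) {
        String path = "/docs/Parts-123345.txt";
        // The tokenized form no longer contains the full path as a term,
        // so an exact-match lookup on the full path would find nothing.
        System.out.println(tokenized(path).contains(path));
        // The untokenized form keeps the full path as its only term.
        System.out.println(untokenized(path).contains(path));
    }
}
```

Under these assumptions the tokenized field only holds the pieces of the
path, so a term built from the complete path cannot match it, while the
untokenized field matches exactly.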

To update, you'd form your term on the field "partno" and value "123345",
thus uniquely identifying the document you want replaced, and use that term
in your update statement. Think of the Term as a unique key for the document
that *you've* deliberately put there.

I'm pretty sure (but not positive) that if you update a document where the
term doesn't have any matches, you'll get a simple insert, but I won't
guarantee it.
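
If an update really is a delete by term followed by an add, the no-match case
falls out naturally. The sketch below models that reading in plain Java. It
is not Lucene code: ToyIndex and its methods are hypothetical names, and each
document is simplified to a map of field name to one untokenized value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model of "update = delete by term, then add". NOT Lucene code;
// ToyIndex and its methods are hypothetical names used for illustration.
class ToyIndex {
    // Each document is a map of field name to a single untokenized value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Removes every document whose field equals value; returns how many were removed.
    int deleteDocuments(String field, String value) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Delete-then-add. With no matching document the delete is a no-op,
    // so the call behaves like a plain insert.
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    int size() {
        return docs.size();
    }

    // Returns the first document whose field equals value, or null if there is none.
    Map<String, String> find(String field, String value) {
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(field))) {
                return doc;
            }
        }
        return null;
    }
}
```

In this model, calling updateDocument("partno", "123345", doc) on an empty
index leaves one document, and calling it again with changed comments still
leaves one document, now holding the new comments. No prior search is needed
to decide between add and update.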

HTH
Erick


On Fri, Apr 17, 2009 at 9:28 PM, Newman, Billy <Billy.Newman@itt.com> wrote:

> Ok I am still confused.
>
> Looking at the examples to index a document I would do something like the
> following:
>        Document document = new Document();
>        document.add(Field.UnStored("article", article));
>        document.add(Field.Text("comments", comments));
>        Analyzer analyzer  = new StandardAnalyzer();
>        IndexWriter writer = new IndexWriter(indexDirectory, analyzer, false);
>        writer.addDocument(document);
>        writer.optimize();
>        writer.close();
>
> Now let's say that the comments can change and when they do I want to update
> that document to contain the newly updated comments.
>
> So I would have to go back and check my index to see if that book already
> exists.
> Query q = new QueryParser("article", analyzer).parse(querystr);
> int hitsPerPage = 10;
> IndexSearcher searcher = new IndexSearcher(index);
> TopDocCollector collector = new TopDocCollector(hitsPerPage);
> searcher.search(q, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> if (hits != null && hits.length > 0) {
>  // ?
>  // Then this already exists and I just want to update the comments section
> }
>
> Does that make sense?  Am I going about this wrong?
>
> Billy
>
>
> ________________________________________
> From: Tim Williams [williamstw@gmail.com]
> Sent: Friday, April 17, 2009 6:05 PM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter update method
>
> On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy <Billy.Newman@itt.com> wrote:
> > I am looking for info on how to use the IndexWriter.update method.  A short
> > example of how to add a document and then later update would be very
> > helpful.  I get lost because I can add a document with just the document,
> > but I need a document and a Term.  I am not really sure what a Term is
> > since I did not use a Term to create the document nor do I see it in any
> > of the examples of searching/adding.
>
> When you index the document, add an ID field that is unique.  Then
> when you go to update the document the "Term" will be the ID of the
> document you wish to update.  For example, you might add a URL as the
> unique ID, then to update it might look something like:
>
> writer.updateDocument(new Term("id","http://apache.org/lucene/index.htm"), doc);
>
>
> --tim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
