lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Newman, Billy" <Billy.New...@itt.com>
Subject RE: IndexWriter update method
Date Mon, 20 Apr 2009 15:40:19 GMT
What if you're unique id is a composite of two field when you create the document?

I.E.
doc.add(new Field("partno", "123345",
Field.Store.whatever, Field.Index.UN_TOKENIZED);
doc.add(new Field("storeLoc", "Springfield",
Field.Store.whatever, Field.Index.UN_TOKENIZED);

How do you create a Term for this?  Is this possible?  Is this the correct way to create a
document that has two fields?  If so I am a little lost on how to create a term to correctly
find this.

Thanks,
Billy

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, April 17, 2009 8:08 PM
To: java-user@lucene.apache.org
Subject: Re: IndexWriter update method

What you're missing is that the example has no unique ID, it wasn't created
with update in mind.

There's no hidden magic for Lucene knowing *what* document you want
to have updated, you have to provide it yourself, and it should be unique.

Imagine a parts catalog, or an index of a directory tree. In the parts
catalog,
you could identify the document by its part number, so you'd probably index
it something like doc.add(new Field("partno", "123345",
Field.Store.whatever, Field.Index.UN_TOKENIZED);
Indexing a directory tree you could use the complete file path similarly.

Now, each document will have one (and only one) partno, and it'll be unique
(you really
don't want to tokenize this).

To update, you'd form your term on the field "partno" and value "123345",
thus uniquely
identifying the document you want replaced, and use that term in your update
statement.
Think of the Term as a unique key for the document that *you've*
deliberately put there.

I'm pretty sure (but not positive) that if you update a document where the
term doesn't
have any matches, you'll get a simple insert, but I won't guarantee it.

HTH
Erick


On Fri, Apr 17, 2009 at 9:28 PM, Newman, Billy <Billy.Newman@itt.com> wrote:

> Ok I am still confused.
>
> Looking at the examples to index a document I would do something like the
> following:
>        Document document = new Document();
>        document.add(Field.UnStored("article", article));
>        document.add(Field.Text("comments", comments));
>        Analyzer analyzer  = new StandardAnalyzer();
>        IndexWriter writer = new IndexWriter(indexDirectory, analyzer,
> false);
>        writer.addDocument(document);
>        writer.optimize();
>        writer.close();
>
> Now lets say that the comments can change and when they do I want to update
> that document to contain the newly updated comments.
>
> So I would have to go back and check my index to see if that book already
> exists.
> Query q = new QueryParser("article", analyzer).parse(querystr);
> int hitsPerPage = 10;
> IndexSearcher searcher = new IndexSearcher(index);
> TopDocCollector collector = new TopDocCollector(hitsPerPage);
> searcher.search(q, collector);
> ScoreDoc[] hits = collector.topDocs().scoreDocs;
> if (hits!= null && hits.length > 0) {
>  // ?
>  // Then this already exists and I just want to update the comments section
> }
>
> Does that make sense?  Am I going about this wrong?
>
> Billy
>
>
> ________________________________________
> From: Tim Williams [williamstw@gmail.com]
> Sent: Friday, April 17, 2009 6:05 PM
> To: java-user@lucene.apache.org
> Subject: Re: IndexWriter update method
>
> On Fri, Apr 17, 2009 at 7:27 PM, Newman, Billy <Billy.Newman@itt.com>
> wrote:
> > I am looking for info on how to use the IndexWriter.update method.  A
> short example of how to add a document and then later update would
> > be very helpful.  I get lost because I can add a document with just the
> document, but I need a document and a Term.  I am not really sure
> > what a Term is since I did not use a Term to create the document nor do I
> see it in any of the examples of searching/adding.
>
> When you index the document, add an ID field that is unique.  Then
> when you go to update the document the "Term" will be the ID of the
> document you wish to update.  For example, you might add a URL as the
> unique ID, then to update it might look something like:
>
> writer.update(new Term("id","http://apache.org/lucene/index.htm"), doc)
>
>
> --tim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> This e-mail and any files transmitted with it may be proprietary and are
> intended solely for the use of the individual or entity to whom they are
> addressed. If you have received this e-mail in error please notify the
> sender.
> Please note that any views or opinions presented in this e-mail are solely
> those of the author and do not necessarily represent those of ITT
> Corporation. The recipient should check this e-mail and any attachments for
> the presence of viruses. ITT accepts no liability for any damage caused by
> any virus transmitted by this e-mail.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message