lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: Advanced timestamp usage (or global value storage)
Date Wed, 25 Aug 2004 15:39:48 GMT
Avi,

i would prefer the second approach. If you already store the date time 
when the doc was index, you could use the following trick to get the 
last document added to the index:

            IndexReader ir = IndexReader.open("/tmp/testindex");
          
            int maxDoc = ir.maxDoc();
            while (--maxDoc > 0) {
              if (!ir.isDeleted(maxDoc)) {
                Document doc = ir.document(maxDoc);
                System.out.println(doc.getField("indexDate"));
                break;
              }
            }

What do you think about the implementation, no extra properties, nothing 
to worry about. Every information is within you index.

regards
Bernhard

Avi Drissman wrote:

> I've used Lucene for a long time, but only in the most basic way. I 
> have a custom analyzer and a slightly hacked query parser, but in 
> general it's the basic add document/remove document/query documents 
> cycle.
>
> In my system, I'm indexing a store of external documents, maintaining 
> an index for full-text querying. However, I might be turned off when 
> documents are added, and then when I'm restarted, I'm going to need to 
> determine the timestamp of the last document added to the index so 
> that I can pick up where I left off.
>
> There are three approaches to doing this, two using Lucene. I don't 
> know how I would do the two Lucene approaches, or even if they're 
> possible.
>
> 1. Just keep a file in parallel with the index, reading and writing 
> the timestamp of the last indexed document in it. I know how to do 
> this, but I don't like the idea of keeping a separate file.
>
> 2. Drop a timestamp onto each document as it's indexed. I've attached 
> timestamp fields to documents in the past so that I could do range 
> queries on them. However, I don't know how to do a query like "the 
> document with the latest timestamp" or even if that's possible.
>
> 3. Create a dummy document (with some unique field identifier so you 
> could quickly query for it) with a field "last timestamp". This is a 
> "global value storage" approach, as you could just store any field 
> with any value on it. But I'd be updating this timestamp field a lot, 
> which means that every time I updated the index I'd have to remove 
> this special document and reindex it. Is there any way to update the 
> value of a field in a document directly in the index without removing 
> and adding it again to the index? The field I'd want to update would 
> just be stored, not indexed or tokenized.
>
> Thanks for your help in guiding my exploration into the capabilities 
> of Lucene.
>
> Avi
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message