lucene-java-user mailing list archives

From sergiu gordea <gser...@ifit.uni-klu.ac.at>
Subject Re: Duplicate Hits
Date Tue, 01 Feb 2005 17:30:31 GMT
Erik Hatcher wrote:

>
> On Feb 1, 2005, at 10:51 AM, Jerry Jalenak wrote:
>
>> OK - but I'm dealing with indexing between 1.5 and 2 million documents,
>> so I really don't want to 'batch' them up if I can avoid it.  And I also
>> don't think I can keep an IndexReader open to the index at the same time
>> I have an IndexWriter open.  I may have to try and deal with this issue
>> through some sort of filter on the query side, provided it doesn't impact
>> performance too much.
>
>
> You can use an IndexReader and IndexWriter at the same time (the 
> caveat is that you cannot delete with the IndexReader at the same time 
> you're writing with an IndexWriter).  Is there no other identifying 
> information, though, on the incoming documents with a date stamp?  
> Identifier?  Or something unique you can go on?
>
>     Erik

As Erik suggested earlier, I think that keeping the information in the 
database and identifying the new entries at the database level is a better 
approach.
Indexing documents and optimizing an index that big will be 
very time consuming.
Also, consider that in the future you may want to modify the 
structure of your index.
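
To make the "identify new entries at the database level" idea concrete, here 
is a minimal sketch. It is only an illustration: the in-memory map stands in 
for a database table with a unique key column, and the class and method 
names (NewEntryFilter, insertIfNew, selectForIndexing) are hypothetical. In 
a real setup the uniqueness check would be done by the database itself, and 
only the records reported as new would be handed to the IndexWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NewEntryFilter {
    // Hypothetical stand-in for a database table keyed by a unique identifier.
    private final Map<String, String> table = new LinkedHashMap<>();

    /** Stores the record if its key is unseen; returns true only for new entries. */
    public boolean insertIfNew(String key, String body) {
        return table.putIfAbsent(key, body) == null;
    }

    /** Returns the keys of the records that were new, i.e. those that still need indexing. */
    public List<String> selectForIndexing(List<String[]> incoming) {
        List<String> toIndex = new ArrayList<>();
        for (String[] record : incoming) {
            if (insertIfNew(record[0], record[1])) {
                toIndex.add(record[0]);
            }
        }
        return toIndex;
    }

    public static void main(String[] args) {
        NewEntryFilter filter = new NewEntryFilter();
        List<String[]> incoming = Arrays.asList(
                new String[] {"doc-1", "first body"},
                new String[] {"doc-2", "second body"},
                new String[] {"doc-1", "first body"}); // duplicate of the first record
        // Only the two distinct keys are selected: [doc-1, doc-2]
        System.out.println(filter.selectForIndexing(incoming));
    }
}
```

Because duplicates are rejected before indexing, the index never needs to be 
searched or modified just to find out whether a document is already there.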

Think how much effort it would take to split some fields into a few smaller 
parts, or just to change the format of a field: 
let's say you have a date in DDMMYY format and you need to change it to 
YYYYMMDD.
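
As a rough illustration of that format change, the conversion itself is 
small when the original values are still available outside the index. The 
sketch below uses java.time (which needs Java 8 or later); the class and 
method names are hypothetical, and it assumes two-digit years fall in 
2000-2099, which is how the "yy" pattern resolves them by default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFieldMigration {
    // Old field format from the example: day, month, two-digit year.
    private static final DateTimeFormatter OLD_FORMAT = DateTimeFormatter.ofPattern("ddMMyy");
    // New field format: four-digit year first, so plain string order matches date order.
    private static final DateTimeFormatter NEW_FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Converts a DDMMYY value to YYYYMMDD; assumes the year is in 2000-2099. */
    public static String toSortable(String ddmmyy) {
        return LocalDate.parse(ddmmyy, OLD_FORMAT).format(NEW_FORMAT);
    }

    public static void main(String[] args) {
        // 1 February 2005 in the old format; prints 20050201
        System.out.println(toSortable("010205"));
    }
}
```

The conversion is the easy part. The real cost is that every document 
carrying the field has to be indexed again with the new value.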

And consider how much effort is needed to rebuild a completely new index 
from the database.

Of course, your requirements may not call for storing the information 
in a database, so it is up to you whether to use a DB + Lucene index 
or just a Lucene index.


 Best,

 Sergiu

>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

