lucene-dev mailing list archives

From Ning Li <nin...@us.ibm.com>
Subject Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Tue, 09 May 2006 23:54:16 GMT
The machine is swamped with tests. I will run the experiment when the
machine is free.


Regards,
Ning


Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120



From:     Otis Gospodnetic <otis_gospodnetic@yahoo.com>
Date:     05/09/2006 07:30 AM
Reply-to: java-dev
To:       java-dev@lucene.apache.org
cc:
Subject:  Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)




I agree - a delete (typically for a Term that represents a "primary key"
for a Document in an index) followed by re-add of a Document is a very
common scenario, and I'd love to see the numbers for that.

Thanks,
Otis

> We experimented with three workloads:
>   - Insert only. 1.6M documents were inserted and the final
>     index size was 2.3GB.
>   - Insert/delete (big batches). The same documents were
>     inserted, but 25% were deleted. 1000 documents were
>     deleted for every 4000 inserted.
>   - Insert/delete (small batches). In this case, 5 documents
>     were deleted for every 20 inserted.
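
For reference, the batch arithmetic behind the two insert/delete workloads can be checked with a small sketch. It uses plain Java and no Lucene calls. The class and method names are hypothetical, and it assumes each delete batch simply follows its insert batch, which the original message does not specify.

```java
// Hypothetical helper, not part of Lucene: counts the operations in a
// batched insert/delete workload. It does no indexing and measures nothing.
final class BatchedWorkload {

    /**
     * Returns {inserted, deleted, remaining, batches}.
     * Assumption: after every insert batch, a proportional delete batch runs.
     */
    static long[] run(long totalDocs, long insertBatch, long deleteBatch) {
        long inserted = 0;
        long deleted = 0;
        long batches = 0;
        while (inserted < totalDocs) {
            // The last insert batch may be shorter than the others.
            long n = Math.min(insertBatch, totalDocs - inserted);
            inserted += n;
            // Scale the delete batch to the size of the insert batch.
            deleted += deleteBatch * n / insertBatch;
            batches++;
        }
        return new long[] {inserted, deleted, inserted - deleted, batches};
    }
}
```

With 1.6M documents, both ratios (1000 per 4000 and 5 per 20) delete 400,000 documents and leave 1.2M. The difference is the number of delete batches: 400 in the big-batch case against 80,000 in the small-batch case.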

Thanks, these benchmarks are very important.

If you can do it, I'd love to see the results of a fourth benchmark,
which represents a typical situation (which you also mentioned)
of document updates: every single insert is preceded by a delete,
25% of which actually delete (the updated document existed previously)
and the rest end up not finding an old document and not deleting
anything. I expect this benchmark to show an even greater improvement
of your approach over the naive IndexModifier.
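
A minimal sketch of the operation stream such a benchmark would generate, written in plain Java. A HashMap stands in for the index, so this only illustrates the delete-before-insert pattern and the 25% hit rate. It is not Lucene code and says nothing about performance. The class name, key scheme, and method are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed update workload: every insert is
// preceded by a delete on the document's primary key.
final class UpdateWorkload {

    /** Returns {deletesThatHit, deletesThatMissed, finalDocumentCount}. */
    static int[] run(int operations) {
        Map<String, String> index = new HashMap<>(); // stand-in for an index
        int hits = 0;
        int misses = 0;
        for (int i = 0; i < operations; i++) {
            // Every fourth operation reuses a key inserted three steps
            // earlier, so one delete in four finds an existing document.
            String key = (i % 4 == 3) ? "doc-" + (i - 3) : "doc-" + i;
            if (index.remove(key) != null) {
                hits++;      // the updated document existed previously
            } else {
                misses++;    // nothing to delete
            }
            index.put(key, "body-" + i); // re-add the (new) document
        }
        return new int[] {hits, misses, index.size()};
    }
}
```

For 1,000 operations this gives 250 deletes that remove a document, 750 that find nothing, and 750 documents in the final map.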


--
Nadav Har'El


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org




