lucene-dev mailing list archives

From Ning Li <nin...@us.ibm.com>
Subject Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
Date Thu, 11 May 2006 19:19:34 GMT
The fourth workload:
  - Upsert. Every operation is a delete followed by an insert.
    75% of the deletes do not match any previously inserted
    document; the other 25% match a previously inserted document.
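
For readers who want to reproduce the shape of this workload, here is a
minimal, self-contained sketch of the operation stream. It is not Lucene
code and not the harness used for the numbers below: an in-memory set of
primary keys stands in for the index, and the class name, the "doc-N" key
format and the "every 4th operation is an update" pattern are all
assumptions chosen to hit the stated 75%/25% split.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of the upsert operation stream. An in-memory set of
// primary keys stands in for the index; this is not Lucene code.
final class UpsertWorkload {

    /** Counts gathered while replaying the workload. */
    static final class Stats {
        final int operations;
        final int matchingDeletes;
        final int liveDocs;

        Stats(int operations, int matchingDeletes, int liveDocs) {
            this.operations = operations;
            this.matchingDeletes = matchingDeletes;
            this.liveDocs = liveDocs;
        }
    }

    // Assumption: every 4th operation (i % 4 == 3) updates a key inserted
    // earlier, so 25% of the deletes match; the other 75% use fresh keys.
    static Stats run(int operations, long seed) {
        Random random = new Random(seed);
        List<String> insertedKeys = new ArrayList<>();
        Set<String> live = new HashSet<>();
        int nextFreshId = 0;
        int matchingDeletes = 0;

        for (int i = 0; i < operations; i++) {
            String key;
            if (i % 4 == 3) {
                // Update: pick any key inserted earlier (at least 3 exist by now).
                key = insertedKeys.get(random.nextInt(insertedKeys.size()));
            } else {
                // New document: this key has never been inserted.
                key = "doc-" + nextFreshId++;
                insertedKeys.add(key);
            }
            // Step 1: delete by primary key; counts only if a live doc has it.
            if (live.remove(key)) {
                matchingDeletes++;
            }
            // Step 2: insert the (new or updated) document under the same key.
            live.add(key);
        }
        return new Stats(operations, matchingDeletes, live.size());
    }

    public static void main(String[] args) {
        Stats s = run(1000, 42L);
        // prints "250 of 1000 deletes matched; 750 live documents"
        System.out.println(s.matchingDeletes + " of " + s.operations
                + " deletes matched; " + s.liveDocs + " live documents");
    }
}
```

In an actual benchmark run, step 1 would become a delete-by-term call and
step 2 an addDocument call against the index under test; the sketch only
shows how the stream of keys produces the 75%/25% miss/match ratio.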

The new IndexWriter took 136 min. The current IndexModifier has
been running for 18 hours and still hasn't finished...


For your convenience, here are the performance results for the
first three workloads again.

                                current       current          new
Workload                      IndexWriter  IndexModifier   IndexWriter
-----------------------------------------------------------------------
Insert only                     116 min       119 min        116 min
Insert/delete (big batches)       --          135 min        125 min
Insert/delete (small batches)     --          338 min        134 min


Regards,
Ning


Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120



Ning Li/Almaden/IBM@IBMUS, 05/09/2006 04:54 PM (please respond to java-dev)
To:       java-dev@lucene.apache.org
Subject:  Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)




The machine is swamped with tests. I will run the experiment when the
machine is free.


Regards,
Ning





Otis Gospodnetic <otis_gospodnetic@yahoo.com>, 05/09/2006 07:30 AM (please respond to java-dev)
To:       java-dev@lucene.apache.org
Subject:  Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)





I agree - a delete (typically for a Term that represents a "primary key"
for a Document in an index) followed by re-add of a Document is a very
common scenario, and I'd love to see the numbers for that.

Thanks,
Otis

> We experimented with three workloads:
>   - Insert only. 1.6M documents were inserted and the final
>     index size was 2.3GB.
>   - Insert/delete (big batches). The same documents were
>     inserted, but 25% were deleted. 1000 documents were
>     deleted for every 4000 inserted.
>   - Insert/delete (small batches). In this case, 5 documents
>     were deleted for every 20 inserted.

Thanks, these benchmarks are very important.

If you can do it, I'd love to see the results of a fourth benchmark,
which represents a typical situation (which you also mentioned)
of document updates: every single insert is preceded by a delete,
25% of which actually delete something (the updated document existed
previously), while the rest find no old document and delete
nothing. I expect this benchmark to show an even greater improvement
of your approach over the naive IndexModifier.


--
Nadav Har'El


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

