lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: improve how IndexWriter uses RAM to buffer added documents
Date Tue, 03 Apr 2007 14:54:24 GMT
Wow, very nice results Mike!

-Yonik

On 4/3/07, Michael McCandless (JIRA) <jira@apache.org> wrote:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486335 ]
>
> Michael McCandless commented on LUCENE-843:
> -------------------------------------------
>
>
> Last are the results for small docs (100 tokens = ~550 bytes plain text each):
>
>   2000000 DOCS @ ~550 bytes plain text
>   RAM = 32 MB
>   NUM THREADS = 1
>   MERGE FACTOR = 10
>
>
>     No term vectors nor stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           2000000 docs in 886.7 secs
>           index size = 438M
>
>         new
>           2000000 docs in 230.5 secs
>           index size = 435M
>
>         Total Docs/sec:             old  2255.6; new  8676.4 [  284.7% faster]
>         Docs/MB @ flush:            old   128.0; new  4194.6 [ 3176.2% more]
>         Avg RAM used (MB) @ flush:  old   107.3; new    37.7 [   64.9% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           2000000 docs in 888.7 secs
>           index size = 438M
>
>         new
>           2000000 docs in 239.6 secs
>           index size = 432M
>
>         Total Docs/sec:             old  2250.5; new  8348.7 [  271.0% faster]
>         Docs/MB @ flush:            old   128.0; new  4146.8 [ 3138.9% more]
>         Avg RAM used (MB) @ flush:  old   108.1; new    38.9 [   64.0% less]
>
>
>
>     With term vectors (positions + offsets) and 2 small stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           2000000 docs in 1480.1 secs
>           index size = 2.1G
>
>         new
>           2000000 docs in 462.0 secs
>           index size = 2.1G
>
>         Total Docs/sec:             old  1351.2; new  4329.3 [  220.4% faster]
>         Docs/MB @ flush:            old    93.1; new  4194.6 [ 4405.7% more]
>         Avg RAM used (MB) @ flush:  old   296.4; new    38.3 [   87.1% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           2000000 docs in 1489.4 secs
>           index size = 2.1G
>
>         new
>           2000000 docs in 347.9 secs
>           index size = 2.1G
>
>         Total Docs/sec:             old  1342.8; new  5749.4 [  328.2% faster]
>         Docs/MB @ flush:            old    93.1; new  4146.8 [ 4354.5% more]
>         Avg RAM used (MB) @ flush:  old   297.1; new    38.6 [   87.0% less]
>
>
>
>   200000 DOCS @ ~5,500 bytes plain text
>
>
>     No term vectors nor stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           200000 docs in 397.6 secs
>           index size = 415M
>
>         new
>           200000 docs in 167.5 secs
>           index size = 411M
>
>         Total Docs/sec:             old   503.1; new  1194.1 [  137.3% faster]
>         Docs/MB @ flush:            old    81.6; new   406.2 [  397.6% more]
>         Avg RAM used (MB) @ flush:  old    87.3; new    35.2 [   59.7% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           200000 docs in 394.6 secs
>           index size = 415M
>
>         new
>           200000 docs in 168.4 secs
>           index size = 408M
>
>         Total Docs/sec:             old   506.9; new  1187.7 [  134.3% faster]
>         Docs/MB @ flush:            old    81.6; new   432.2 [  429.4% more]
>         Avg RAM used (MB) @ flush:  old   126.6; new    36.9 [   70.8% less]
>
>
>
>     With term vectors (positions + offsets) and 2 small stored fields
>
>       AUTOCOMMIT = true (commit whenever RAM is full)
>
>         old
>           200000 docs in 754.2 secs
>           index size = 1.7G
>
>         new
>           200000 docs in 304.9 secs
>           index size = 1.7G
>
>         Total Docs/sec:             old   265.2; new   656.0 [  147.4% faster]
>         Docs/MB @ flush:            old    46.7; new   406.2 [  769.6% more]
>         Avg RAM used (MB) @ flush:  old    92.9; new    35.2 [   62.1% less]
>
>
>       AUTOCOMMIT = false (commit only once at the end)
>
>         old
>           200000 docs in 743.9 secs
>           index size = 1.7G
>
>         new
>           200000 docs in 244.3 secs
>           index size = 1.7G
>
>         Total Docs/sec:             old   268.9; new   818.7 [  204.5% faster]
>         Docs/MB @ flush:            old    46.7; new   432.2 [  825.2% more]
>         Avg RAM used (MB) @ flush:  old    93.0; new    36.6 [   60.6% less]
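
For anyone checking the arithmetic in the quoted numbers: the bracketed percentages appear to be plain relative changes between the "old" and "new" figures, so "284.7% faster" means the new rate is roughly 3.85 times the old one, not 2.85 times. Below is a minimal Python sketch of that reading. The helper names (docs_per_sec, pct_more, pct_less) are made up for illustration and do not come from the JIRA comment or the benchmark code, and the formulas are an assumption inferred from the reported values.

```python
def docs_per_sec(num_docs, seconds):
    """Indexing throughput: documents added per second of wall-clock time."""
    return num_docs / seconds


def pct_more(old, new):
    """Relative increase of `new` over `old`, in percent (used for 'faster' / 'more')."""
    return (new / old - 1.0) * 100.0


def pct_less(old, new):
    """Relative decrease of `new` below `old`, in percent (used for 'less')."""
    return (1.0 - new / old) * 100.0


if __name__ == "__main__":
    # Inputs copied from the first result block:
    # 2000000 small docs, no term vectors, AUTOCOMMIT = true.
    old_rate = docs_per_sec(2_000_000, 886.7)
    new_rate = docs_per_sec(2_000_000, 230.5)

    print(f"old docs/sec: {old_rate:.1f}")
    print(f"new docs/sec: {new_rate:.1f}")
    print(f"speedup:      {pct_more(old_rate, new_rate):.1f}% faster")

    # Avg RAM used (MB) @ flush for the same block: old 107.3, new 37.7.
    print(f"RAM at flush: {pct_less(107.3, 37.7):.1f}% less")
```

Because the published times and rates are already rounded to one decimal place, recomputed values can differ from the quoted ones in the last digit.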
