lucene-dev mailing list archives

From "Christian Moen (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze
Date Tue, 27 Mar 2012 17:22:27 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239597#comment-13239597 ]

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:21 PM:
---------------------------------------------------------------

h3. Test 2 - Searching without highlighting (no indexing)

After the Wikipedia index was built, I ran 250,000 fairly common Japanese queries against
the index without highlighting, using simple means.

For this test, I was running Java using

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

so a small/normal heap size to keep memory pressure a bit high, no fancy GC options,
and all of Wikipedia searchable.  Very nice :)

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84
{noformat}

which is

{noformat}
/solr/select/?q=無料占い
{noformat}

in plain, URL-decoded form.
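
The relationship between the two forms can be checked with a few lines of Python. This is only a minimal sketch: the helper name {{select_url}} is made up for illustration, and it assumes the query is percent-encoded as UTF-8, which is what the example above shows.

```python
from urllib.parse import quote, unquote


def select_url(query: str, handler: str = "/solr/select/") -> str:
    """Build a select URL with the query percent-encoded as UTF-8.

    `select_url` is a hypothetical helper, not part of Solr or any client library.
    """
    return f"{handler}?q={quote(query, safe='')}"


encoded = select_url("無料占い")
print(encoded)           # the escaped form that is sent to Solr
print(unquote(encoded))  # the plain form, for reading
```

Each of the four characters becomes three escaped bytes because these characters take three bytes each in UTF-8.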

Running the 250,000 queries took 1838.5 seconds. The test kept roughly 80% of its queries
within 0.5 seconds of latency and served a sustained load of 142 QPS.
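
For anyone repeating this kind of measurement, the two figures can be derived from per-query latencies roughly as sketched below. The function name {{summarize}} and the sample latencies are hypothetical and are not taken from this test. Note that dividing the two totals quoted above gives an overall average of about 136 QPS, so the 142 QPS figure presumably comes from a different measurement window, which the comment does not describe.

```python
def summarize(latencies_s, wall_clock_s, threshold_s=0.5):
    """Return query count, average QPS, and the share of queries within the threshold.

    Hypothetical helper for illustration; not the tool used in the test above.
    """
    total = len(latencies_s)
    within = sum(1 for t in latencies_s if t <= threshold_s)
    return {
        "queries": total,
        "qps": total / wall_clock_s,
        "within_threshold": within / total,
    }


# Made-up latencies (seconds), only to exercise the function.
sample = [0.12, 0.30, 0.45, 0.48, 0.90]
stats = summarize(sample, wall_clock_s=2.0)
print(stats)

# Overall average implied by the totals reported in the comment.
average_qps = 250_000 / 1838.5
print(f"{average_qps:.1f} QPS")
```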

The GC logs have some Full GC entries in them:

|| GC Activity || Time || 
| Full GC 57558K->36262K(126912K) | 0.2926001 secs |
| Full GC 120759K->37151K(126912K) | 0.2948184 secs |
| Full GC 118817K->38305K(126912K) | 0.3726583 secs |
| Full GC 116992K->40203K(126912K) | 0.3688027 secs |
| Full GC 119572K->39070K(126912K) | 0.2896587 secs |
| Full GC 121476K->39257K(126912K) | 0.3034882 secs |
| Full GC 119659K->39451K(126912K) | 0.3078915 secs |
| Full GC 116948K->39770K(126912K) | 0.2407321 secs |
| Full GC 118382K->40442K(126912K) | 0.5224920 secs |
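
The table can be summarized with a short script. This is a sketch under assumptions: it re-creates the nine entries in the bracketed form that HotSpot's {{-verbose:gc}} typically printed at the time ({{[Full GC before->after(heap), N secs]}}); the exact lines in the attached log may differ, and {{parse_gc}} is a hypothetical helper.

```python
import re

# Assumed line shape: [Full GC 57558K->36262K(126912K), 0.2926001 secs]
GC_LINE = re.compile(r"\[(Full GC|GC) (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]")


def parse_gc(log_text):
    """Extract GC events (kind, heap before/after/total in KB, pause in seconds)."""
    events = []
    for kind, before, after, heap, secs in GC_LINE.findall(log_text):
        events.append({
            "kind": kind,
            "before_kb": int(before),
            "after_kb": int(after),
            "heap_kb": int(heap),
            "pause_s": float(secs),
        })
    return events


# The nine Full GC entries from the table, rewritten in the assumed log format.
LOG = """\
[Full GC 57558K->36262K(126912K), 0.2926001 secs]
[Full GC 120759K->37151K(126912K), 0.2948184 secs]
[Full GC 118817K->38305K(126912K), 0.3726583 secs]
[Full GC 116992K->40203K(126912K), 0.3688027 secs]
[Full GC 119572K->39070K(126912K), 0.2896587 secs]
[Full GC 121476K->39257K(126912K), 0.3034882 secs]
[Full GC 119659K->39451K(126912K), 0.3078915 secs]
[Full GC 116948K->39770K(126912K), 0.2407321 secs]
[Full GC 118382K->40442K(126912K), 0.5224920 secs]
"""

events = parse_gc(LOG)
full = [e for e in events if e["kind"] == "Full GC"]
total_pause = sum(e["pause_s"] for e in full)
longest = max(e["pause_s"] for e in full)
print(f"{len(full)} Full GCs, {total_pause:.3f} s total, {longest:.3f} s longest")
```

Summed this way, the nine Full GC pauses come to about 3.0 seconds, roughly 0.16% of the 1838.5 second run, and the heap settles at 36-40 MB after each collection.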

The regular GC entries took a maximum of 0.0731031 seconds, and most took half that or less.

|| Attachment || Description ||
| 250k-queries-no-highlight-gc.log | GC log from -verbose:gc |
| 250k-queries-no-highlight-visualvm.png | Screenshot from VisualVM |

GCViewer seems to have problems parsing 250k-queries-no-highlight-gc.log, so I'm not attaching
a GCViewer screenshot for this test.
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many and also in mission critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all of the Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
> # Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection is overall low over time -- no Full-GC issues
> I'll post findings and results to this JIRA.
