lucene-dev mailing list archives

From "Christian Moen (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (SOLR-3282) Perform Kuromoji/Japanese stability test before 3.6 freeze
Date Tue, 27 Mar 2012 17:26:27 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239659#comment-13239659 ]

Christian Moen edited comment on SOLR-3282 at 3/27/12 5:25 PM:
---------------------------------------------------------------

h3. Test 3 - Searching with highlighting (no indexing)

The test is similar to _Test 2_, but with highlighting turned on and only ~62,000 queries
run.  No indexing was done.

Solr was run as follows:

{noformat}
java -verbose:gc -Dfile.encoding=UTF-8 -jar start.jar
{noformat}

Again, note the small heap size and regular GC options.

The queries are of the form

{noformat}
/solr/select/?q=%E7%84%A1%E6%96%99%E5%8D%A0%E3%81%84&hl=on&hl.fl=body
{noformat}

which is

{noformat}
/solr/select/?q=無料占い&hl=on&hl.fl=body
{noformat}

in URL-decoded form.
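The relationship between the two forms of the query can be reproduced with a short sketch. This uses Python's standard {{urllib.parse}} module purely as an illustration; it is not the tool used to generate the test queries, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# The query term from the example above.
query = "無料占い"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(query)

# Build the request path in the same shape as the test queries:
# highlighting enabled (hl=on) on the body field (hl.fl=body).
url = f"/solr/select/?q={encoded}&hl=on&hl.fl=body"
print(url)

# unquote() reverses the encoding and recovers the original term.
decoded = unquote(encoded)
print(decoded)
```

Each Japanese character in the term occupies three bytes in UTF-8, which is why the four-character query expands to twelve percent-escaped bytes.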

Highlighting is turned on ({{hl=on}}) and applied to the {{body}} field ({{hl.fl=body}}).

The test completed in 1648.1 seconds, during which 63,200 queries were run.  The sustainable
query rate was 47 QPS.

Turning on highlighting carries a fairly significant performance penalty: 47 QPS, compared
to the 142 QPS we could sustain in the non-highlighting case.
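To put the penalty in relative terms, the arithmetic can be sketched as below. The inputs are the figures reported in this comment. The last value is only the plain average over the whole run (queries divided by elapsed time), which is a different measure from the sustainable rate quoted above; how the sustainable rate was derived is not stated here.

```python
# Figures reported in this comment.
highlight_qps = 47      # sustainable QPS with highlighting
plain_qps = 142         # sustainable QPS without highlighting (Test 2)
queries = 63200         # queries run in this test
elapsed_secs = 1648.1   # wall-clock duration of this test

# How many times slower the highlighting case is.
slowdown = plain_qps / highlight_qps              # ~3.02

# Relative drop in sustainable throughput, in percent.
drop_pct = (1 - highlight_qps / plain_qps) * 100  # ~66.9

# Plain average over the full run; not the same as the sustainable rate.
mean_qps = queries / elapsed_secs                 # ~38.3

print(f"slowdown: {slowdown:.2f}x, drop: {drop_pct:.1f}%, mean: {mean_qps:.1f} QPS")
```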

There is also increased memory pressure with highlighting turned on.  There were 652 Full
GC events in total during the test, and the longest Full GC times are given below.

|| Longest Full GC times (seconds) ||
|0.9769069|
|0.8564934|
|0.7585956|
|0.7084318|
|0.6928327|
|0.6781336|
|0.6358398|
|0.6099899|
|0.5628532|
|0.5540237|
|0.5443075|
|0.5429399|
|0.5423989|
|...|
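A table like the one above can be derived from the GC log with a small script. The following is a minimal sketch, not the procedure actually used for this test. It assumes the classic HotSpot {{-verbose:gc}} line format, in which each Full GC line ends with a pause such as {{0.1234567 secs]}}; the function name and the sample lines are hypothetical.

```python
import heapq
import re

# Assumed line shape (classic HotSpot -verbose:gc output), for example:
#   [Full GC 1234K->567K(8910K), 0.1234567 secs]
FULL_GC = re.compile(r"\[Full GC\b.*?([0-9]+\.[0-9]+) secs\]")


def longest_full_gcs(lines, n=13):
    """Return (number of Full GC events, the n longest pauses in seconds)."""
    pauses = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return len(pauses), heapq.nlargest(n, pauses)


# Synthetic sample lines for illustration only; not taken from the real log.
sample = [
    "[GC 100K->50K(200K), 0.0010000 secs]",
    "[Full GC 300K->150K(400K), 0.2500000 secs]",
    "[Full GC 310K->160K(400K), 0.5000000 secs]",
    "[Full GC 320K->170K(400K), 0.1250000 secs]",
]
count, top = longest_full_gcs(sample, n=2)
print(count, top)
```

To run it against the attached log, one would pass an open file instead of the sample list, for example {{longest_full_gcs(open("62k-queries-highlight-gc.log"))}}. Minor (young generation) collections are ignored because only lines beginning with {{[Full GC}} match.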

The extra memory pressure can also be seen in the VisualVM screenshot.  I believe the root
cause of this is the highlighting.

|| Attachment || Description ||
| 62k-queries-highlight-gc.log|  GC log |
| 62k-queries-highlight-visualvm.png|  Screenshot from VisualVM |
                
> Perform Kuromoji/Japanese stability test before 3.6 freeze
> ----------------------------------------------------------
>
>                 Key: SOLR-3282
>                 URL: https://issues.apache.org/jira/browse/SOLR-3282
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>    Affects Versions: 3.6, 4.0
>            Reporter: Christian Moen
>            Assignee: Christian Moen
>         Attachments: 250k-queries-no-highlight-gc.log, 250k-queries-no-highlight-visualvm.png, 62k-queries-highlight-gc.log, 62k-queries-highlight-visualvm.png, jawiki-index-gc.log, jawiki-index-gcviewer.png, jawiki-index-visualvm.png
>
>
> Kuromoji might be used by many, including in mission-critical systems.  I'd like to run a stability test before we freeze 3.6.
> My thinking is to test the out-of-the-box configuration using fieldtype {{text_ja}} as follows:
> # Index all Japanese Wikipedia documents (approx. 1.4M documents) in a never-ending loop
> # Simultaneously run many tens of thousands of typical Japanese queries against the index at 3-5 queries per second with highlighting turned on
> While Solr is indexing and searching, I'd like to verify that:
> * Indexing and queries are working as expected
> * Memory and heap usage looks stable over time
> * Garbage collection overhead is low overall over time -- no Full-GC issues
> I'll post findings and results to this JIRA.
