lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]
Date Sat, 19 Jun 2010 13:56:25 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880483#action_12880483 ]

Yonik Seeley edited comment on LUCENE-2380 at 6/19/10 9:55 AM:
---------------------------------------------------------------

Performance testing this was really tricky.

If I started Solr and tested one type of faceting exclusively, the performance impact of going
through the new FieldCache interfaces (PackedInts for ord lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the table above)...
and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were repeatable.
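
A schedule like the one above could be driven by a small harness along the following lines. This is only a sketch, not the script used for these numbers: the `FacetScheduleSketch` class, its `Step` type, the `runPass` helper, and the placeholder workload are all hypothetical, and a real run would issue Solr facet requests where the placeholder does arithmetic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FacetScheduleSketch {
    /** Written by the placeholder workload so the JIT cannot discard its result. */
    static volatile long sink;

    /** One step of the schedule: a label, an iteration count, and the work to time. */
    static final class Step {
        final String label;
        final int iterations;
        final Runnable work;

        Step(String label, int iterations, Runnable work) {
            this.label = label;
            this.iterations = iterations;
            this.work = work;
        }
    }

    /** Runs every step once, in order; returns elapsed nanoseconds keyed by label. */
    static Map<String, Long> runPass(List<Step> steps) {
        Map<String, Long> elapsed = new LinkedHashMap<>();
        for (Step step : steps) {
            long start = System.nanoTime();
            for (int i = 0; i < step.iterations; i++) {
                step.work.run();
            }
            elapsed.put(step.label, System.nanoTime() - start);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): a real script would send a facet request here.
        Runnable placeholder = () -> {
            long x = 0;
            for (int i = 0; i < 100_000; i++) {
                x += i;
            }
            sink = x;
        };

        List<Step> steps = new ArrayList<>();
        steps.add(new Step("1) fc, 100,000 terms", 100, placeholder));
        steps.add(new Step("2) fcs, 100,000 terms", 10, placeholder));
        steps.add(new Step("3) fc, 100 terms", 100, placeholder));
        steps.add(new Step("4) fcs, 100 terms", 10, placeholder));

        // Repeat the whole schedule so drift between passes is visible.
        for (int pass = 1; pass <= 5; pass++) {
            System.out.println("pass " + pass + ": " + runPass(steps));
        }
    }
}
```

Keeping every pass's timings, rather than only the last, makes it possible to see whether the numbers have stabilized and whether an early pass was faster than later ones.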

Testing #1 alone resulted in trunk slowing down ~4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the best... as if the
JVM correctly specialized it, but then discarded it later, never to return.
I saw the same effect on both an AMD Phenom II w/ Ubuntu, Java 1.6_14 and Win7 with a Core2,
Java 1.6_17, both 64 bit.  The drop on Win7 was only 20% though.

So: you can't always depend on the JVM being able to inline stuff for you, and it seems very
hard to determine when it can.
This obviously has implications for the Lucene benchmarker too.
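
One mechanism that would be consistent with these numbers, though nothing in this comment confirms it, is HotSpot's per-call-site type profile: a virtual call is typically inlined while the site has seen one or two receiver classes, and falls back to a real virtual dispatch once it has seen more. The sketch below only shows one way to probe for that. `OrdReader` and its three implementations are hypothetical stand-ins, not the FieldCache or PackedInts classes, and the timings it prints will vary by JVM and hardware.

```java
final class CallSiteProfileSketch {
    /** Keeps results observable so the JIT cannot eliminate the loops. */
    static volatile long sink;

    /** Hypothetical ord-lookup abstraction (assumption, not a Lucene interface). */
    interface OrdReader {
        int get(int index);
    }

    static final class ByteReader implements OrdReader {
        private final byte[] values;
        ByteReader(int size) {
            values = new byte[size];
            for (int i = 0; i < size; i++) values[i] = (byte) (i & 0x7f);
        }
        public int get(int index) { return values[index]; }
    }

    static final class ShortReader implements OrdReader {
        private final short[] values;
        ShortReader(int size) {
            values = new short[size];
            for (int i = 0; i < size; i++) values[i] = (short) (i & 0x7f);
        }
        public int get(int index) { return values[index]; }
    }

    static final class IntReader implements OrdReader {
        private final int[] values;
        IntReader(int size) {
            values = new int[size];
            for (int i = 0; i < size; i++) values[i] = i & 0x7f;
        }
        public int get(int index) { return values[index]; }
    }

    /** The hot loop: reader.get(i) is the single call site whose profile matters. */
    static long sum(OrdReader reader, int size) {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += reader.get(i);
        }
        return total;
    }

    /** Times `rounds` passes of sum() over one reader, in milliseconds. */
    static long timeMillis(OrdReader reader, int size, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += sum(reader, size);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int size = 1 << 20;
        int rounds = 100;
        OrdReader bytes = new ByteReader(size);

        // Only one receiver class has reached the call site so far.
        System.out.println("bytes only:         " + timeMillis(bytes, size, rounds) + " ms");

        // Push two more receiver classes through the same call site.
        timeMillis(new ShortReader(size), size, rounds);
        timeMillis(new IntReader(size), size, rounds);

        // Same work as the first measurement, after the profile has been mixed.
        System.out.println("bytes after mixing: " + timeMillis(bytes, size, rounds) + " ms");
    }
}
```

If the second figure is clearly worse than the first, the shared call site has probably stopped being inlined. Running with `-XX:+PrintCompilation` shows when `sum` is compiled and later made not entrant, which helps check that reading.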


  
> Add FieldCache.getTermBytes, to load term data as byte[]
> --------------------------------------------------------
>
>                 Key: LUCENE-2380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch,
>                      LUCENE-2380_direct_arr_access.patch, LUCENE-2380_enum.patch, LUCENE-2380_enum.patch
>
>
> With flex, a term is now an opaque byte[] (typically a UTF-8 encoded Unicode string, but
> not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding methods to load
> terms as native byte[], since in general they may not be representable as String.  This should
> be quite a bit more RAM efficient too for US-ASCII content, since each character would then
> use 1 byte, not 2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

