lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2111) Wrapup flexible indexing
Date Sat, 27 Mar 2010 09:38:27 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2111:
---------------------------------------

    Attachment: flexBench.py
                benchUtil.py

I'm benchmarking flex vs trunk, and have uncovered a strange performance loss with WildcardQuery.
I'm attaching the Python wrapper around contrib/benchmark that I'm using.  Hopefully this
is something silly...

You have to edit flexBench.py: specifically, TRUNK_DIR and FLEX_DIR must point to the .../contrib/benchmark
of each source area, and you have to edit WIKI_LINE_FILE and/or WIKI_FILE.  I think WIKI_LINE_FILE
can be None, in which case it should fall back to parsing the .xml.bz2 Wikipedia export
(but I haven't tested that recently!).
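
As a rough sketch of what those edits might look like: the two directory values below are the ones that show up in the run output later in this message, the WIKI_LINE_FILE path is a made-up placeholder, and check_settings is a hypothetical helper for illustration only (it is not part of flexBench.py).

```python
import os

# Setting names are the ones mentioned above; the values are illustrative.
TRUNK_DIR = '/root/src/clean/lucene/contrib/benchmark'
FLEX_DIR = '/root/src/flex.clean/contrib/benchmark'
WIKI_LINE_FILE = '/x/lucene/enwiki.lines.txt'  # hypothetical placeholder path
WIKI_FILE = None                               # would be the .xml.bz2 export


def check_settings(trunk_dir, flex_dir, wiki_line_file, wiki_file):
    """Hypothetical sanity check: return a list of problems (empty if none)."""
    problems = []
    wanted = os.path.join('contrib', 'benchmark')
    for name, path in (('TRUNK_DIR', trunk_dir), ('FLEX_DIR', flex_dir)):
        # Each directory should be the contrib/benchmark of its source area.
        if not os.path.normpath(path).endswith(wanted):
            problems.append('%s should point to .../contrib/benchmark: %s'
                            % (name, path))
    # At least one Wikipedia source has to be configured.
    if wiki_line_file is None and wiki_file is None:
        problems.append('set WIKI_LINE_FILE and/or WIKI_FILE')
    return problems


if __name__ == '__main__':
    for problem in check_settings(TRUNK_DIR, FLEX_DIR, WIKI_LINE_FILE, WIKI_FILE):
        print(problem)
```

The check only looks at the shape of the paths, not whether they exist, so it can be run before either source area is checked out.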

It'll first build an index of the first 5M Wikipedia docs, once for flex and once for trunk,
and then run the test queries.  It also tests the "flex API on trunk index" case, to measure
the perf of the flex emulation layer... this layer is looking a bit slowish now but I'm not sure
how much we can do to speed it up...

Run like this:
{code}
python -u flexBench.py -run test
{code}

I have it set to test only the wildcard query uni*t right now... and I'm getting this
result:
{code}
JAVA:
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)


OS:
Linux centos 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

Index /x/lucene/flex.work.wiki.nd5M already exists...
Index /x/lucene/trunk.work.wiki.nd5M already exists...
Index /x/lucene/flex.work.random.nd5M already exists...
Index /x/lucene/trunk.work.random.nd5M already exists...

RUN: source=wiki query=un*t sort=None
  run trunk...
    cd /root/src/clean/lucene/contrib/benchmark
    log: /root/src/clean/lucene/contrib/benchmark/logs/trunk.0
    62.49 QPS
  run flex on trunk index...
    cd /root/src/flex.clean/contrib/benchmark
    log: /root/src/flex.clean/contrib/benchmark/logs/flexOnTrunk.1
    25.87 QPS [-58.6% worse]
  run flex on flex index...
    cd /root/src/flex.clean/contrib/benchmark
    log: /root/src/flex.clean/contrib/benchmark/logs/flexOnFlex.2
    39.30 QPS [-37.1% worse]
  124623 hits
{code}
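
For reference, the bracketed percentages are consistent with a plain relative change against the trunk QPS. The sketch below assumes that formula (I have not confirmed this is how benchUtil.py computes it); pct_change is a hypothetical name used only for illustration.

```python
def pct_change(baseline_qps, qps):
    """Percent change of qps relative to baseline_qps (negative means slower)."""
    return 100.0 * (qps - baseline_qps) / baseline_qps


# QPS figures copied from the run output above.
trunk_qps = 62.49
runs = (('flex on trunk index', 25.87), ('flex on flex index', 39.30))

if __name__ == '__main__':
    for label, qps in runs:
        print('%s: %.2f QPS [%.1f%%]' % (label, qps, pct_change(trunk_qps, qps)))
```

With the figures above this gives -58.6% and -37.1%, matching the output.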

Other queries I've tested look OK so far...

> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, flex_merge_916543.patch,
flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch, LUCENE-2111_experimental.patch,
LUCENE-2111_fuzzy.patch, LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch, LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search performance testing looks good, it survived several visits from the Unicode policeman ;)
> But it still has a number of nocommits, could use some more scrutiny especially on the "emulate old API on flex index" and vice versa code paths, and still needs some more performance testing.  I'll do these under this issue, and we should open separate issues for other self-contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

