From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Date: Sat, 27 Mar 2010 09:38:27 +0000 (UTC)
Subject: [jira] Updated: (LUCENE-2111) Wrapup flexible indexing

[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-2111:
---------------------------------------

    Attachment: flexBench.py
                benchUtil.py

I'm benchmarking flex vs trunk, but I've uncovered a strange performance loss with WildcardQuery.

I'm attaching the Python wrapper around contrib/benchmark that I'm using. Hopefully this is something silly...

You have to edit flexBench.py. Specifically, TRUNK_DIR and FLEX_DIR must point to the .../contrib/benchmark of each source area, and you have to edit WIKI_LINE_FILE and/or WIKI_FILE. I think WIKI_LINE_FILE can be None, in which case it should fall back to parsing the .xml.bz2 Wikipedia export, but I haven't tested that recently!

It will first build an index of the first 5M Wikipedia docs, once for flex and once for trunk, and then run the test queries. It also tests the "flex API on trunk index" case, to test perf of the flex emulation layer... this layer is looking a bit slowish now, but I'm not sure how much we can do to speed it up...

Run like this:

{code}
python -u flexBench.py -run test
{code}

I have it set to test only the wildcard query un*t right now... and I'm getting this result:

{code}
JAVA:
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
OS:
Linux centos 2.6.18-164.6.1.el5 #1 SMP Tue Nov 3 16:12:36 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
Index /x/lucene/flex.work.wiki.nd5M already exists...
Index /x/lucene/trunk.work.wiki.nd5M already exists...
Index /x/lucene/flex.work.random.nd5M already exists...
Index /x/lucene/trunk.work.random.nd5M already exists...
RUN: source=wiki query=un*t sort=None
run trunk...
cd /root/src/clean/lucene/contrib/benchmark
log: /root/src/clean/lucene/contrib/benchmark/logs/trunk.0
62.49 QPS
run flex on trunk index...
cd /root/src/flex.clean/contrib/benchmark
log: /root/src/flex.clean/contrib/benchmark/logs/flexOnTrunk.1
25.87 QPS [-58.6% worse]
run flex on flex index...
cd /root/src/flex.clean/contrib/benchmark log: /root/src/flex.clean/contrib/benchmark/logs/flexOnFlex.2 39.30 QPS [-37.1% worse] 124623 hits {code} Other queries I've tested look OK so far... > Wrapup flexible indexing > ------------------------ > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index > Affects Versions: Flex Branch > Reporter: Michael McCandless > Assignee: Michael McCandless > Fix For: 3.1 > > Attachments: benchUtil.py, flex_backwards_merge_912395.patch, flex_merge_916543.patch, flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch, LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch, LUCENE-2111_toString.patch > > > Spinoff from LUCENE-1458. > The flex branch is in fairly good shape -- all tests pass, initial search performance testing looks good, it survived several visits from the Unicode policeman ;) > But it still has a number of nocommits, could use some more scrutiny especially on the "emulate old API on flex index" and vice/versa code paths, and still needs some more performance testing. I'll do these under this issue, and we should open separate issues for other self contained fixes. > The end is in sight! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
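For reference, the flexBench.py edits described in the comment might look roughly like this. This is a minimal sketch that assumes plain module-level assignments; it is not the actual contents of the attached script. The two `*_DIR` values are the benchmark directories that appear in the log output. The `WIKI_FILE` path and the `pick_wiki_source()` helper are hypothetical and only illustrate the stated fallback: use `WIKI_LINE_FILE` when it is set, otherwise parse the .xml.bz2 export.

```python
# Hypothetical sketch of the settings to edit in flexBench.py; not the
# actual contents of the attached script.

# Each *_DIR points at the contrib/benchmark directory of one source area.
# These two paths are the ones that appear in the benchmark log.
TRUNK_DIR = '/root/src/clean/lucene/contrib/benchmark'
FLEX_DIR = '/root/src/flex.clean/contrib/benchmark'

# Hypothetical placeholders: a Wikipedia line file (or None), and the
# .xml.bz2 Wikipedia export used as the fallback.
WIKI_LINE_FILE = None
WIKI_FILE = '/path/to/wikipedia-export.xml.bz2'


def pick_wiki_source(line_file, xml_file):
    """Hypothetical helper illustrating the described fallback: use the
    line file when it is set, otherwise parse the .xml.bz2 export."""
    if line_file is not None:
        return ('line', line_file)
    if xml_file is None:
        raise ValueError('set WIKI_LINE_FILE or WIKI_FILE')
    return ('xml', xml_file)


if __name__ == '__main__':
    print(pick_wiki_source(WIKI_LINE_FILE, WIKI_FILE))
```

With the placeholders above, `WIKI_LINE_FILE` is None, so the sketch picks the .xml.bz2 export. The comment notes that this untested fallback path is the one to watch.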
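The "[... worse]" percentages in the benchmark output are consistent with a plain relative change against the trunk run: (flex QPS - trunk QPS) / trunk QPS. Below is a short Python check that assumes that formula. The formula is inferred from the reported numbers, not taken from benchUtil.py, and `pct_change` is a hypothetical name.

```python
def pct_change(base_qps, new_qps):
    """Relative change of new_qps vs. base_qps in percent (negative = slower).

    Assumed formula, inferred from the reported numbers; not taken from
    benchUtil.py."""
    return 100.0 * (new_qps - base_qps) / base_qps


if __name__ == '__main__':
    TRUNK_QPS = 62.49  # "run trunk..." result for query un*t
    runs = [('flex on trunk index', 25.87), ('flex on flex index', 39.30)]
    for name, qps in runs:
        # Format like the log line: QPS to 2 places, change to 1 place.
        print(f'{name}: {qps:.2f} QPS [{pct_change(TRUNK_QPS, qps):.1f}% worse]')
```

Run as-is, it reproduces the two figures from the log: -58.6% for flex on the trunk index and -37.1% for flex on its own index.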