Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 59356 invoked from network); 7 May 2009 21:15:09 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 7 May 2009 21:15:09 -0000 Received: (qmail 25075 invoked by uid 500); 7 May 2009 21:15:08 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 25008 invoked by uid 500); 7 May 2009 21:15:08 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 25000 invoked by uid 99); 7 May 2009 21:15:07 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 May 2009 21:15:07 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 May 2009 21:15:05 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 90646234C041 for ; Thu, 7 May 2009 14:14:45 -0700 (PDT) Message-ID: <993035678.1241730885575.JavaMail.jira@brutus> Date: Thu, 7 May 2009 14:14:45 -0700 (PDT) From: "Eks Dev (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-1594) Use source code specialization to maximize search performance In-Reply-To: <859548072.1239382694915.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707116#action_12707116 ] Eks Dev commented on LUCENE-1594: --------------------------------- huh, it reduces hardware costs 2-3 times for larger setup! great > Use source code specialization to maximize search performance > ------------------------------------------------------------- > > Key: LUCENE-1594 > URL: https://issues.apache.org/jira/browse/LUCENE-1594 > Project: Lucene - Java > Issue Type: New Feature > Components: Search > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Attachments: FastSearchTask.java, LUCENE-1594.patch, LUCENE-1594.patch, LUCENE-1594.patch > > > Towards eeking absolute best search performance, and after seeing the > Java ghosts in LUCENE-1575, I decided to build a simple prototype > source code specializer for Lucene's searches. > The idea is to write dynamic Java code, specialized to run a very > specific query context (eg TermQuery, collecting top N by field, no > filter, no deletions), compile that Java code, and run it. > Here're the performance gains when compared to trunk: > ||Query||Sort||Filt|Deletes||Scoring||Hits||QPS (base)||QPS (new)||%|| > |1|Date (long)|no|no|Track,Max|2561886|6.8|10.6|{color:green}55.9%{color}| > |1|Date (long)|no|5%|Track,Max|2433472|6.3|10.5|{color:green}66.7%{color}| > |1|Date (long)|25%|no|Track,Max|640022|5.2|9.9|{color:green}90.4%{color}| > |1|Date (long)|25%|5%|Track,Max|607949|5.3|10.3|{color:green}94.3%{color}| > |1|Date (long)|10%|no|Track,Max|256300|6.7|12.3|{color:green}83.6%{color}| > |1|Date (long)|10%|5%|Track,Max|243317|6.6|12.6|{color:green}90.9%{color}| > |1|Relevance|no|no|Track,Max|2561886|11.2|17.3|{color:green}54.5%{color}| > |1|Relevance|no|5%|Track,Max|2433472|10.1|15.7|{color:green}55.4%{color}| > |1|Relevance|25%|no|Track,Max|640022|6.1|14.1|{color:green}131.1%{color}| > |1|Relevance|25%|5%|Track,Max|607949|6.2|14.4|{color:green}132.3%{color}| > |1|Relevance|10%|no|Track,Max|256300|7.7|15.6|{color:green}102.6%{color}| > |1|Relevance|10%|5%|Track,Max|243317|7.6|15.9|{color:green}109.2%{color}| > |1|Title (string)|no|no|Track,Max|2561886|7.8|12.5|{color:green}60.3%{color}| > |1|Title (string)|no|5%|Track,Max|2433472|7.5|11.1|{color:green}48.0%{color}| > |1|Title (string)|25%|no|Track,Max|640022|5.7|11.2|{color:green}96.5%{color}| > |1|Title (string)|25%|5%|Track,Max|607949|5.5|11.3|{color:green}105.5%{color}| > |1|Title (string)|10%|no|Track,Max|256300|7.0|12.7|{color:green}81.4%{color}| > |1|Title (string)|10%|5%|Track,Max|243317|6.7|13.2|{color:green}97.0%{color}| > Those tests were run on a 19M doc wikipedia index (splitting each > Wikipedia doc @ ~1024 chars), on Linux, Java 1.6.0_10 > But: it only works with TermQuery for now; it's just a start. > It should be easy for others to run this test: > * apply patch > * cd contrib/benchmark > * run python -u bench.py -delindex > -nodelindex > (You can leave off one of -delindex or -nodelindex and it'll skip > those tests). > For each test, bench.py generates a single Java source file that runs > that one query; you can open > contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/FastSearchTask.java > to see it. I'll attach an example. It writes "results.txt", in Jira > table format, which you should be able to copy/paste back here. > The specializer uses pretty much every search speedup I can think of > -- the ones from LUCENE-1575 (to score or not, to maxScore or not), > the ones suggested in the spinoff LUCENE-1593 (pre-fill w/ sentinels, > don't use docID for tie breaking), LUCENE-1536 (random access > filters). It bypasses TermDocs and interacts directly with the > IndexInput, and with BitVector for deletions. It directly folds in > the collector, if possible. A filter if used must be random access, > and is assumed to pre-multiply-in the deleted docs. > Current status: > * I only handle TermQuery. I'd like to add others over time... > * It can collect by score, or single field (with the 3 scoring > options in LUCENE-1575). It can't do reverse field sort nor > multi-field sort now. > * The auto-gen code (gen.py) is rather hideous. It could use some > serious refactoring, etc.; I think we could get it to the point > where each Query can gen its own specialized code, maybe. It also > needs to be eventually ported to Java. > * The script runs old, then new, then checks that the topN results > are identical, and aborts if not. So I'm pretty sure the > specialized code is working correctly, for the cases I'm testing. > * The patch includes a few small changes to core, mostly to open up > package protected APIs so I can access stuff > I think this is an interesting effort for several reasons: > * It gives us a best-case upper bound performance we can expect from > Lucene's normal search classes (minus algorithmic improvements eg > PFOR) because it makes life as easy as possible on the > compiler/JRE to convert to assembly. > * We can spin out optimization ideas from this back into the core > (eg LUCENE-1593 already has one example), and prioritize. EG I > think given these results, optimizing for filters that support > random-access API is important. As we fold speedups back into > core, the gains from specialization will naturally decrease. > * Eventually (maybe, eg as a future "experimental" module) this can > be used in production as a simple "search wrapper". Ie, for a > given query, the specializer is checked. If the query "matches" > what the specializer can handle, then the specialized code is run; > else we fallback to Lucene core. Likely one would pre-compile the > space of all specializations, or we could compile java-on-the-fly > (eg what a JSP source does when it's changed) but I'm not sure how > costly/portable that is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org