Message-ID: <1731538216.664951270236867495.JavaMail.jira@brutus.apache.org>
Date: Fri, 2 Apr 2010 19:34:27 +0000 (UTC)
From: "Sergey Vladimirov (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-2362) Add support for slow filters with batch processing

Add support for slow filters with batch processing
--------------------------------------------------

                 Key: LUCENE-2362
                 URL: https://issues.apache.org/jira/browse/LUCENE-2362
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 3.0.1
            Reporter:
Sergey Vladimirov

The internal implementation of IndexSearcher assumes that the Filter and the scorer have roughly equal performance. In our environment, however, the Filter implementation is very expensive compared to the scorer. If we have, say, 2k termdocs selected by the scorer (each ~250 docs) and 2k selected by the filter, then 250k docs will be quickly checked (and filtered out) by the scorer, and 250k docs will be slowly checked by our filter.

With a straightforward implementation, a search takes more than 60 seconds per query, because each next() or advance() call requires N database queries PER CHECKED DOC. A read-ahead technique lets us reduce this to 35 seconds per query, which is still too slow.

The solution is to first select all documents with the scorer and then filter them in a single batch with our filter. An example implementation (using BitSet) is in the attachment. With it, a query currently takes only ~300 milliseconds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
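
For readers of the archive, the two-phase idea described in the issue (collect everything the scorer matches, then hand the whole set to the expensive filter at once) could look roughly like the sketch below. This is not the attached implementation and it does not use Lucene classes: BatchFilterSketch, BatchFilter, collectCandidates and search are hypothetical names, and the scorer is stood in for by a plain IntPredicate. Only java.util.BitSet is taken from the issue text.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Sketch of "collect with the scorer first, then filter in one batch". */
class BatchFilterSketch {

    /** Hypothetical stand-in for an expensive filter that can check many docs in one round trip. */
    interface BatchFilter {
        /** Returns the subset of the candidate doc ids that pass the filter. */
        BitSet accept(BitSet candidates);
    }

    /** Phase 1: record every doc id matched by the (cheap) scorer in a bit set. */
    static BitSet collectCandidates(int maxDoc, IntPredicate scorerMatches) {
        BitSet candidates = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (scorerMatches.test(doc)) {
                candidates.set(doc);
            }
        }
        return candidates;
    }

    /** Phase 2: call the slow filter once for the whole candidate set. */
    static BitSet search(int maxDoc, IntPredicate scorerMatches, BatchFilter filter) {
        BitSet candidates = collectCandidates(maxDoc, scorerMatches);
        BitSet accepted = (BitSet) filter.accept(candidates).clone();
        // Intersect defensively in case the filter returns docs outside the candidates.
        accepted.and(candidates);
        return accepted;
    }

    public static void main(String[] args) {
        int[] filterCalls = {0};
        // Pretend this lambda hides a database query; it runs once per search, not once per doc.
        BatchFilter slowFilter = candidates -> {
            filterCalls[0]++;
            BitSet out = new BitSet();
            for (int d = candidates.nextSetBit(0); d >= 0; d = candidates.nextSetBit(d + 1)) {
                if (d % 3 == 0) {
                    out.set(d);
                }
            }
            return out;
        };
        BitSet hits = search(20, doc -> doc % 2 == 0, slowFilter);
        System.out.println(hits + " filter calls: " + filterCalls[0]);
        // prints "{0, 6, 12, 18} filter calls: 1"
    }
}
```

The point of the design is that the cost of the filter is paid per query rather than per checked document, which is where the per-doc next()/advance() approach spends its time.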