From: Yang
Date: Wed, 25 Apr 2012 14:13:06 -0700
Subject: lucene algorithm ?
To: java-user@lucene.apache.org

I read Doug Cutting's paper "Space Optimizations for Total Ranking". Since it was written a long time ago, I wonder what algorithms Lucene uses today for postings list traversal, score calculation, and ranking.

In particular, the total ranking algorithm described there has to traverse the entire postings list of every query term. For a query made of very common terms, such as "yellow dog", either term may have a very long postings list in a web-scale index. Are these lists really traversed in full in current Lucene/Solr, or are heuristics used to truncate them?

For returning the top-k results, I can see that partitioning the postings lists across multiple machines and then combining the top-k from each partition would work. But if we have to return "the 100th result page", i.e. results ranked 991st to 1000th, each partition still has to find its own top 1000, so partitioning would not help much.

Overall, are there any up-to-date, detailed docs on Lucene's internal algorithms?

Thanks a lot,
Yang
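
The deep-paging concern in the message can be made concrete with a minimal sketch in Python. This is not Lucene or Solr code: the function names `shard_top_k` and `merged_page`, and the toy per-shard scores, are assumptions made purely for illustration. It shows why every partition has to return its own top page * page_size hits before the requested page can be cut out of the merged ranking.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical representation


def shard_top_k(scores: Dict[str, float], k: int) -> List[Hit]:
    """Return the k best (score, doc_id) pairs of one shard, best first."""
    return heapq.nlargest(k, ((score, doc) for doc, score in scores.items()))


def merged_page(shards: List[Dict[str, float]], page: int, page_size: int) -> List[Hit]:
    """Return result page `page` (1-based) of the global ranking.

    Every shard must contribute its own top `page * page_size` hits,
    because the requested page could come entirely from a single shard.
    """
    k = page * page_size
    candidates: List[Hit] = []
    for shard in shards:
        candidates.extend(shard_top_k(shard, k))
    ranked = heapq.nlargest(k, candidates)  # global top k, best first
    return ranked[(page - 1) * page_size:]  # keep only the requested page


# Toy data (made up): shard A happens to hold the four best documents.
shards = [
    {"a1": 0.9, "a2": 0.8, "a3": 0.7, "a4": 0.6},
    {"b1": 0.5, "b2": 0.4},
    {"c1": 0.3},
]

print(merged_page(shards, page=2, page_size=2))
# prints [(0.7, 'a3'), (0.6, 'a4')]
```

If each shard returned only `page_size` hits instead of `page * page_size`, the merge in this toy example would wrongly report b1 and b2 as ranks 3 and 4. For page 100 with 10 results per page, the same reasoning requires 1000 candidates from every partition.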