From: Michael McCandless
Date: Thu, 14 Mar 2013 06:17:40 -0400
Subject: Re: how do I paginate Lucene search results deeply
To: java-user@lucene.apache.org, te@statsbiblioteket.dk

You could also use Lucene's "search after" capability. It's designed for
exactly this use-case (deep paging). See
https://issues.apache.org/jira/browse/LUCENE-2215

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 14, 2013 at 6:03 AM, Toke Eskildsen wrote:
> On Thu, 2013-03-14 at 04:11 +0100, dizh wrote:
>> Each document has a timestamp identifying when it was indexed. I
>> want to search the documents sorted by that timestamp,
>
> [...]
>
>> but when you do paging, for example in a web app, and the user wants to
>> go to the last results (4999980-5000000), well, it is slow...
>
> Yes. The problem is that it performs a sliding-window search with a
> window size of page+topX, and that does not work well with 5M entries,
> especially as it uses a heap, which works very well for small windows
> but horribly for large ones.
>
>> I have a large number of Log4J logs, and I want to index them and
>> present them using a web UI.
>
> I still don't see why you would want to page to 5M, but okay.
>
> Instead of representing the timestamps directly, convert them to unique
> longs when indexing.
> Guessing that you always have fewer than 1000 log
> entries/ms, your long would be
>   (timestamp_in_ms << 10) | counter++
> where the counter is reset to 0 each time a different timestamp is
> encountered. This also ensures that the order of your log entries is
> preserved. Let's call the modified timestamp utime.
>
> When you do a paginated search for 20 results, keep track of the last
> utime. When you request the next page, add a NumericRangeFilter going
> from the last utime (non-inclusive) with no upper limit and ask for the
> top-20 results again.
>
>
> NB: Please get rid of the garbage that follows each of your posts on
> this mail list. The Confidentiality Notice has negative value here.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
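Below is a minimal sketch of the utime scheme described above. It does not use Lucene. It assumes that log entries arrive in non-decreasing timestamp order; otherwise resetting the counter could produce duplicate keys. The names `UtimePaging`, `toUtime` and `nextPage` are hypothetical. An in-memory `TreeSet` stands in for the index, and `tailSet(lastUtime, false)` plays the role of the NumericRangeFilter with an exclusive lower bound and no upper limit. Lucene's "search after" serves the same purpose, resuming after the last hit of the previous page, but it is not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class UtimePaging {
    // Low bits reserved for the per-millisecond counter: 2^10 = 1024 slots,
    // enough for the "fewer than 1000 entries/ms" guess.
    static final int COUNTER_BITS = 10;

    private long lastMs = Long.MIN_VALUE;
    private int counter = 0;

    /**
     * Encodes a millisecond timestamp as a unique, order-preserving long.
     * Assumption: timestamps are fed in non-decreasing order (as in a log file).
     */
    long toUtime(long timestampMs) {
        if (timestampMs != lastMs) {
            // New millisecond: reset the counter.
            lastMs = timestampMs;
            counter = 0;
        }
        if (counter >= (1 << COUNTER_BITS)) {
            throw new IllegalStateException("more than 1024 entries in one millisecond");
        }
        // Bitwise OR places the counter in the low bits freed by the shift.
        return (timestampMs << COUNTER_BITS) | counter++;
    }

    /** Returns up to pageSize utimes strictly greater than lastUtime, ascending. */
    static List<Long> nextPage(NavigableSet<Long> index, long lastUtime, int pageSize) {
        List<Long> page = new ArrayList<>(pageSize);
        // Exclusive lower bound, no upper bound: like the range filter in the text.
        for (long u : index.tailSet(lastUtime, false)) {
            if (page.size() == pageSize) break;
            page.add(u);
        }
        return page;
    }

    public static void main(String[] args) {
        UtimePaging enc = new UtimePaging();
        NavigableSet<Long> index = new TreeSet<>();
        long[] timestamps = {1000, 1000, 1000, 1001, 1002, 1002};
        for (long ts : timestamps) index.add(enc.toUtime(ts));

        long last = Long.MIN_VALUE; // "before the first page"
        List<Long> page;
        while (!(page = nextPage(index, last, 2)).isEmpty()) {
            System.out.println(page);
            // Remember the last utime seen; the next page starts after it.
            last = page.get(page.size() - 1);
        }
    }
}
```

Running `main` prints the pages `[1024000, 1024001]`, `[1024002, 1025024]` and `[1026048, 1026049]`. The three entries in millisecond 1000 get distinct keys, and the order of entries is preserved. Each request only needs the last utime of the previous page, so the window stays at page size no matter how deep the user pages.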