Subject: Re: Lucene TermsFilter lookup slow
To: java-user@lucene.apache.org
References: <55C5F6E2.2090804@stimulussoft.com>
From: jamie
Message-ID: <55C6FE8D.2080102@stimulussoft.com>
Date: Sun, 9 Aug 2015 09:17:33 +0200

Mike,

Thank you kindly for the reply. I am using Lucene v4.10.4. Are the optimizations you refer to available in this version? We haven't yet upgraded to Lucene 5, as there appear to be many API changes.

Jamie

On 2015/08/08 5:13 PM, Michael McCandless wrote:
> Which version of Lucene are you using? Newer versions have optimized
> the "primary key" use case somewhat...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Aug 8, 2015 at 8:32 AM, jamie wrote:
>> Greetings
>>
>> Our app primarily uses Lucene for its intended purpose, i.e. to search across
>> large amounts of unstructured text. However, our requirements recently
>> expanded to include look-ups of specific documents in the index based on
>> associated custom-defined unique keys. For our purposes, a unique key is the
>> string representation of a 128-bit murmur hash, stored in a Lucene field
>> named uid. We are currently using the TermsFilter to look up documents in
>> the Lucene index as follows:
>>
>> List<Term> terms = new LinkedList<>();
>> for (String id : ids) {
>>     terms.add(new Term("uid", id));
>> }
>> TermsFilter idFilter = new TermsFilter(terms);
>> ... search logic...
>>
>> At any time we may need to look up, say, a couple of thousand documents. Our
>> problem is one of performance.
>> On very large indexes with 30 million records
>> or more, the lookup can be excruciatingly slow. At this stage, it's not
>> practical for us to move the data over to a fit-for-purpose database, nor to
>> change the uid field to a numeric type. I fully appreciate the fact that
>> Lucene is not designed to be a database; however, is there anything we can
>> do to improve the performance of these look-ups?
>>
>> Much appreciated
>>
>> Jamie
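
The lookup described in the quoted message builds one Term per uid from a list of a couple of thousand keys. As a rough sketch only (this is not something proposed in the thread, and whether it helps on Lucene 4.10.4 is untested here), the key list could be deduplicated and split into bounded batches before any terms are built. The class name UidBatcher, the method toBatches, and the batch size are hypothetical names chosen for illustration; the code uses only the Java standard library and does not touch Lucene.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not part of Lucene): prepares uid strings for lookup.
public class UidBatcher {

    /**
     * Removes duplicate uids, sorts them, and splits the result into
     * batches of at most batchSize entries each.
     */
    public static List<List<String>> toBatches(Collection<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // TreeSet drops duplicates and yields the uids in sorted order.
        List<String> unique = new ArrayList<>(new TreeSet<>(ids));
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < unique.size(); from += batchSize) {
            int to = Math.min(from + batchSize, unique.size());
            // Copy the sub-list so each batch is independent of the others.
            batches.add(new ArrayList<>(unique.subList(from, to)));
        }
        return batches;
    }
}
```

Each batch could then be turned into a list of Term objects for the uid field and passed to a TermsFilter, as in the quoted snippet. The batch size would need to be chosen by measurement against the real index.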