From: Ian Lea
Date: Thu, 21 Mar 2013 10:43:54 +0000
Subject: Re: high memory usage by indexreader
To: java-user@lucene.apache.org

That number of docs is far more than I've ever worked with, but I'm
still surprised it takes 4 minutes to initialize an index reader.

What exactly do you mean by initialization? Show us the code that
takes 4 minutes. What version of Lucene? What OS? What disks?

--
Ian.

On Wed, Mar 20, 2013 at 6:21 PM, ash nix wrote:
> Thanks Ian.
>
> The number of documents in the index is 381,153,828.
> The data set size is 1.9 TB.
> The index size for this data set is 290 GB. It is a single index.
> The following fields are indexed for each document:
>
> 1. Document id: a StoredField, generally around 128 chars or more.
> 2. Text: a TextField, not stored.
> 3. Title: a TextField, not stored.
> 4. Anchor: a TextField, not stored.
> 5. Timestamp: a DoubleDocValuesField, not stored. Actually this
>    should be a DoubleField and I need to fix it.
>
> Initialization of the index reader at the start of search takes
> approximately 4 minutes.
> After initialization, I execute a series of Boolean AND queries
> of 2-3 terms. Each search result is written to an output file along
> with some information on the score and doc id.
>
> The resident size (RES) of the process is 1.7 GB.
> The total virtual memory (VIRT) is 307 GB.
>
> Do you think this is okay?
> Do you think I should use Solr instead of Lucene core?
>
> I have timestamps for the documents, so I could split them into
> multiple indexes ordered chronologically.
>
> Thanks,
> Ashwin
>
> On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea wrote:
>> Searching doesn't usually use that much memory, even on large indexes.
>>
>> What version of Lucene are you on? How many docs are in the index? What
>> does a slow query look like (q.toString()), and what search method are
>> you calling? Anything else relevant you forgot to tell us?
>>
>> Or google "lucene sharding" if you are determined to split the index.
>>
>> --
>> Ian.
>>
>> On Wed, Mar 20, 2013 at 5:12 PM, ash nix wrote:
>>> Hi Everybody,
>>>
>>> I have created a single compound index of size 250 GB.
>>> I open a single index reader to search simple Boolean queries.
>>> The process is consuming a lot of memory, and search is painfully slow.
>>>
>>> It seems that I will have to create multiple indexes and have multiple
>>> index readers.
>>> Can anyone suggest a good blog or documentation on creating multiple
>>> indexes and performing parallel search?
>>>
>>> --
>>> Thanks,
>>> A
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --
> Thanks,
> A
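Ashwin describes running "a series of Boolean AND queries of 2-3 terms". Conceptually, such a query intersects the sorted lists of document ids (postings) that match each term. The toy sketch below illustrates only that idea with plain int arrays; it is not Lucene's implementation, which walks postings through its own enumerators and scorers, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration (hypothetical names): AND queries as intersections of sorted doc-id lists. */
public class PostingsIntersection {

    /** Intersects two ascending, duplicate-free doc-id arrays with a linear merge walk. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {          // this doc contains both terms
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {    // advance whichever list is behind
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** ANDs any number of postings lists, starting from the shortest to keep intermediates small. */
    static int[] intersectAll(List<int[]> postings) {
        if (postings.isEmpty()) {
            return new int[0];
        }
        List<int[]> bySize = new ArrayList<>(postings);
        bySize.sort((x, y) -> Integer.compare(x.length, y.length));
        int[] result = bySize.get(0);
        for (int k = 1; k < bySize.size() && result.length > 0; k++) {
            result = intersect(result, bySize.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] term1 = {1, 4, 7, 9, 12};
        int[] term2 = {2, 4, 9, 12, 15};
        int[] term3 = {4, 5, 12};
        System.out.println(Arrays.toString(intersectAll(List.of(term1, term2, term3)))); // [4, 12]
    }
}
```

Starting from the shortest list is a common heuristic: the result can never be longer than the rarest term's postings, so later intersections stay cheap.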
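Ashwin's idea of splitting the index chronologically by timestamp can be sketched as a small routing table: each document goes to the shard whose time interval contains its timestamp, and a query restricted to a time range only needs the shards that overlap that range. This sketch assumes hypothetical shard boundaries and leaves the timestamp unit unspecified; `TimeShardRouter` is an illustrative name, not a Lucene class.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of chronological sharding: shard k holds documents whose timestamp
 * falls in [starts[k], starts[k + 1]); the last shard is open-ended, and anything
 * earlier than starts[0] is clamped into shard 0.
 */
public class TimeShardRouter {
    private final double[] starts; // ascending, distinct shard start times (unit is an assumption)

    public TimeShardRouter(double... starts) {
        if (starts.length == 0) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.starts = starts.clone();
        Arrays.sort(this.starts);
    }

    /** Index time: which shard should this document be written to? */
    public int shardFor(double timestamp) {
        int pos = Arrays.binarySearch(starts, timestamp);
        // Exact hit: the timestamp is a shard start. Miss: binarySearch returns
        // -(insertionPoint) - 1, and the owning shard is the one just before insertionPoint.
        int shard = pos >= 0 ? pos : -pos - 2;
        return Math.max(shard, 0);
    }

    /** Query time: which shards can contain documents in [from, to] (inclusive)? */
    public int[] shardsFor(double from, double to) {
        if (from > to) {
            throw new IllegalArgumentException("from must be <= to");
        }
        int first = shardFor(from);
        int last = shardFor(to);
        int[] result = new int[last - first + 1];
        for (int k = 0; k < result.length; k++) {
            result[k] = first + k;
        }
        return result;
    }

    public static void main(String[] args) {
        // Three hypothetical shards starting at t = 0, 100 and 200.
        TimeShardRouter router = new TimeShardRouter(0, 100, 200);
        System.out.println(router.shardFor(150));                        // 1
        System.out.println(Arrays.toString(router.shardsFor(150, 250))); // [1, 2]
    }
}
```

In a real setup each shard number would presumably map to its own Lucene index directory. Queries without a time restriction would still have to search every shard, so this only pays off if many queries are time-bounded.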
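For the original question about "creating multiple indexes and performing parallel search", the usual pattern is scatter-gather: run the query against every shard concurrently, then merge each shard's top hits into one global top-k list. The stdlib-only sketch below (Java 16+ for the record syntax) shows just that merge logic; `Shard` and `Hit` are hypothetical stand-ins, not Lucene types. Within Lucene itself one would more likely combine the per-shard readers with a `MultiReader`, or pass an `ExecutorService` to the `IndexSearcher` constructor, rather than hand-roll this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical scatter-gather sketch: query every shard in parallel, keep the global top k. */
public class ScatterGather {

    /** One hit: source shard, shard-local doc id, and score. */
    record Hit(int shard, int doc, float score) {}

    /** Stand-in for "search one index and return its top k hits" (not a Lucene type). */
    interface Shard {
        List<Hit> search(String query, int k);
    }

    static List<Hit> search(List<Shard> shards, String query, int k, ExecutorService pool)
            throws Exception {
        // Scatter: one search task per shard, all running concurrently on the pool.
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (Shard shard : shards) {
            futures.add(pool.submit(() -> shard.search(query, k)));
        }
        // Gather: a min-heap capped at k entries, so the lowest-scoring kept hit is evicted first.
        Comparator<Hit> byScore = Comparator.comparingDouble(Hit::score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (Future<List<Hit>> future : futures) {
            for (Hit hit : future.get()) {  // blocks until that shard has answered
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(byScore.reversed());  // best first
        return top;
    }

    public static void main(String[] args) throws Exception {
        Shard older = (q, n) -> List.of(new Hit(0, 5, 2.0f), new Hit(0, 9, 0.5f));
        Shard newer = (q, n) -> List.of(new Hit(1, 3, 1.5f), new Hit(1, 7, 1.0f));
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (Hit hit : search(List.of(older, newer), "foo AND bar", 3, pool)) {
                System.out.println(hit);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat worth checking before relying on this: each separately searched index computes scores from its own term statistics, so raw scores from different indexes are not guaranteed to be directly comparable.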