From: Ian Lea <ian.lea@gmail.com>
Date: Thu, 21 Mar 2013 10:43:54 +0000
Subject: Re: high memory usage by indexreader
To: java-user@lucene.apache.org

That number of docs is far more than I've ever worked with but I'm
still surprised it takes 4 minutes to initialize an index reader.

What exactly do you mean by initialization?  Show us the code that
takes 4 minutes.
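
On the memory numbers: a VIRT of 307G next to a 290G index is roughly
what I would expect if the index is opened through MMapDirectory, which
maps the index files into the process address space. That is an
assumption on my part, since you haven't said which Directory you use,
but as far as I know it is what FSDirectory.open usually picks on 64-bit
Linux. Mapped files count towards VIRT, while RES only grows as pages
are actually touched. A minimal plain-Java sketch of the same effect,
with no Lucene involved (MmapSketch and its method names are made up
for illustration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MmapSketch {
    /** Maps the whole file read-only. The mapping adds to VIRT, not to the Java heap. */
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary file of the given size, filled with zero bytes. */
    static Path makeTempFile(int sizeBytes) {
        try {
            Path p = Files.createTempFile("mmap-sketch", ".bin");
            Files.write(p, new byte[sizeBytes]);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = makeTempFile(16 * 1024 * 1024);
        MappedByteBuffer buf = mapReadOnly(file);
        // Reading one byte pages in only the region around it, not the whole file.
        byte first = buf.get(0);
        System.out.println("mapped bytes: " + buf.capacity() + ", first byte: " + first);
    }
}
```

If that is what is happening, the 307G figure on its own is not a
problem; the 1.7G resident size is the more meaningful number.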

What version of lucene?  What OS?  What disks?
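
On splitting by timestamp, as you suggest below: the general shape
would be to route each document to an index by time range, query all
indexes in parallel and merge the hits. A rough plain-Java sketch of
that shape follows. Everything in it is hypothetical: Shard stands in
for "one index plus its searcher", and Hit, shardFor and searchAll are
names I made up, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ShardedSearchSketch {
    /** A scored hit; shard records which time-sliced index it came from. */
    static final class Hit {
        final int shard;
        final String docId;
        final double score;

        Hit(int shard, String docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for "search one index and return its best hits". */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /**
     * Picks the shard for a timestamp. upperBounds holds ascending, exclusive
     * upper limits; anything at or beyond the last bound goes to one extra shard.
     */
    static int shardFor(double timestamp, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (timestamp < upperBounds[i]) {
                return i;
            }
        }
        return upperBounds.length;
    }

    /** Queries every shard in parallel and merges the results into one top-N list. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard s : shards) {
                Callable<List<Hit>> task = () -> s.search(query, topN);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // waits for that shard to finish
            }
            // Highest score first, then keep at most topN hits.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With Lucene itself I would first try a MultiReader over the per-period
readers, or an IndexSearcher constructed with an ExecutorService, before
hand-rolling a merge like this. Also bear in mind that scores computed
against separate indexes are not strictly comparable, since term
statistics differ from index to index.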


--
Ian.


On Wed, Mar 20, 2013 at 6:21 PM, ash nix <nixdash@gmail.com> wrote:
> Thanks Ian.
>
> Number of documents in index is 381,153,828.
> The data set size is 1.9TB.
> The index size of this dataset is 290G. It is a single index.
> The following fields are indexed for each document.
>
> 1. Document id: a StoredField, generally around 128 chars or more.
> 2. Text field: a TextField, not stored.
> 3. Title: a TextField, not stored.
> 4. Anchor: a TextField, not stored.
> 5. Timestamp: a DoubleDocValue field, not stored. Actually this
> should be a DoubleField and I need to fix it.
>
> Initialization of the index reader at the start of search takes
> approximately 4 min. After initialization, I execute a series of
> Boolean AND queries of 2-3 terms. Each search result is dumped, with
> some information on score and doc id, to an output file.
>
> The resident size (RES) of process is 1.7 Gigs.
> The total virtual memory (VIRT) is 307 Gig.
>
> Do you think this is okay?
> Do you think I should use Solr instead of using lucene core?
>
> I have timestamps for each document, so I can split it into multiple
> indexes ordered chronologically.
>
> Thanks,
> Ashwin
>
> On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea <ian.lea@gmail.com> wrote:
>> Searching doesn't usually use that much memory, even on large indexes.
>>
>> What version of lucene are you on?  How many docs in the index?  What
>> does a slow query look like (q.toString()) and what search method are
>> you calling?  Anything else relevant you forgot to tell us?
>>
>>
>> Or google "lucene sharding" if you are determined to split the index.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Mar 20, 2013 at 5:12 PM, ash nix <nixdash@gmail.com> wrote:
>>> Hi Everybody,
>>>
>>> I have created a single compound index which is of size 250 Gigs.
>>> I open a single index reader to search simple boolean queries.
>>> The process is consuming a lot of memory and search is painfully slow.
>>>
>>> It seems that I will have to create multiple indexes and have multiple
>>> index readers.
>>> Can anyone suggest a good blog or documentation on creating multiple
>>> indexes and performing parallel search?
>>>
>>> --
>>> Thanks,
>>> A
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>
>
>
> --
> Thanks,
> A
>
>
