lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: high memory usage by indexreader
Date Fri, 22 Mar 2013 09:45:33 GMT
I did ask if there was anything else relevant you'd forgotten to mention ...

How fast are general file operations on the NFS files?  Your times are
still extremely long and my guess is that your network/NFS setup is
to blame.
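One quick sanity check along those lines (a minimal stdlib-only sketch, nothing Lucene-specific; the class name and command-line usage are made up): time a raw sequential read of one of the large index files on the NFS mount and compare against a local copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FsReadBench {
    // Sequentially read the whole file in 1 MB chunks; return throughput in MB/s.
    public static double readThroughputMBps(Path file) throws IOException {
        byte[] buf = new byte[1 << 20];
        long total = 0;
        long start = System.currentTimeMillis();
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) total += n;
        }
        long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
        return (total / (1024.0 * 1024.0)) / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) throws IOException {
        // Point this at a large file inside the index directory and run it
        // once against the NFS mount, once against a local copy.
        Path p = Paths.get(args[0]);
        System.out.printf("%s: %.1f MB/s%n", p, readThroughputMBps(p));
    }
}
```

If the raw read rate over NFS is a small fraction of the local rate, that alone would explain the numbers you are seeing.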

Can you run your code on the server that is exporting the index, if
only for comparison?

Your attachment didn't make it to the list.  In this context, any
sample code that is too big to cut and paste into an email message is
too big anyway.  If necessary cut it down to a trivial example.

But verify performance against a local index first.
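For what it's worth, a trivial example along these lines is usually small enough to paste inline (a sketch assuming Lucene 4.x; the field and term names are made up, and it needs lucene-core on the classpath, so treat it as illustrative only):

```java
import java.io.File;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class OpenAndSearchTimer {
    public static void main(String[] args) throws Exception {
        // Time opening the reader.
        long t0 = System.currentTimeMillis();
        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File(args[0])));
        System.out.println("open: " + (System.currentTimeMillis() - t0) + " ms");

        // Time a 2-term boolean AND query (pre-5.3 BooleanQuery API).
        IndexSearcher searcher = new IndexSearcher(reader);
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("text", "foo")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("text", "bar")), BooleanClause.Occur.MUST);

        t0 = System.currentTimeMillis();
        TopDocs hits = searcher.search(q, 10);
        System.out.println("search: " + (System.currentTimeMillis() - t0)
                + " ms, hits: " + hits.totalHits);
        reader.close();
    }
}
```

Run the same class against the local copy and the NFS mount; the two pairs of timings should tell you where the time goes.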


--
Ian.


On Thu, Mar 21, 2013 at 10:37 PM, ash nix <nixdash@gmail.com> wrote:
> Hi Ian,
>
> Thanks for your reply.
> The index is on NFS and there is no storage local/near to machine.
> Operating system is CentOS 6.3 with Linux 2.6. It has 16 Gigs of memory.
> By initializing the Indexreader, I mean opening the IndexReader.
>
> I timed my operations using System.currentTimeMillis and executed the
> process a couple of times.
> Opening the IndexReader takes 1.5 minutes at minimum and 2.5 minutes at most.
> Searching a boolean AND query of 2-4 terms took 56 seconds on average.
>
> Apart from that, I found a major bottleneck in my process (the updateScores call).
>
> Do the IndexReader open time and search time look okay to you?
> My dataset is going to grow, and there will be a lot more documents
> with more fields.
> I am attaching the code which performs the search.
>
> Thanks,
> Ashwin
>
>
> On Thu, Mar 21, 2013 at 6:43 AM, Ian Lea <ian.lea@gmail.com> wrote:
>> That number of docs is far more than I've ever worked with but I'm
>> still surprised it takes 4 minutes to initialize an index reader.
>>
>> What exactly do you mean by initialization?  Show us the code that
>> takes 4 minutes.
>>
>> What version of lucene?  What OS?  What disks?
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Mar 20, 2013 at 6:21 PM, ash nix <nixdash@gmail.com> wrote:
>>> Thanks Ian.
>>>
>>> Number of documents in index is 381,153,828.
>>> The data set size is 1.9TB.
>>> The index size of this dataset is 290G. It is a single index.
>>> The following are the fields indexed for each of the document.
>>>
>>> 1. Document id: a StoredField, generally around 128 chars or more.
>>> 2. Text: a TextField, not stored.
>>> 3. Title: a TextField, not stored.
>>> 4. Anchor: a TextField, not stored.
>>> 5. Timestamp: a DoubleDocValuesField, not stored. This should actually
>>> be a DoubleField and I need to fix it.
>>>
>>> Initialization of the IndexReader at the start of search takes approximately 4 min.
>>> After initialization, I execute a series of boolean AND queries
>>> of 2-3 terms. Each search result is dumped, with some information on
>>> score and doc id, to an output file.
>>>
>>> The resident size (RES) of the process is 1.7 Gigs.
>>> The total virtual memory (VIRT) is 307 Gigs.
>>>
>>> Do you think this is okay?
>>> Do you think I should use Solr instead of Lucene core?
>>>
>>> I have timestamps for the documents, so I can split the index into
>>> multiple indexes sorted chronologically.
>>>
>>> Thanks,
>>> Ashwin
>>>
>>> On Wed, Mar 20, 2013 at 1:43 PM, Ian Lea <ian.lea@gmail.com> wrote:
>>>> Searching doesn't usually use that much memory, even on large indexes.
>>>>
>>>> What version of lucene are you on?  How many docs in the index?  What
>>>> does a slow query look like (q.toString()) and what search method are
>>>> you calling?  Anything else relevant you forgot to tell us?
>>>>
>>>>
>>>> Or google "lucene sharding" if you are determined to split the index.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 5:12 PM, ash nix <nixdash@gmail.com> wrote:
>>>>> Hi Everybody,
>>>>>
>>>>> I have created a single compound index which is 250 Gigs in size.
>>>>> I open a single IndexReader to search simple boolean queries.
>>>>> The process is consuming a lot of memory and search is painfully slow.
>>>>>
>>>>> It seems that I will have to create multiple indexes and use multiple
>>>>> index readers.
>>>>> Can anyone suggest a good blog or documentation on creating multiple
>>>>> indexes and performing parallel searches?
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> A
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> A
>>>
>>>
>>
>>
>
>
>
> --
> Thanks,
> A
>
>
>


