lucene-java-user mailing list archives

From Dvora <barak.ya...@gmail.com>
Subject Re: How to avoid huge index files
Date Thu, 10 Sep 2009 09:26:47 GMT

Hi,

Thanks a lot for that. I will perform the experiments and publish the results.
I'm aware of the risk of performance degradation, but for the pilot I'm
trying to run I think it's acceptable.

Thanks again!



Michael McCandless-2 wrote:
> 
> First, you need to limit the size of segments initially created by
> IndexWriter due to newly added documents.  Probably the simplest way
> is to call IndexWriter.commit() frequently enough.  You might want to
> use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
> consumed by IndexWriter's buffer to determine when to commit.  But it
> won't be an exact science, i.e., the segment size will be different from
> the RAM buffer size.  So, experiment with it...
> 
> Second, you need to prevent merging from creating a segment that's too
> large.  For this I would use the setMaxMergeMB method of the
> LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
> But note that this max size applies to the *input* segments, so you'd
> roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
> factor = 10), but probably make it smaller to be sure things stay
> small enough.
> 
> Note that with this approach, if your index is large enough, you'll
> wind up with many segments and search performance will suffer when
> compared to an index that doesn't have this max 10.0 MB file size
> restriction.
> 
> Mike
> 
> On Thu, Sep 10, 2009 at 2:32 AM, Dvora <barak.yaish@gmail.com> wrote:
>>
>> Hello again,
>>
>> Can someone please comment on whether what I'm looking for is possible
>> or not?
>>
>>
>> Dvora wrote:
>>>
>>> Hello,
>>>
>>> I'm using Lucene 2.4. I'm developing a web application that uses Lucene
>>> (via Compass) to do the searches.
>>> I intend to deploy the application on Google App Engine
>>> (http://code.google.com/appengine/), which limits file sizes to
>>> less than 10MB. I've read about the various policies supported by
>>> Lucene to limit the file sizes, but no matter which policy I used and
>>> which parameters, the index files still grew to be a lot more than 10MB.
>>> Looking at the code, I've managed to limit the cfs files (predicting the
>>> file size in CompoundFileWriter before closing the file) - I guess that
>>> will degrade performance, but it's OK for now. But now the FDT files are
>>> becoming huge (about 60MB) and I can't identify a way to limit those
>>> files.
>>>
>>> Is there some built-in and correct way to limit the size of these files? If
>>> not, can someone please direct me on how I should tweak the source code to
>>> achieve that?
>>>
>>> Thanks for any help.
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25378056.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
> 
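
For reference, the two steps described in the quoted reply (commit once the RAM buffer gets large enough, and cap the size of merge input segments at roughly the file limit divided by the merge factor) could be sketched roughly as below. This is only an illustration and has not been run against Lucene: RamTrackedWriter is a hypothetical stand-in for the IndexWriter.ramSizeInBytes() and IndexWriter.commit() calls mentioned above, and SegmentSizing, its method names, and the safety parameter are assumptions made for the example.

```java
/** Hypothetical stand-in for the two IndexWriter calls named in the reply. */
interface RamTrackedWriter {
    long ramSizeInBytes(); // bytes currently held in the in-memory buffer
    void commit();         // flush the buffer to a new on-disk segment
}

/** Illustrative helpers only; not part of Lucene. */
final class SegmentSizing {
    private SegmentSizing() {}

    /**
     * Upper bound for merge *input* segments, in MB: the file size limit
     * divided by the merge factor, scaled by a safety margin in (0, 1].
     */
    static double maxMergeMB(double fileLimitMB, int mergeFactor, double safety) {
        if (fileLimitMB <= 0 || mergeFactor < 2 || safety <= 0 || safety > 1) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        return fileLimitMB / mergeFactor * safety;
    }

    /**
     * Commits when the buffered RAM reaches the threshold.
     * Returns true if a commit was triggered.
     */
    static boolean commitIfOverThreshold(RamTrackedWriter writer, long thresholdBytes) {
        if (writer.ramSizeInBytes() >= thresholdBytes) {
            writer.commit();
            return true;
        }
        return false;
    }
}
```

With a 10 MB limit and a merge factor of 10, maxMergeMB(10.0, 10, 1.0) gives the 1.0 MB figure from the reply, and a safety value below 1 gives the smaller setting it recommends. In a real application the result would presumably be passed to LogByteSizeMergePolicy.setMaxMergeMB, and commitIfOverThreshold would be called after each added document. Since the segment on disk will not match the RAM buffer size, the threshold still has to be found by experiment.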

-- 
View this message in context: http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25380052.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

