lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: How to avoid huge index files
Date Thu, 10 Sep 2009 10:42:30 GMT
You're welcome!

Another, bottoms-up option would be to make a custom Directory impl
that simply splits up files above a certain size.  That'd be more
generic and more reliable...

Mike

On Thu, Sep 10, 2009 at 5:26 AM, Dvora <barak.yaish@gmail.com> wrote:
>
> Hi,
>
> Thanks a lot for that, will peforms the experiments and publish the results.
> I'm aware to the risk of peformance degredation, but for the pilot I'm
> trying to run I think it's acceptable.
>
> Thanks again!
>
>
>
> Michael McCandless-2 wrote:
>>
>> First, you need to limit the size of segments initially created by
>> IndexWriter due to newly added documents.  Probably the simplest way
>> is to call IndexWriter.commit() frequently enough.  You might want to
>> use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
>> consumed by IndexWriter's buffer to determine when to commit.  But it
>> won't be an exact science, ie, the segment size will be different from
>> the RAM buffer size.  So, experiment w/ it...
>>
>> Second, you need to prevent merging from creating a segment that's too
>> large.  For this I would use the setMaxMergeMB method of the
>> LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
>> But note that this max size applies to the *input* segments, so you'd
>> roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
>> factor = 10), but probably make it smaller to be sure things stay
>> small enough.
>>
>> Note that with this approach, if your index is large enough, you'll
>> wind up with many segments and search performance will suffer when
>> compared to an index that doesn't have this max 10.0 MB file size
>> restriction.
>>
>> Mike
>>
>> On Thu, Sep 10, 2009 at 2:32 AM, Dvora <barak.yaish@gmail.com> wrote:
>>>
>>> Hello again,
>>>
>>> Can someone please comment on that, whether what I'm looking is possible
>>> or
>>> not?
>>>
>>>
>>> Dvora wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm using Lucene2.4. I'm developing a web application that using Lucene
>>>> (via compass) to do the searches.
>>>> I'm intending to deploy the application in Google App Engine
>>>> (http://code.google.com/appengine/), which limits files length to be
>>>> smaller than 10MB. I've read about the various policies supported by
>>>> Lucene to limit the file sizes, but on matter which policy I used and
>>>> which parameters, the index files still grew to be lot more the 10MB.
>>>> Looking at the code, I've managed to limit the cfs files (predicting the
>>>> file size in CompoundFileWriter before closing the file) - I guess that
>>>> will degrade performance, but it's OK for now. But now the FDT files are
>>>> becoming huge (about 60MB) and I cant identifiy a way to limit those
>>>> files.
>>>>
>>>> Is there some built-in and correct way to limit these files length? If
>>>> no,
>>>> can someone direct me please how should I tweak the source code to
>>>> achieve
>>>> that?
>>>>
>>>> Thanks for any help.
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25378056.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25380052.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message