lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dvora <barak.ya...@gmail.com>
Subject Re: How to avoid huge index files
Date Thu, 10 Sep 2009 11:23:18 GMT

Hi again,

Can you add some details and guidelines how to implement that? Different
files types have different structure, is such spliting doable without
knowing Lucene internals?


Michael McCandless-2 wrote:
> 
> You're welcome!
> 
> Another, bottoms-up option would be to make a custom Directory impl
> that simply splits up files above a certain size.  That'd be more
> generic and more reliable...
> 
> Mike
> 
> On Thu, Sep 10, 2009 at 5:26 AM, Dvora <barak.yaish@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks a lot for that, will peforms the experiments and publish the
>> results.
>> I'm aware to the risk of peformance degredation, but for the pilot I'm
>> trying to run I think it's acceptable.
>>
>> Thanks again!
>>
>>
>>
>> Michael McCandless-2 wrote:
>>>
>>> First, you need to limit the size of segments initially created by
>>> IndexWriter due to newly added documents.  Probably the simplest way
>>> is to call IndexWriter.commit() frequently enough.  You might want to
>>> use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently
>>> consumed by IndexWriter's buffer to determine when to commit.  But it
>>> won't be an exact science, ie, the segment size will be different from
>>> the RAM buffer size.  So, experiment w/ it...
>>>
>>> Second, you need to prevent merging from creating a segment that's too
>>> large.  For this I would use the setMaxMergeMB method of the
>>> LogByteSizeMergePolicy (which is IndexWriter's default merge policy).
>>> But note that this max size applies to the *input* segments, so you'd
>>> roughly want that to be 1.0 MB (your 10.0 MB divided by the merge
>>> factor = 10), but probably make it smaller to be sure things stay
>>> small enough.
>>>
>>> Note that with this approach, if your index is large enough, you'll
>>> wind up with many segments and search performance will suffer when
>>> compared to an index that doesn't have this max 10.0 MB file size
>>> restriction.
>>>
>>> Mike
>>>
>>> On Thu, Sep 10, 2009 at 2:32 AM, Dvora <barak.yaish@gmail.com> wrote:
>>>>
>>>> Hello again,
>>>>
>>>> Can someone please comment on that, whether what I'm looking is
>>>> possible
>>>> or
>>>> not?
>>>>
>>>>
>>>> Dvora wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using Lucene2.4. I'm developing a web application that using
>>>>> Lucene
>>>>> (via compass) to do the searches.
>>>>> I'm intending to deploy the application in Google App Engine
>>>>> (http://code.google.com/appengine/), which limits files length to be
>>>>> smaller than 10MB. I've read about the various policies supported by
>>>>> Lucene to limit the file sizes, but on matter which policy I used and
>>>>> which parameters, the index files still grew to be lot more the 10MB.
>>>>> Looking at the code, I've managed to limit the cfs files (predicting
>>>>> the
>>>>> file size in CompoundFileWriter before closing the file) - I guess
>>>>> that
>>>>> will degrade performance, but it's OK for now. But now the FDT files
>>>>> are
>>>>> becoming huge (about 60MB) and I cant identifiy a way to limit those
>>>>> files.
>>>>>
>>>>> Is there some built-in and correct way to limit these files length? If
>>>>> no,
>>>>> can someone direct me please how should I tweak the source code to
>>>>> achieve
>>>>> that?
>>>>>
>>>>> Thanks for any help.
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25378056.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25380052.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-avoid-huge-index-files-tp25347505p25381489.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message