lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Max Segmentation Size when Optimizing Index
Date Wed, 13 Jan 2010 23:08:23 GMT
Right... It all blends together, I need an NLP analyzer for my emails

On Wed, Jan 13, 2010 at 3:05 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> I think Jason meant "15-20GB segments"?
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
>
> ________________________________
> From: Jason Rutherglen <jason.rutherglen@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Wed, January 13, 2010 5:54:38 PM
> Subject: Re: Max Segmentation Size when Optimizing Index
>
> Yes... You could hack LogMergePolicy to do something else.
>
> I use optimise(numsegments:5) regularly on 80GB indexes, that if
> optimized to 1 segment, would thrash the IO excessively.  This works
> fine because 15-20GB indexes are plenty large and fast.
>
> On Wed, Jan 13, 2010 at 2:44 PM, Trin Chavalittumrong <mrtrin@gmail.com> wrote:
>> Seems like optimize() only cares about final number of segments rather than
>> the size of the segment. Is it so?
>>
>> On Wed, Jan 13, 2010 at 2:35 PM, Jason Rutherglen <
>> jason.rutherglen@gmail.com> wrote:
>>
>>> There's a different method in LogMergePolicy that performs the
>>> optimize... Right, so normal merging uses the findMerges method, then
>>> there's a findMergeOptimize (method names could be inaccurate).
>>>
>>> On Wed, Jan 13, 2010 at 2:29 PM, Trin Chavalittumrong <mrtrin@gmail.com>
>>> wrote:
>>> > Do you mean MergePolicy is only used during index time and will be
>>> ignored
>>> > by by the Optimize() process?
>>> >
>>> >
>>> > On Wed, Jan 13, 2010 at 1:57 PM, Jason Rutherglen <
>>> > jason.rutherglen@gmail.com> wrote:
>>> >
>>> >> Oh ok, you're asking about optimizing... I think that's a different
>>> >> algorithm inside LogMergePolicy.  I think it ignores the maxMergeMB
>>> >> param.
>>> >>
>>> >> On Wed, Jan 13, 2010 at 1:49 PM, Trin Chavalittumrong <mrtrin@gmail.com
>>> >
>>> >> wrote:
>>> >> > Thanks, Jason.
>>> >> >
>>> >> > Is my understanding correct that
>>> >> LogByteSizeMergePolicy.setMaxMergeMB(100)
>>> >> > will prevent
>>> >> > merging of two segments that is larger than 100 Mb each at the
>>> optimizing
>>> >> > time?
>>> >> >
>>> >> > If so, why do think would I still see segment that is larger than
200
>>> MB?
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Wed, Jan 13, 2010 at 1:43 PM, Jason Rutherglen <
>>> >> > jason.rutherglen@gmail.com> wrote:
>>> >> >
>>> >> >> Hi Trin,
>>> >> >>
>>> >> >> There was recently a discussion about this, the max size is
>>> >> >> for the before merge segments, rather than the resultant merged
>>> >> >> segment (if that makes sense). It'd be great if we had a merge
>>> >> >> policy that limited the resultant merged segment, though that'd
>>> >> >> by a rough approximation at best.
>>> >> >>
>>> >> >> Jason
>>> >> >>
>>> >> >> On Wed, Jan 13, 2010 at 1:36 PM, Trin Chavalittumrong <
>>> mrtrin@gmail.com
>>> >> >
>>> >> >> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > I am trying to optimize the index which would merge different
>>> segment
>>> >> >> > together. Let say the index folder is 1Gb in total, I
need each
>>> >> >> segmentation
>>> >> >> > to be no larger than 200Mb. I tried to use *LogByteSizeMergePolicy
>>> >> *and
>>> >> >> > setMaxMergeMB(100) to ensure no segment after merging
would be
>>> 200Mb.
>>> >> >> > However, I still see segment that are larger than 200Mb.
I did call
>>> >> >> > IndexWriter.optimize(20) to make sure there are enough
number
>>> >> >> segmentation
>>> >> >> > to allow each segment to be under 200Mb.
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > Can someone let me know if I am using this right? Or any
suggestion
>>> on
>>> >> >> how
>>> >> >> > to tackle this would be helpful.
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> >
>>> >> >> > Trin
>>> >> >> >
>>> >> >>
>>> >> >> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >>
>>> >>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message