incubator-blur-dev mailing list archives

From Tim Williams <william...@gmail.com>
Subject Re: Max records
Date Fri, 03 May 2013 19:14:36 GMT
On Fri, May 3, 2013 at 11:05 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> Ok, so the better approach is to create a second new index and index the
> entire row into that new small index.  Then once the row is complete, close
> that new writer and index and merge it into the main index.  This allows us
> to index everything and not run the reducer out of memory.

So move to the temporary index approach as the way to do all the M/R
builds vs just an exception for large rows?

--tim
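
For readers following the thread, the temporary index approach quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual BlurReducer code: `SketchIndex`, `TempIndexSketch`, and `indexRow` are hypothetical names, and the in-memory list stands in for what would presumably be a separate Lucene index whose contents get merged into the main one (Lucene's `IndexWriter.addIndexes` is the kind of call that does such a merge).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for an index; a real build would use a Lucene writer. */
class SketchIndex {
    final List<String> docs = new ArrayList<>();

    void add(String doc) {
        docs.add(doc);
    }

    /** Appends every document of another index, similar in spirit to an index merge. */
    void merge(SketchIndex other) {
        docs.addAll(other.docs);
    }
}

class TempIndexSketch {
    /**
     * Streams the records of one row into a fresh temporary index, then merges
     * that index into the main one. Records are consumed one at a time from the
     * iterator, so the caller never has to buffer the whole row itself.
     * (Assumption: in a real build the temporary index would not live on the
     * heap the way this list does.)
     */
    static int indexRow(String rowId, Iterator<String> records, SketchIndex mainIndex) {
        SketchIndex temp = new SketchIndex();
        int count = 0;
        while (records.hasNext()) {
            temp.add(rowId + "/" + records.next());
            count++;
        }
        // The row is complete: fold the small index into the main index.
        mainIndex.merge(temp);
        return count;
    }

    public static void main(String[] args) {
        SketchIndex mainIndex = new SketchIndex();
        int indexed = indexRow("row1", Arrays.asList("r1", "r2", "r3").iterator(), mainIndex);
        System.out.println(indexed + " records merged; main index size = " + mainIndex.docs.size());
    }
}
```

The question raised in the reply is whether every row should go through `indexRow`-style handling, or only rows that exceed the per-row record limit.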

> On Fri, May 3, 2013 at 10:59 AM, Tim Williams <williamstw@gmail.com> wrote:
>
>> Thanks, this helps.  I'm looking into patching the BlurReducer so that
>> when a Row hits maxRecordsPerRow, it indexes what it can of a row - as
>> opposed to dropping it completely.  What's a better approach? :)
>>
>> --tim
>>
>> On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <amccurry@gmail.com> wrote:
>> > BlurTask._maxRecordCount
>> >
>> > This is used for testing, so that you can exit a mapper after N
>> > records.
>> >
>> > BlurTask._maxRecordsPerRow
>> >
>> > This will increase the number of records allowed in a single row.  Be
>> > careful with this option because it may run the reducer out of memory.
>> > I have a patch that I can apply that removes this limit, but for now
>> > it's still risky to increase this too much.
>> >
>> > BlurTask._ramBufferSizeMB
>> >
>> > This is the Lucene writer buffer; larger values normally increase indexing
>> > throughput.
>> >
>> > Aaron
>> >
>> >
>> > On Fri, May 3, 2013 at 10:30 AM, Tim Williams <williamstw@gmail.com> wrote:
>> >
>> >> I have an instance where I need to increase max records per row, but
>> >> before I do I want to understand the relationship (if there is one)
>> >> between:
>> >>
>> >> BlurTask._maxRecordCount
>> >> BlurTask._maxRecordsPerRow
>> >> BlurTask._ramBufferSizeMB
>> >>
>> >> I understand maxRecordsPerRow, but in looking into this found I don't
>> >> understand the _maxRecordCount and/or what interplay might exist with
>> >> buffer size.
>> >>
>> >> Thanks,
>> >> --tim
>> >>
>>
