incubator-blur-dev mailing list archives

From Tim Williams <william...@gmail.com>
Subject Re: Max records
Date Fri, 03 May 2013 14:59:26 GMT
Thanks, this helps.  I'm looking into patching the BlurReducer so that
when a Row hits maxRecordsPerRow, it indexes what it can of the row
instead of dropping it completely.  Is there a better approach? :)

--tim

On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> BlurTask._maxRecordCount
>
> This is used for testing, so that you can exit a mapper after N
> records.
>
> BlurTask._maxRecordsPerRow
>
> This raises the limit on the number of records in a single row.  Be
> careful with this option because it may run the reducer out of memory.  I
> have a patch that I can apply that removes this limit, but for now it's
> still risky to increase this too much.
>
> BlurTask._ramBufferSizeMB
>
> This is the Lucene writer buffer size in MB.  Larger values normally
> increase indexing throughput.
>
> Aaron
>
>
> On Fri, May 3, 2013 at 10:30 AM, Tim Williams <williamstw@gmail.com> wrote:
>
>> I have an instance where I need to increase max records per row, but
>> before I do I want to understand the relationship (if there is one)
>> between:
>>
>> BlurTask._maxRecordCount
>> BlurTask._maxRecordsPerRow
>> BlurTask._ramBufferSizeMB
>>
>> I understand maxRecordsPerRow, but in looking into this I found I don't
>> understand _maxRecordCount or what interplay might exist with the
>> buffer size.
>>
>> Thanks,
>> --tim
>>
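
For readers following the thread, the idea Tim raises (index as much of an
oversized row as fits, rather than dropping the whole row) can be sketched
roughly as below.  This is only an illustration under stated assumptions:
RowLimiter, OverflowPolicy and selectRecords are hypothetical names, not part
of Blur or of BlurReducer, and the real reducer may buffer records
differently.  The sketch never holds more than maxRecordsPerRow records, so
the memory concern Aaron mentions stays bounded by the limit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not Blur code): picks the records to index for one row. */
final class RowLimiter {

    /** What to do when a row holds more records than the limit allows. */
    enum OverflowPolicy { DROP_ROW, TRUNCATE_ROW }

    private RowLimiter() {}

    /**
     * Reads records for a single row and returns the ones that should be indexed.
     * At most maxRecordsPerRow records are buffered at any time.
     */
    static <R> List<R> selectRecords(Iterator<R> records,
                                     int maxRecordsPerRow,
                                     OverflowPolicy policy) {
        if (maxRecordsPerRow < 0) {
            throw new IllegalArgumentException("maxRecordsPerRow must be >= 0");
        }
        List<R> buffer = new ArrayList<>();
        while (records.hasNext()) {
            if (buffer.size() == maxRecordsPerRow) {
                // The row has more records than the limit allows.
                if (policy == OverflowPolicy.TRUNCATE_ROW) {
                    return buffer;               // index what fit so far
                }
                return Collections.emptyList();  // drop the row completely
            }
            buffer.add(records.next());
        }
        return buffer; // the row fit within the limit
    }
}
```

With a limit of 2 and a row of three records, TRUNCATE_ROW keeps the first two
records, while DROP_ROW yields nothing for that row.  Which records survive
truncation depends on the order in which they reach the reducer, so a real
patch would need to decide whether that order is meaningful.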
