incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Max records
Date Fri, 03 May 2013 15:05:43 GMT
Ok, so the better approach is to create a second, temporary index and index
the entire row into that small index.  Then, once the row is complete, close
that writer and merge the small index into the main index.  This lets us
index everything without running the reducer out of memory.
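
The flow described above can be sketched with plain files standing in for the indexes. This is only a toy model under stated assumptions, not Blur or Lucene code: the class name `RowSideIndexSketch`, the method `indexRow`, and the one-record-per-line file format are all hypothetical. With a real Lucene writer, the merge step would presumably be something like `IndexWriter.addIndexes`, though the thread does not say which call is used.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Iterator;

// Toy model (hypothetical names): files stand in for the small per-row index
// and the main index. Records are streamed, never collected on the heap.
public class RowSideIndexSketch {

    /**
     * Streams one row's records into a temporary side file, then appends that
     * file to the main file once the row is complete.
     * Returns the number of records written for the row.
     */
    public static long indexRow(Path mainIndex, Iterator<String> records) throws IOException {
        Path side = Files.createTempFile("row-side-index", ".tmp");
        long count = 0;
        try {
            // Step 1: write each record to the side file as it arrives.
            try (BufferedWriter writer = Files.newBufferedWriter(side, StandardCharsets.UTF_8)) {
                while (records.hasNext()) {
                    writer.write(records.next());
                    writer.newLine();
                    count++;
                }
            }
            // Step 2: the side writer is closed, so merge (append) it into the main file.
            try (OutputStream out = Files.newOutputStream(
                    mainIndex, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                Files.copy(side, out);
            }
        } finally {
            // The side file is only needed until the merge completes.
            Files.deleteIfExists(side);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path main = Files.createTempFile("main-index", ".tmp");
        indexRow(main, Arrays.asList("record-1", "record-2", "record-3").iterator());
        indexRow(main, Arrays.asList("record-4").iterator());
        System.out.println(Files.readAllLines(main, StandardCharsets.UTF_8));
        Files.deleteIfExists(main);
    }
}
```

The point of the sketch is that memory use per row depends on the writer's buffer rather than on the number of records in the row, which is why no per-row record limit is needed in this scheme.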


On Fri, May 3, 2013 at 10:59 AM, Tim Williams <williamstw@gmail.com> wrote:

> Thanks, this helps.  I'm looking into patching the BlurReducer so that
> when a Row hits maxRecordsPerRow, it indexes what it can of a row - as
> opposed to dropping it completely.  What's a better approach? :)
>
> --tim
>
> On Fri, May 3, 2013 at 10:44 AM, Aaron McCurry <amccurry@gmail.com> wrote:
> > BlurTask._maxRecordCount
> >
> > This is used for testing, so that you can exit a mapper after N
> > records.
> >
> > BlurTask._maxRecordsPerRow
> >
> > This sets the maximum number of records allowed in a single row.  Be
> > careful with this option, because raising it may run the reducer out of
> > memory.  I have a patch that I can apply that removes this limit, but
> > for now it's still risky to increase this too much.
> >
> > BlurTask._ramBufferSizeMB
> >
> > This is the Lucene writer buffer; larger values normally increase
> > indexing throughput.
> >
> > Aaron
> >
> >
> > On Fri, May 3, 2013 at 10:30 AM, Tim Williams <williamstw@gmail.com> wrote:
> >
> >> I have an instance where I need to increase max records per row, but
> >> before I do I want to understand the relationship (if there is one)
> >> between:
> >>
> >> BlurTask._maxRecordCount
> >> BlurTask._maxRecordsPerRow
> >> BlurTask._ramBufferSizeMB
> >>
> >> I understand maxRecordsPerRow, but while looking into this I found I
> >> don't understand _maxRecordCount or what interplay might exist with
> >> the buffer size.
> >>
> >> Thanks,
> >> --tim
> >>
>
