lucene-solr-user mailing list archives

From Tom Chen <tomchen1...@gmail.com>
Subject Re: MRIT's morphline mapper doesn't co-locate with data
Date Thu, 25 Sep 2014 13:52:44 GMT
Do you have the Solr JIRA number for the new ingestion tool?

Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <whoschek@cloudera.com>
wrote:

> Based on our measurements, Lucene indexing is so CPU intensive that it
> wouldn’t really help much to exploit data locality on read. The
> overwhelming bottleneck remains the same. Having said that, we have an
> ingestion tool in the works that will take advantage of data locality for
> splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1000@gmail.com> wrote:
>
> > Hi,
> >
> > The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline
> > mapper. The mapper doesn't co-locate with the input data that it processes.
> > Isn't this a performance hit?
> >
> > Ideally, the morphline mapper should run on the hosts that contain most
> > of the data blocks for the input files it processes.
> >
> > Regards,
> > Tom
>
>
