lucene-solr-user mailing list archives

From Raymond Xie <xie3208...@gmail.com>
Subject Re: How to do parallel indexing on files (not on HDFS)
Date Thu, 24 May 2018 01:36:55 GMT
Thank you, Rahul, though that is very high level.

No offense intended, but do you have a successful implementation, or is this
an unproven idea? I have never used RabbitMQ or Kafka before, but I would be
very interested in more detail on the Kafka idea, as Kafka is available in my
environment.

Thank you again. I look forward to hearing more from you or anyone else in
the Solr community.


*------------------------------------------------*
*Sincerely yours,*


*Raymond*

On Wed, May 23, 2018 at 8:15 AM, Rahul Singh <rahul.xavier.singh@gmail.com>
wrote:

> Enumerate the file locations (map), put them in a queue like RabbitMQ or
> Kafka (persist the map), and have a bunch of threads, workers, containers,
> or whatever pop items off the queue and process each one (reduce).
>
>
> --
> Rahul Singh
> rahul.singh@anant.us
>
> Anant Corporation
>
> On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208080@gmail.com>, wrote:
>
> I know how to index a single file or a folder on a file system, but how do
> I do that in parallel? The data I need to index is huge in volume and can't
> be put on HDFS.
>
> Thank you
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
>
>
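The queue-and-workers pattern Rahul describes in the quoted reply can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a tested Solr setup: an in-process `queue.Queue` stands in for Kafka or RabbitMQ, threads stand in for the workers or containers, and `index_file` is a hypothetical placeholder that only reads the file size where a real worker would parse the file and send a document to Solr. The names `enumerate_files` and `parallel_index` are likewise made up for this example.

```python
import os
import queue
import threading


def enumerate_files(root):
    """Yield every file path under root (the 'map' step)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def index_file(path):
    """Hypothetical placeholder for the real work.

    A real worker would parse the file and send it to Solr;
    here we only return the file size so the sketch is runnable.
    """
    return os.path.getsize(path)


def parallel_index(root, num_workers=4):
    """Process every file under root with a pool of worker threads."""
    tasks = queue.Queue()      # stand-in for a Kafka topic or RabbitMQ queue
    results = {}               # path -> result, or the error that occurred
    lock = threading.Lock()

    def worker():
        while True:
            path = tasks.get()
            if path is None:   # sentinel: no more work for this worker
                break
            try:
                value = index_file(path)
            except OSError as exc:
                value = exc    # record the failure instead of killing the worker
            with lock:
                results[path] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Producer side: enqueue every file location.
    for path in enumerate_files(root):
        tasks.put(path)

    # One sentinel per worker so each of them shuts down cleanly.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

Calling `parallel_index("/path/to/files", num_workers=8)` would walk the directory and return a dict keyed by file path. With a real broker the producer and the workers would run as separate processes, and the broker, rather than this in-memory queue, would persist the list of pending files.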
