lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Indexing Files Month by Month
Date Thu, 12 Jun 2014 14:43:51 GMT
If you want to stay with DIH, partition your files into one folder per
month and have DIH index one directory at a time.
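
For illustration only, here is a rough sketch of the partitioning step.
It assumes the "month" of a file is the month of its last-modified time
(in UTC); if your month comes from the file name or from metadata inside
the file, swap that part out. The class name MonthPartitioner and the
method groupByMonth are made up for this example, and it uses only the
JDK, nothing from Solr.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: groups files by the month they were last modified.
class MonthPartitioner {

    /** Returns every regular file under root, keyed by year-month (UTC), oldest month first. */
    static Map<YearMonth, List<Path>> groupByMonth(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<YearMonth, List<Path>> byMonth = new TreeMap<>();
        for (Path file : files) {
            // Assumption: last-modified time decides which month a file belongs to.
            YearMonth month = YearMonth.from(
                    Files.getLastModifiedTime(file).toInstant().atZone(ZoneOffset.UTC));
            byMonth.computeIfAbsent(month, k -> new ArrayList<>()).add(file);
        }
        return byMonth;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MonthPartitioner <root-directory>");
            return;
        }
        // Print one line per month so each batch can be indexed separately.
        groupByMonth(Paths.get(args[0])).forEach((month, batch) ->
                System.out.println(month + ": " + batch.size() + " files"));
    }
}
```

Each month's list can then be moved into its own folder for DIH, or
handed to whatever indexing code you write yourself.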

What I'd do is move away from DIH and use SolrJ. That way
1> you can take full control over what you do
2> you can offload the heavy lifting of parsing the various files
    (I'm assuming here that you're indexing PDFs, Word docs, etc)
    to a bunch of clients.
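
To make point 2 a bit more concrete, here is a minimal sketch of fanning
the per-file work out over several workers. This is not taken from the
linked article: ParallelIndexer, FileIndexer and indexAll are names
invented for this example, and the actual parsing and the SolrJ call
that sends the document to Solr are assumed to live behind the
FileIndexer hook. The sketch uses threads in one JVM; separate client
machines would follow the same pattern with the file list split between
them.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical driver: runs a per-file indexing hook on a fixed pool of workers.
class ParallelIndexer {

    /** Hypothetical hook: parse one file and send the result to Solr. */
    interface FileIndexer {
        void index(Path file) throws Exception;
    }

    /** Indexes all files using the given number of workers; returns how many files failed. */
    static int indexAll(List<Path> files, FileIndexer indexer, int workers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger failures = new AtomicInteger();
        for (Path file : files) {
            pool.submit(() -> {
                try {
                    indexer.index(file);
                } catch (Exception e) {
                    // One bad file should not stop the rest of the batch.
                    failures.incrementAndGet();
                    System.err.println("Failed to index " + file + ": " + e);
                }
            });
        }
        pool.shutdown();                                        // accept no new tasks
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); // wait for queued work
        return failures.get();
    }
}
```

Calling indexAll once per month's batch gives you the month-by-month
behaviour while keeping the parsing load off the Solr server.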

Here are some code samples: http://searchhub.org/2012/02/14/indexing-with-solrj/

Or, if you really want to get wild, consider the MapReduceIndexerTool. That
requires some infrastructure (a Hadoop cluster) though.

Best,
Erick

On Thu, Jun 12, 2014 at 7:22 AM, Venkata krishna <venkat1621@gmail.com> wrote:
> Hi ,
>
> I am using Lucene/Solr and would like to use the Data Import Handler to
> index files, but there are millions of files to import, so the indexing
> process will take a long time. I have decided to import the files month by
> month, so could you please suggest a way to import files on a monthly basis?
>
> Thanks,
>
> Venkata Krishna Tolusuri.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Files-Month-by-Month-tp4141443.html
> Sent from the Solr - User mailing list archive at Nabble.com.
