lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Can Apache Solr Handle TeraByte Large Data
Date Mon, 03 Aug 2015 18:22:45 GMT
I'd go with SolrJ personally. For a terabyte of data that (I'm inferring)
is PDF files and the like (aka "semi-structured documents"), you'll
need to have Tika parse out the data you need to index. Doing that
through posting or DIH puts all the analysis on the Solr servers,
which will work, but not optimally, since the same machines then have
to parse the documents as well as index them and serve queries.
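
For the filename-only case described in the quoted message below, the
client-side work is small. What follows is a rough sketch of that step,
not code from any linked article. The class name FilenameFields, the
field names (id, part_0_s, ...) and the choice of the file name as the
unique key are all assumptions made for illustration; it uses only the
JDK so it runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FilenameFields {

    // Splits a name such as ARIA_SSN10_0007_LOCATION_0000129.pdf into its
    // underscore-separated tokens, keyed by hypothetical field names.
    static Map<String, String> parse(String filename) {
        // Drop the extension, if there is one.
        int dot = filename.lastIndexOf('.');
        String base = dot > 0 ? filename.substring(0, dot) : filename;

        Map<String, String> fields = new LinkedHashMap<>();
        // Assumption: the file name is unique and can serve as the document id.
        fields.put("id", filename);

        String[] tokens = base.split("_");
        for (int i = 0; i < tokens.length; i++) {
            // Hypothetical field names; a real schema would use meaningful ones.
            fields.put("part_" + i + "_s", tokens[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("ARIA_SSN10_0007_LOCATION_0000129.pdf");
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
        // In an actual indexer, each map would be copied into a SolrJ
        // SolrInputDocument (one addField call per entry) and the documents
        // sent to Solr in batches rather than one at a time.
    }
}
```

Since the file contents are not needed in that case, Tika would not be
involved at all; the client only has to walk the directory tree and
build one small document per file.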

Here's something to get you started:


On Mon, Aug 3, 2015 at 1:56 PM, Mugeesh Husain <> wrote:
> Hi Alexandre,
> I have 40 million files stored in a file system.
> The filenames look like ARIA_SSN10_0007_LOCATION_0000129.pdf
> 1.) I have to split each filename on the underscores and index the
> resulting values in Solr.
> 2.) I do not need to index the file contents (text).
> You told me "The answer is Yes", but I didn't understand in what way you meant Yes.
> Thanks
> --
> View this message in context:
> Sent from the Solr - User mailing list archive at
