lucene-solr-user mailing list archives

From Rahul Singh <>
Subject Re: DIH with huge data
Date Thu, 12 Apr 2018 16:36:48 GMT
How much data is there, and what is the database source? Spark is probably the fastest way to load it.

Rahul Singh

Anant Corporation

On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <>, wrote:
> Hi,
> We are using DIH with SortedMapBackedCache, but as the data size increases we
> need to give the Solr JVM more heap memory.
> Can we use multiple CSV files instead of database queries, and then join the
> data in those CSV files using zipper? The bottom line is to create one CSV file
> for each entity in data-config.xml and join these CSV files using
> zipper.
> We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it is
> not a good fit with MMapDirectoryFactory and ends up exhausting the physical
> memory on the machine.
> Please suggest how we can handle the use case of importing a huge amount of data
> into Solr.
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
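
For readers unfamiliar with the zipper idea mentioned in the question, here is a minimal sketch of a streaming merge join over two CSV inputs, written in Python rather than as DIH configuration. It is not the DIH implementation. It assumes both inputs are already sorted ascending by a numeric join key, and the names used (`zipper_join`, `build_docs`, the `id`, `item_id`, `name`, and `feature` columns) are hypothetical, chosen only for illustration. The point it shows is that only one parent row and its matching child rows are held in memory at a time, which is why this approach needs far less heap than a cache that loads a whole child entity.

```python
import csv
import io


def zipper_join(parent_rows, child_rows, parent_key, child_key):
    """Merge-join two row streams that are both sorted ascending by key.

    Assumption: keys are integers and both streams are pre-sorted.
    Yields (parent, [matching children]) pairs one parent at a time.
    """
    child_iter = iter(child_rows)
    child = next(child_iter, None)
    for parent in parent_rows:
        pk = int(parent[parent_key])
        # Skip child rows whose key is smaller than the current parent key.
        while child is not None and int(child[child_key]) < pk:
            child = next(child_iter, None)
        # Collect every child row that shares the current parent key.
        matches = []
        while child is not None and int(child[child_key]) == pk:
            matches.append(child)
            child = next(child_iter, None)
        yield parent, matches


def build_docs(items_text, features_text):
    """Join two CSV texts into a list of document-like dictionaries."""
    items = csv.DictReader(io.StringIO(items_text))
    features = csv.DictReader(io.StringIO(features_text))
    docs = []
    for item, feats in zipper_join(items, features, "id", "item_id"):
        docs.append({
            "id": item["id"],
            "name": item["name"],
            "features": [f["feature"] for f in feats],
        })
    return docs


# Hypothetical sample data, standing in for one CSV file per entity.
items_csv = "id,name\n1,apple\n2,banana\n3,cherry\n"
features_csv = "item_id,feature\n1,red\n1,sweet\n3,small\n"

docs = build_docs(items_csv, features_csv)
for doc in docs:
    print(doc)
```

With real files, the `io.StringIO` objects would be replaced by open file handles, and each joined document would be sent to Solr instead of being collected in a list. The join is only correct if the CSV exports are sorted by the join key beforehand.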
