lucene-solr-user mailing list archives

From Toke Eskildsen
Subject Re: What is the best way to index 15 million documents of total size 425 GB?
Date Fri, 04 Mar 2016 07:45:48 GMT
On Fri, 2016-03-04 at 12:41 +0530, Aneesh Mon N wrote:
>    - is there any difference in posting the data in json format vs xml?
>    - do we get any performance improvement if we generate the json/xml
>    files, scp to the solr server and then push via curl command

I have not tested that, but while performance-testing indexing, I
achieved a marked increase in performance when I used CSV. That was
for very small documents, though. I do not know how well it works for
large ones.
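
A minimal sketch of the CSV route, for illustration only. The host, port,
collection name and field names below are placeholders and do not come from
this thread. The sketch assumes Solr's standard /update handler accepts a CSV
body with a header row, and it only builds the request, so it runs without a
Solr instance.

```python
import csv
import io
import urllib.request

# Assumption: placeholder URL; adjust host, port and collection to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commit=true"


def docs_to_csv(docs, fieldnames):
    """Serialize an iterable of dicts into one CSV payload with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(doc)
    return buf.getvalue().encode("utf-8")


def build_csv_request(payload, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying the CSV payload."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/csv; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical sample documents, not data from the thread.
    sample = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]
    req = build_csv_request(docs_to_csv(sample, ["id", "title"]))
    # urllib.request.urlopen(req) would actually send the batch.
    print(req.get_method(), req.full_url)
```

Batching many small documents into one payload is the point here; whether
the gain carries over to large documents is untested, as noted above.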

Standard sanity check: Have you tried piping the result from Pentaho
into /dev/null, to see whether it is Solr or the extraction part that is
the heavy one?
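
The same check can be sketched in code: consume the extraction output,
discard every record, and time it. The generator below is a hypothetical
stand-in for the real export step, which is not shown in this thread. If this
alone takes about as long as the full indexing run, Solr is probably not the
bottleneck.

```python
import time


def measure_source(records):
    """Consume every record and discard it, like piping to /dev/null."""
    start = time.perf_counter()
    count = 0
    for _ in records:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


def fake_extraction(n):
    """Hypothetical stand-in for the real extraction step."""
    for i in range(n):
        yield {"id": str(i), "body": "x" * 100}


if __name__ == "__main__":
    count, elapsed = measure_source(fake_extraction(10000))
    print(f"{count} records in {elapsed:.3f}s")
```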

- Toke Eskildsen, State and University Library, Denmark
