lucene-solr-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: Performance of Bulk Importing TSV File in Solr 8
Date Thu, 02 Jan 2020 21:07:41 GMT
Hello, Joseph.

This rate looks good to me. However, if the node is idle and has plenty of
free RAM, you can split this file with Unix tools and submit the parts for
import in parallel.
The hanging connection seems like a bug.
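A minimal Python sketch of splitting the file and submitting the parts in parallel. It assumes the TSV has a header row, that no field contains an embedded newline (the same assumption `split -l` makes), and that the host and collection (`example`) match the original curl command. The names `split_tsv`, `post_chunk` and `parallel_import`, along with the chunk size and worker count, are hypothetical choices made for illustration. The sketch is not a tested recipe.

```python
import concurrent.futures
import urllib.parse
import urllib.request

# Assumption: same host and collection ("example") as the original curl command.
SOLR_UPDATE_URL = "http://localhost:8983/solr/example/update"
# Tab separator and backslash escape, matching separator=%09&escape=%5c.
CSV_PARAMS = urllib.parse.urlencode({"separator": "\t", "escape": "\\"})


def split_tsv(path, rows_per_chunk):
    """Yield the file in chunks of rows_per_chunk lines, each prefixed with the header.

    Assumes one record per physical line (no newlines inside fields),
    which is the same assumption `split -l` makes.
    """
    with open(path, "rb") as f:
        header = f.readline()
        rows = []
        for line in f:
            rows.append(line)
            if len(rows) == rows_per_chunk:
                yield header + b"".join(rows)
                rows = []
        if rows:
            yield header + b"".join(rows)


def post_chunk(body):
    """POST one chunk to the CSV update handler without committing."""
    req = urllib.request.Request(
        f"{SOLR_UPDATE_URL}?{CSV_PARAMS}",
        data=body,
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )
    # urlopen raises HTTPError on a non-2xx response.
    with urllib.request.urlopen(req) as resp:
        return resp.status


def parallel_import(path, rows_per_chunk=100_000, workers=4):
    """Send the chunks over several concurrent connections, then commit once."""
    # Executor.map submits every chunk up front, so the whole file is held
    # in memory. That is only reasonable when the client has RAM to spare.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(post_chunk, split_tsv(path, rows_per_chunk)))
    # A single commit at the end avoids paying the commit cost per chunk.
    with urllib.request.urlopen(f"{SOLR_UPDATE_URL}?commit=true") as resp:
        statuses.append(resp.status)
    return statuses
```

For example, `parallel_import("/opt/solr/results.tsv", rows_per_chunk=100_000, workers=4)` would send 1.2 million rows as 12 requests. A reasonable starting point for `workers` is the number of idle cores on the Solr node. Going much beyond that is unlikely to help.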

On Thu, Jan 2, 2020 at 10:09 PM Joseph Lorenzini <jaloren@gmail.com> wrote:

> Hi all,
>
> I have a TSV file that contains 1.2 million rows. I want to bulk import this
> file into Solr, where each row becomes a Solr document. The TSV has 24
> columns. I am using the streaming API like so:
>
> curl -v 'http://localhost:8983/solr/example/update?stream.file=/opt/solr/results.tsv&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&commit=true'
>
> The ingestion rate is 167,000 rows a minute and takes about 7.5 minutes to
> complete. I have a few questions.
>
> - Is there a way to increase the ingestion rate? I am open to doing
> something other than bulk importing a TSV, up to and including writing a
> small program. I am just not sure what that would look like at a high
> level.
> - If the file is a TSV, I noticed that Solr never closes the HTTP connection
> with a 200 OK after all the documents are uploaded. The connection seems to
> be held open indefinitely. If, however, I upload the same file as a CSV,
> then Solr does close the HTTP connection. Is this a bug?
>


-- 
Sincerely yours
Mikhail Khludnev
