lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Help Indexing Large File
Date Mon, 14 Dec 2015 20:38:43 GMT
What is the nature of the file? Is it Solr XML, CSV, PDF (via Solr Cell),
or... what? If a PDF, maybe it has lots of hi-resolution images. If so, you
may need to strip out the images and just send the text, which would be a
lot smaller. For example, you could run Tika locally to extract the text
and then index the raw text.
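If the file is line-oriented (e.g. CSV or newline-delimited Solr JSON), another option is to split it into smaller batches client-side so no single POST is huge. A minimal sketch, assuming a line-per-record file (the path, batch size, and Solr update URL here are placeholders, not from the thread):

```python
# Sketch: split a large line-oriented file into batches so each POST to
# Solr stays small. Path, batch size, and target URL are assumptions.
from itertools import islice

def iter_batches(path, batch_size=10000):
    """Yield lists of at most batch_size lines read lazily from path."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                return
            yield batch

# Each batch could then be sent individually, e.g. with requests:
#   requests.post("http://localhost:8983/solr/<core>/update",
#                 data="".join(batch),
#                 headers={"Content-Type": "application/json"})
```

Because the file is read lazily, memory stays bounded by the batch size rather than the full 5GB.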

-- Jack Krupansky

On Mon, Dec 14, 2015 at 12:04 PM, Antelmo Aguilar <Antelmo.Aguilar.17@nd.edu
> wrote:

> Hello,
>
> I am trying to index a very large file in Solr (around 5GB).  However, I
> get out of memory errors using Curl.  I tried using the post script and I
> had some success with it.  After indexing several hundred thousand records
> though, I got the following error message:
>
> *SimplePostTool: FATAL: IOException while posting data:
> java.io.IOException: too many bytes written*
>
> Would it be possible to get some help on where I can start looking to solve
> this issue?  I tried finding some type of log that would give me more
> information.  I have not had any luck.  The only logs I was able to find
> related to this error were the logs from Solr, but I assume these are from
> the "server" perspective and not the "client's" perspective of the error.  I
> would really appreciate the help.
>
> Thanks,
> Antelmo
>
