lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Indexing (posting document) taking a lot of time
Date Tue, 16 Aug 2016 23:11:15 GMT
What format are those documents? Solr XML? Custom JSON?

Or are you sending PDF/binary documents to Solr's extract handler and
asking it to do the extraction of the useful stuff? If the latter, you
could take that step out of Solr with a custom client using Tika (what
Solr has under the hood) and only send to Solr the processed output.
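
A minimal sketch of that idea, in case it helps: extract the text on the client and send Solr only small JSON documents. This is an illustration under assumptions, not a tested setup. It assumes the third-party tika-python package (which needs Java available to run Tika), a hypothetical core named "mycore", a hypothetical input directory "docs", and schema fields called "id" and "text". Adjust all of those to your installation.

```python
import json
import urllib.request
from pathlib import Path

# Assumed Solr update endpoint; the core name "mycore" is hypothetical.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def extract_with_tika(path):
    """Extract plain text on the client with the third-party tika-python package."""
    from tika import parser  # assumption: installed via `pip install tika`

    parsed = parser.from_file(str(path))
    # "content" can be None when Tika finds no text in the file.
    return (parsed.get("content") or "").strip()


def build_docs(paths, extract):
    """Turn files into small Solr documents that hold only the extracted text."""
    # The field names "id" and "text" are assumptions; use your schema's fields.
    return [{"id": str(p), "text": extract(p)} for p in paths]


def batches(docs, size=100):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(docs, url=SOLR_UPDATE_URL):
    """POST one batch to Solr's update handler as a JSON array of documents."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    files = sorted(Path("docs").glob("*.pdf"))  # hypothetical input directory
    for batch in batches(build_docs(files, extract_with_tika)):
        post_batch(batch)
```

The extractor is passed in as a function so that the parsing step can be swapped or run on a different machine than Solr. Committing is left out on purpose; it would follow whatever commit policy you already use.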

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 16 August 2016 at 22:49, kshitij tyagi <kshitij.shopclues@gmail.com> wrote:
> 400 KB is the size of a single document, and I am sending 100 documents per request.
> The Solr heap size is 16 GB, and we are running multithreaded.
>
> On Tue, Aug 16, 2016 at 5:10 PM, Emir Arnautovic <
> emir.arnautovic@sematext.com> wrote:
>
>> Hi,
>>
>> 400 KB/doc * 100 docs = 40 MB. If you are running it single-threaded, Solr
>> will sit idle while it accepts a relatively large request. Or is 400 KB the
>> size of the whole 100-doc bulk that you are sending?
>>
>> What is Solr's heap size? I would try increasing the number of threads and
>> monitoring Solr's heap/CPU/IO to see where the bottleneck is.
>>
>> How complex is the fields' analysis?
>>
>> Regards,
>> Emir
>>
>>
>> On 16.08.2016 13:25, kshitij tyagi wrote:
>>
>>> Hi,
>>>
>>> We are sending about 100 documents per request for indexing. We have
>>> autocommit set to false and commit only once 10000 documents are
>>> present. Solr and the machine sending the requests are in the same pool.
>>>
>>>
>>>
>>> On Tue, Aug 16, 2016 at 4:51 PM, Emir Arnautovic <
>>> emir.arnautovic@sematext.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Do you send one doc per request? How frequently do you commit? Where is
>>>> Solr running? What is the network connection between your machine and Solr?
>>>> What are the JVM settings? Is 10-30s for the entire indexing run or for a single doc?
>>>>
>>>> Regards,
>>>> Emir
>>>>
>>>>
>>>> On 16.08.2016 11:34, kshitij tyagi wrote:
>>>>
>>>>> Hi Alexandre,
>>>>>
>>>>> One document of 400 KB takes approximately 10-30 seconds, and this
>>>>> varies. I am posting the document using curl.
>>>>>
>>>>> On Tue, Aug 16, 2016 at 2:11 PM, Alexandre Rafalovitch <
>>>>> arafalov@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> How many records is that and what is 'slow'? Also is this standalone or
>>>>>> cluster setup?
>>>>>>
>>>>>> On 16 Aug 2016 6:33 PM, "kshitij tyagi" <kshitij.shopclues@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am indexing a lot of data, about 8 GB, but it is taking a lot of
>>>>>>> time. I have read about maxBufferedDocs, ramBufferSizeMB, merge policy,
>>>>>>> etc. in the solrconfig file.
>>>>>>>
>>>>>>> It would be helpful if someone could help me tune the settings for
>>>>>>> faster indexing speeds.
>>>>>>>
>>>>>>> *I have read the docs but am not able to understand what exactly changing
>>>>>>> these configs means.*
>>>>>>>
>>>>>>>
>>>>>>> *Regards,*
>>>>>>> *Kshitij*
>>>>>>>
>>>>>>>
>>>> --
>>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>
>>>>
>>>>
