lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Data Import Handler takes different time on different machines
Date Tue, 02 Feb 2016 04:03:42 GMT
The first thing I'd be looking at is how the JDBC batch size compares
between the two machines.

AFAIK, Solr shouldn't notice the difference, and since the large majority
of the development is done on Linux-based systems, I'd be surprised if
Solr itself ran worse there than on Windows. That leads me to the one thing
that is definitely different between the two: your JDBC driver and its
settings. At least that's where I'd look first.

If nothing immediate pops up, I'd probably write a small driver program to
just access the database from the two machines and process your 10M
records _without_ sending them to Solr and see what the comparison is.
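
A minimal sketch of such a driver program, using only plain JDBC. The class
name FetchBenchmark, its helper methods, and the command-line arguments are
all hypothetical and not from DIH or SolrJ; the JDBC URL, credentials, query,
and fetch size are whatever your setup uses, and the JDBC driver jar is
assumed to be on the classpath. It reads every row, touches every column, and
reports rows per second without sending anything to Solr:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class: times a JDBC query without involving Solr.
class FetchBenchmark {

    /** Rows per second for a row count and an elapsed time in nanoseconds. */
    public static double rowsPerSecond(long rows, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return rows / (elapsedNanos / 1_000_000_000.0);
    }

    /** Reads every row and every column, then returns the number of rows read. */
    public static long drain(ResultSet rs) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columns; i++) {
                rs.getObject(i); // force the driver to hand over the value
            }
            rows++;
            // An importer would build a document and send it to Solr here.
            // Leaving that step out isolates the database and driver cost.
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 5) {
            System.err.println(
                "usage: FetchBenchmark <jdbcUrl> <user> <password> <sql> <fetchSize>");
            return;
        }
        String url = args[0];
        String user = args[1];
        String password = args[2];
        String sql = args[3];
        int fetchSize = Integer.parseInt(args[4]);

        long start = System.nanoTime();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            // A hint to the driver about rows per round trip; drivers differ
            // in how (and whether) they honor it.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                long rows = drain(rs);
                long elapsed = System.nanoTime() - start;
                System.out.printf("%d rows in %.1f s (%.0f rows/s)%n",
                        rows, elapsed / 1e9, rowsPerSecond(rows, elapsed));
            }
        }
    }
}
```

Running it with the same query and fetch size on both machines should show
whether the gap is already there before Solr is involved. For scale, 10M
records in 40 minutes is roughly 4,200 rows/s, and 10M records in 2.5 hours
is roughly 1,100 rows/s.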

You can also forgo DIH and do a simple import program via SolrJ. The
advantage here is that the comparison I'm talking about above is
really simple, just comment out the call that sends data to Solr. Here's an
example...

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Mon, Feb 1, 2016 at 7:34 PM, Troy Edwards <tedwards415107@gmail.com> wrote:
> Sorry, I should explain further. The Data Import Handler had been running
> for a while retrieving only about 150,000 records from the database. On both
> the development env (Windows) and the Linux machine it took about 3 mins.
>
> The query has been changed and we are now trying to retrieve about 10
> million records. We do expect the time to increase.
>
> With the new query the time taken on the Windows machine is consistently
> around 40 mins. While the DIH is running, queries slow down, i.e. a query
> that typically took 60 msec takes 100 msec.
>
> The time taken on the Linux machine is consistently around 2.5 hours. While
> the DIH is running, queries take about 200 to 400 msec.
>
> Thanks!
>
> On Mon, Feb 1, 2016 at 8:45 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> What happens if you run just the SQL query from the
>> windows box and from the linux box? Is there any chance
>> that somehow the connection from the linux box is
>> just slower?
>>
>> Best,
>> Erick
>>
>> On Mon, Feb 1, 2016 at 6:36 PM, Alexandre Rafalovitch
>> <arafalov@gmail.com> wrote:
>> > What are you importing from? Are the source and the Solr machine
>> > co-located in the same fashion on dev and prod?
>> >
>> > Have you tried running this on a Linux dev machine? Perhaps your prod
>> > machine is under much heavier load than the dev one.
>> >
>> > Regards,
>> >    Alex.
>> > ----
>> > Newsletter and resources for Solr beginners and intermediates:
>> > http://www.solr-start.com/
>> >
>> >
>> > On 2 February 2016 at 13:21, Troy Edwards <tedwards415107@gmail.com> wrote:
>> >> We have a Windows development machine on which the Data Import Handler
>> >> consistently takes about 40 mins to finish. Queries run fine. JVM memory
>> >> is 2 GB per node.
>> >>
>> >> But on a Linux machine it consistently takes about 2.5 hours. The queries
>> >> also run slower. JVM memory here is also 2 GB per node.
>> >>
>> >> How should I go about analyzing and tuning the Linux machine?
>> >>
>> >> Thanks
>>
