lucene-solr-user mailing list archives

From Yangrui Guo <guoyang...@gmail.com>
Subject data import extremely slow
Date Fri, 06 Nov 2015 17:32:41 GMT
Hi

I'm using Solr's Data Import Handler with MySQL 5.5 to index the IMDb database.
However, the import takes a few minutes to process a single document, and
there are over 3 million movies. At this rate it will take forever, yet I can
select the rows in MySQL in no time. What am I doing wrong? My
data-config.xml is below:

<entity name="movie" transformer="RegexTransformer"
        query="SELECT DISTINCT * FROM imdb.movie">
  <field name="id" column="id" />
  <entity name="movie_actor" transformer="RegexTransformer"
          child="true" query="SELECT DISTINCT * FROM imdb.movie_actor"
          cacheKey="movie_actor.parent" cacheLookup="movie.id"
          processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
    <field name="name" column="name" />
  </entity>
  <entity name="movie_actress" transformer="RegexTransformer"
          child="true" query="SELECT DISTINCT * FROM imdb.movie_actress"
          cacheKey="movie_actress.parent" cacheLookup="movie.id"
          processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
    <field name="name" column="name" />
  </entity>
</entity>
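
As a rough illustration of what the cached child entities above are meant to
achieve (a sketch of the general cached-join idea, not Solr's actual
implementation): the child query runs once, its rows are indexed by the
cacheKey column, and each parent row then looks up its children by the
cacheLookup value instead of issuing a new query. The function names and the
sample rows below are hypothetical.

```python
from collections import defaultdict


def build_child_cache(child_rows, cache_key):
    """Index child rows by the cache_key column; done once, up front."""
    cache = defaultdict(list)
    for row in child_rows:
        cache[row[cache_key]].append(row)
    return cache


def join_children(parent_rows, cache, lookup_column):
    """Pair each parent row with its cached children (empty list if none)."""
    for parent in parent_rows:
        # .get() avoids creating empty entries in the defaultdict
        yield parent, cache.get(parent[lookup_column], [])


# Hypothetical sample rows shaped like the movie and movie_actor views.
movies = [{"id": 1}, {"id": 2}]
actors = [
    {"id": "movie.1.actor.10", "parent": 1, "name": "Actor A"},
    {"id": "movie.1.actor.11", "parent": 1, "name": "Actor B"},
]

actor_cache = build_child_cache(actors, "parent")
joined = [
    (movie["id"], [child["name"] for child in children])
    for movie, children in join_children(movies, actor_cache, "id")
]
```

In this sketch the child rows are read once in total, so the cost per movie
is a single dictionary lookup rather than a new SQL query.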

I created views for the database:

movie:

    SELECT
        `title`.`id` AS `id`
    FROM
        `title`

movie_actor:

    SELECT
        CONCAT('movie.',
                `title`.`id`,
                '.actor.',
                `cast_info`.`person_id`) AS `id`,
        `title`.`id` AS `parent`,
        `name`.`name` AS `name`
    FROM
        ((`title`
        JOIN `cast_info` ON ((`cast_info`.`movie_id` = `title`.`id`)))
        JOIN `name` ON ((`cast_info`.`person_id` = `name`.`id`)))
    WHERE
        (`cast_info`.`role_id` = 1)

movie_actress:

    SELECT
        CONCAT('movie.',
                `title`.`id`,
                '.actress.',
                `cast_info`.`person_id`) AS `id`,
        `title`.`id` AS `parent`,
        `name`.`name` AS `name`
    FROM
        ((`title`
        JOIN `cast_info` ON ((`cast_info`.`movie_id` = `title`.`id`)))
        JOIN `name` ON ((`cast_info`.`person_id` = `name`.`id`)))
    WHERE
        (`cast_info`.`role_id` = 2)

Thanks,

Yangrui
