lucene-dev mailing list archives

From "Mikhail Khludnev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3360) Problem with DataImportHandler multi-threaded
Date Mon, 07 May 2012 19:28:49 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269925#comment-13269925 ]

Mikhail Khludnev commented on SOLR-3360:
----------------------------------------

James,

I can't reproduce the failure. 
mkhl$ ant test-contrib -Dtests.seed=-55eeb72d0a16dfec:4e1a59f5738a6b25:4bf3cbf2bd3b659a -Dtestcase=TestThreaded

junit report 

{code}
<property name="tests.seed" value="-55eeb72d0a16dfec:4e1a59f5738a6b25:4bf3cbf2bd3b659a" />

  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_FullImport" time="0.965" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThreadless_FullImport" time="0.055" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedSingleThread_FullImport" time="0.053" />

{code}


Finally, I added a _Total test, which enumerates all the test parameters:
https://github.com/m-khl/solr-patches/commit/0532e653a3319247519f90bd8987c84171ac6a56.diff

On a Core i5 MacBook Pro:

:solr mkhl$ ant test-contrib -Dtestcase=TestThreaded
junit-sequential:
    [junit] Testsuite: org.apache.solr.handler.dataimport.TestThreaded
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.899 sec
    [junit] 

junit-parallel:

{code}
 </properties>
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_FullImport" time="1.042" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThreadless_FullImport" time="0.047" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedSingleThread_FullImport" time="0.052" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_Total" time="0.898" />
  <system-out><![CDATA[]]></system-out>
{code}
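
For anyone comparing reports from different machines, a JUnit XML report like the one above can be summarized with a few lines of Python. This is only a minimal sketch: the function name summarize is made up for illustration, and the <testsuite> wrapper and its attributes are assumptions added so that the excerpt parses on its own.

```python
import xml.etree.ElementTree as ET

# Test cases copied from the report excerpt above. The <testsuite> wrapper and
# its attributes are assumptions, added only so the fragment is well-formed.
REPORT = """<testsuite name="org.apache.solr.handler.dataimport.TestThreaded" tests="4" failures="0" errors="0">
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_FullImport" time="1.042" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThreadless_FullImport" time="0.047" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedSingleThread_FullImport" time="0.052" />
  <testcase classname="org.apache.solr.handler.dataimport.TestThreaded" name="testCachedThread_Total" time="0.898" />
</testsuite>"""


def summarize(report_xml):
    """Return a (name, seconds, status) tuple for every <testcase> element."""
    root = ET.fromstring(report_xml)
    rows = []
    for case in root.iter("testcase"):
        # A <failure> or <error> child marks a test that did not pass.
        if case.find("failure") is not None:
            status = "failure"
        elif case.find("error") is not None:
            status = "error"
        else:
            status = "ok"
        rows.append((case.get("name"), float(case.get("time", "0")), status))
    return rows


if __name__ == "__main__":
    for name, seconds, status in summarize(REPORT):
        print(f"{status:8} {seconds:6.3f}s  {name}")
```

A failing run would show up as a "failure" or "error" status next to the test name, which makes it easier to see which of the four cases differs between environments.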


Please give me a clue about how to reproduce the failure. Do you run the tests from an IDE or from a script? Did you
run a clean before the test? Please show me the exact command, the JUnit report, the log/output, etc.
                
> Problem with DataImportHandler multi-threaded
> ---------------------------------------------
>
>                 Key: SOLR-3360
>                 URL: https://issues.apache.org/jira/browse/SOLR-3360
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 3.6
>         Environment: Solr 3.6.0, Apache Tomcat 6.0.20, jdk1.6.0_15, Windows XP
>            Reporter: Claudio R
>            Assignee: James Dyer
>             Fix For: 3.6.1
>
>         Attachments: SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360-test.patch, SOLR-3360.patch
>
>
> Hi,
> If I use dataimport with 1 thread, I get:
> <lst name="statusMessages">
>    <str name="Total Requests made to DataSource">5001</str>
>    <str name="Total Rows Fetched">1000</str>
>    <str name="Total Documents Skipped">0</str>
>    <str name="Full Dump Started">2012-04-16 11:21:57</str>
>    <str name="">Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.</str>
>    <str name="Committed">2012-04-16 11:23:19</str>
>    <str name="Total Documents Processed">1000</str>
>    <str name="Time taken">0:1:22.390</str>
> </lst>
> If I use dataimport with 10 threads, I get:
> <lst name="statusMessages">
>    <str name="Total Requests made to DataSource">0</str>
>    <str name="Total Rows Fetched">10000</str>
>    <str name="Total Documents Skipped">0</str>
>    <str name="Full Dump Started">2012-04-16 11:31:43</str>
>    <str name="">Indexing completed. Added/Updated: 10000 documents. Deleted 0 documents.</str>
>    <str name="Committed">2012-04-16 11:41:50</str>
>    <str name="Total Documents Processed">10000</str>
>    <str name="Time taken">0:10:7.586</str>
> </lst>
> The configuration with 10 threads took 10 times longer than the configuration with 1 thread.
> I have 1000 records in the database.
> My db-data-config.xml is shown below:
> <?xml version="1.0" encoding="UTF-8" ?>
> <dataConfig>
>    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>                url="jdbc:sqlserver://200.XXX.XXX.XXX:1433;databaseName=test"
>                user="user" password="pass"/>
>       <document>
>          <entity name="indice" rootEntity="true" threads="10"
>                  transformer="RegexTransformer,TemplateTransformer"
>                  query="select top 1000 i.id_indice, i.a, i.b from indice i where i.status = 'I'"
>                  deltaImportQuery="i.id_indice, i.a, i.b from indice i where id_indice in ('${dataimporter.delta.id_indice}')"
>                  deltaQuery="select id_indice from indice where status='I' and data_hora_modificacao >= convert(datetime, '${dataimporter.last_index_time}', 120)"
>                  deletedPkQuery="select id_indice from indice where status='D' and data_hora_modificacao >= convert(datetime, '${dataimporter.last_index_time}', 120)">
>             <field column="id_indice" name="id_indice" />
>             <field column="a" name="a" />
>             <field column="b" name="b" />
>             <entity name="filtro" transformer="RegexTransformer,TemplateTransformer"
>                     query="select categoria, sub_categoria from filtro where indice_id_indice = '${indice.id_indice}'">
>                <field name="filtro_categoria" column="categoria" />
>                <field name="filtro_sub_categoria" column="sub_categoria" />
>                <field name="nv_sub_categoria" column="nv_sub_categoria" template="${filtro.categoria}|${filtro.sub_categoria}" />
>             </entity>
>             <entity name="pagina_relacionada" query="select url from pagina_relacionada where indice_id_indice = '${indice.id_indice}'">
>                <field name="pagina_relacionada_url" column="url" />
>             </entity>
>             <entity name="veja_mais" query="select chamada, url from veja_mais where indice_id_indice = '${indice.id_indice}'">
>                <field name="veja_mais_chamada" column="chamada" />
>                <field name="veja_mais_url" column="url" />
>             </entity>
>             <entity name="video" query="select url from video where indice_id_indice = '${indice.id_indice}'">
>                <field name="video_url" column="url" />
>             </entity>
>             <entity name="galeria" query="select url from galeria where indice_id_indice = '${indice.id_indice}'">
>                <field name="galeria_url" column="url" />
>             </entity>
>          </entity>
>       </document>
> </dataConfig>
> Thanks.
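
As a rough cross-check of the numbers in the quoted report, the two statusMessages blocks can be compared with a small script. This is only a sketch: the helper names (parse_status, to_seconds, ratios) are made up for illustration, the blocks are trimmed to the three fields being compared, and the "Time taken" value is assumed to be in hours:minutes:seconds form.

```python
import xml.etree.ElementTree as ET

# Status blocks trimmed from the quoted report; only the compared fields are kept.
ONE_THREAD = """<lst name="statusMessages">
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:1:22.390</str>
</lst>"""

TEN_THREADS = """<lst name="statusMessages">
   <str name="Total Rows Fetched">10000</str>
   <str name="Total Documents Processed">10000</str>
   <str name="Time taken">0:10:7.586</str>
</lst>"""


def parse_status(xml_text):
    """Map each <str name="..."> entry of a statusMessages block to its text."""
    root = ET.fromstring(xml_text)
    return {entry.get("name"): entry.text for entry in root.iter("str")}


def to_seconds(time_taken):
    """Convert a value such as '0:10:7.586' to seconds (assumed to be H:M:S)."""
    hours, minutes, seconds = time_taken.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def ratios(baseline_xml, other_xml):
    """Return how many times larger each figure is in the second run."""
    base, other = parse_status(baseline_xml), parse_status(other_xml)
    return {
        "rows": int(other["Total Rows Fetched"]) / int(base["Total Rows Fetched"]),
        "docs": int(other["Total Documents Processed"]) / int(base["Total Documents Processed"]),
        "time": to_seconds(other["Time taken"]) / to_seconds(base["Time taken"]),
    }


if __name__ == "__main__":
    for field, ratio in ratios(ONE_THREAD, TEN_THREADS).items():
        print(f"{field}: x{ratio:.2f}")
```

With these inputs the row and document counts come out 10 times higher for the 10-thread run, although the database holds only 1000 records, while the elapsed time is roughly 7.4 times higher.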
