lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: DIH Full Index Issue
Date Thu, 09 Mar 2017 15:24:28 GMT
I am not super familiar with what happens with DIH in the SolrCloud
and I was talking about HTTP access.log (which admittedly needs custom
configuration). However, I do see this in the logs:

2017-03-09 13:40:00.356 INFO  (qtp2080166188-41928) [c:collectionXXX
s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.c.S.Request [collectionXXX_shard1_replica2]  webapp=/solr
path=/dataimport
params={optimize=true&commit=true&clean=true&command=full-import}
status=0 QTime=3
....
2017-03-09 13:40:26.115 INFO  (qtp2080166188-39334) [c:collectionXXX
s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.u.p.LogUpdateProcessorFactory [collectionXXX_shard1_replica2]
webapp=/solr path=/update
params={update.distrib=TOLEADER&distrib.from=http://XXX.XXX.XXX.189:8080/solr/collectionXXX_shard1_replica1/&wt=javabin&version=2}{}
0 0
....
2017-03-09 13:40:26.131 INFO  (qtp2080166188-34463) [c:collectionXXX
s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

It seems like there is a DIH start at 13:40:00, but then at 13:40:26
there is an update hit to another shard which propogates to this one
and causes the commit making DIH changes at that point visible.
Interestingly, there seem to be an error in that distributed update as
I see "missing required field: adtype", which may mean the actual
incoming records from other shards did not get into the index.

So, to me, my theory stands so far and I would look for something
sending - invalid - documents to shard1_replica1 at the same time as
you are running DIH on shard1_replica2.

Regards,
  Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 9 March 2017 at 10:13, AJ Lemke <aj.lemke@securitylabs.com> wrote:
> Thanks for the tip Alex.
>
> I forgot to mention that we are using Solr in cloud mode.
>
> I have checked the logs and I am seeing the following lines (this is an abridged snippet):
>
> 2017-03-09 13:40:00.356 INFO  (qtp2080166188-41928) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.h.d.DataImporter Loading DIH Configuration: data-config.xml
> 2017-03-09 13:40:00.356 INFO  (qtp2080166188-41928) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.h.d.DataImporter Data Configuration loaded successfully
> 2017-03-09 13:40:00.356 INFO  (Thread-17768) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.h.d.DataImporter Starting Full Import
> 2017-03-09 13:40:00.356 INFO  (qtp2080166188-41928) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.c.S.Request [collectionXXX_shard1_replica2]  webapp=/solr
path=/dataimport params={optimize=true&commit=true&clean=true&command=full-import}
status=0 QTime=3
> 2017-03-09 13:40:00.450 INFO  (Thread-17768) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.s.SolrIndexSearcher Opening [Searcher@7c2e9318[collectionXXX_shard1_replica2] realtime]
> 2017-03-09 13:40:00.560 INFO  (Thread-17768) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.h.d.JdbcDataSource Creating a connection for entity item with URL: jdbc:sqlserver://sqlX:XXX;databaseName=XXXXXX;responseBuffering=adaptive;
> 2017-03-09 13:40:00.576 INFO  (Thread-17768) [c:collectionXXX s:shard1 r:core_node1 x:collectionXXX_shard1_replica2]
o.a.s.h.d.JdbcDataSource Time taken for getConnection(): 13
> 2017-03-09 13:40:26.115 INFO  (qtp2080166188-39334) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collectionXXX_shard1_replica2]
 webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=http://XXX.XXX.XXX.189:8080/solr/collectionXXX_shard1_replica1/&wt=javabin&version=2}{}
0 0
> 2017-03-09 13:40:26.115 ERROR (qtp2080166188-39334) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
[doc=11326865] missing required field: adtype
> 2017-03-09 13:40:26.131 INFO  (qtp2080166188-34463) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.DirectUpdateHandler2 start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-03-09 13:40:26.131 INFO  (qtp2080166188-34463) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.SolrIndexWriter Calling setCommitData with IW:org.apache.solr.update.SolrIndexWriter@61db76c2
> 2017-03-09 13:40:28.350 INFO  (qtp2080166188-34463) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.s.SolrIndexSearcher Opening [Searcher@a4ed679[collectionXXX_shard1_replica2]
main]
> 2017-03-09 13:40:28.350 INFO  (searcherExecutor-50-thread-1-processing-n:XXX.XXX.XXX.188:8080_solr
x:collectionXXX_shard1_replica2 s:shard1 c:collectionXXX r:core_node1) [c:collectionXXX s:shard1
r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.c.QuerySenderListener QuerySenderListener
sending requests to Searcher@a4ed679[collectionXXX_shard1_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_oa1(6.3.0):C31847)
Uninverting(_oa2(6.3.0):C2520)))}
> 2017-03-09 13:40:28.350 INFO  (searcherExecutor-50-thread-1-processing-n:XXX.XXX.XXX.188:8080_solr
x:collectionXXX_shard1_replica2 s:shard1 c:collectionXXX r:core_node1) [c:collectionXXX s:shard1
r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.c.QuerySenderListener QuerySenderListener
done.
> 2017-03-09 13:40:28.350 INFO  (searcherExecutor-50-thread-1-processing-n:XXX.XXX.XXX.188:8080_solr
x:collectionXXX_shard1_replica2 s:shard1 c:collectionXXX r:core_node1) [c:collectionXXX s:shard1
r:core_node1 x:collectionXXX_shard1_replica2] o.a.s.c.SolrCore [collectionXXX_shard1_replica2]
Registered new searcher Searcher@a4ed679[collectionXXX_shard1_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_oa1(6.3.0):C31847)
Uninverting(_oa2(6.3.0):C2520)))}
> 2017-03-09 13:40:28.365 INFO  (qtp2080166188-34463) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-03-09 13:40:28.444 INFO  (qtp2080166188-34463) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collectionXXX_shard1_replica2]
 webapp=/solr path=/update params={update.distrib=FROMLEADER&waitSearcher=true&openSearcher=true&commit=true&softCommit=false&distrib.from=http://XXX.XXX.XXX.189:8080/solr/collectionXXX_shard1_replica1/&commit_end_point=true&wt=javabin&version=2&expungeDeletes=false}{commit=}
0 2309
> 2017-03-09 13:40:56.459 INFO  (qtp2080166188-40636) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.u.p.LogUpdateProcessorFactory [collectionXXX_shard1_replica2]
 webapp=/solr path=/update/json params={commit=true&wt=json}{} 0 0
> 2017-03-09 13:40:56.459 ERROR (qtp2080166188-40636) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException:
[doc=553094612] missing required field: adtype
> 2017-03-09 13:41:00.053 INFO  (qtp2080166188-41928) [c:collectionXXX s:shard1 r:core_node1
x:collectionXXX_shard1_replica2] o.a.s.c.S.Request [collectionXXX_shard1_replica2]  webapp=/solr
path=/schema params={wt=json} status=0 QTime=0
>
>
>
> AJ
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafalov@gmail.com]
> Sent: Wednesday, March 8, 2017 9:33 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: DIH Full Index Issue
>
> Are you perhaps indexing at the same time from the source other than DIH? Because the
commit is global and all the changes from all the sources will become visible.
>
> Check the access logs perhaps to see the requests to /update handler or similar.
>
> Regards,
>    Alex.
>
>
> ----
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 8 March 2017 at 09:27, AJ Lemke <aj.lemke@securitylabs.com> wrote:
>> Good Morning List!
>>
>> I have an issue where my DIH full index is committed after a minute of indexing.
>> My counts will fall from around 400K to 85K until the import is finished, usually
about four (4) minutes later.
>>
>> This is problematic for us as there are 315K missing items in our searches.
>>
>> Versioning Info:
>> solr-spec - 6.3.0
>> solr-impl - 6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin -
>> 2016-11-02 19:52:42 lucene-spec - 6.3.0 lucene-impl - 6.3.0
>> a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02
>> 19:47:11
>>
>> solrconfig.xml snippet
>> <!-- The default high-performance update handler --> <updateHandler
>> class="solr.DirectUpdateHandler2">
>>     <updateLog>
>>         <str name="dir">${solr.ulog.dir:}</str>
>>     </updateLog>
>>     <autoCommit>
>>         <maxTime>-1</maxTime>
>>         <openSearcher>false</openSearcher>
>>     </autoCommit>
>>     <autoSoftCommit>
>>         <maxTime>-1</maxTime>
>>     </autoSoftCommit>
>> </updateHandler>
>>
>>
>> Any insights would be greatly appreciated.
>> Let me know if more information is required.
>>
>> AJ

Mime
View raw message