lucene-dev mailing list archives

From "ted zhu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-9478) Do delta import will loss some documents, when the documents added in the duration of delta import.
Date Mon, 05 Sep 2016 09:25:21 GMT

     [ https://issues.apache.org/jira/browse/SOLR-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ted zhu updated SOLR-9478:
--------------------------
    Description: 
Hello guys,
    I ran into a problem when using SolrCloud mode. When a Solr instance runs a delta-import,
it can take some time to finish (my data source is a MySQL database), and documents added
during that time are lost. In my deltaQuery I use
SUBDATE(${dih.last_index_time}, INTERVAL 2 MINUTE), so each delta-import starts 2 minutes
earlier than last_index_time. Even so, if a delta-import takes 5 minutes, the records added
in its first 3 minutes are lost.
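
The arithmetic in the paragraph above can be written out as a small sketch. It only restates the numbers from the report (a 2-minute look-back against a 5-minute run) and makes no claim about how the import handler records its timestamp internally; the function name `uncovered_window` is hypothetical.

```python
from datetime import timedelta


def uncovered_window(import_duration: timedelta, lookback: timedelta) -> timedelta:
    """Length of the period that a fixed look-back fails to cover.

    Follows the arithmetic in the report: the gap is the import duration
    minus the look-back, and never negative.
    """
    gap = import_duration - lookback
    return max(gap, timedelta(0))


# The case from the report: 5-minute import, 2-minute look-back.
print(uncovered_window(timedelta(minutes=5), timedelta(minutes=2)))  # 0:03:00
```

Under this reading, a fixed look-back only helps while every import finishes faster than the look-back interval.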
    Before we used SolrCloud mode, we dealt with this issue by rewriting the
dataimport.properties file: we queried max(sys_time_stamp) and recorded that maximum
timestamp in the file, so that Solr ran the next delta-import from the time found there.
That way it never missed documents.
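
A minimal sketch of the non-cloud workaround described above, under several assumptions: an in-memory SQLite table stands in for the MySQL source, the table name `docs` and both function names are hypothetical, and the `last_index_time` key with backslash-escaped colons is assumed to match what the import handler writes (check it against your own dataimport.properties). It also overwrites the whole file with a single key, which a real file may not tolerate.

```python
import sqlite3
import tempfile
from pathlib import Path


def max_timestamp(conn, table="docs", column="sys_time_stamp"):
    """Return the largest timestamp in the source table, or None if it is empty."""
    # Table and column names are fixed illustrative values, not user input.
    row = conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()
    return row[0]


def write_last_index_time(path, timestamp):
    """Write the timestamp under the assumed key, escaping ':' as Java properties files do."""
    escaped = timestamp.replace(":", "\\:")
    Path(path).write_text(f"last_index_time={escaped}\n", encoding="utf-8")


# Demo: an in-memory SQLite table stands in for the MySQL source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, sys_time_stamp TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(1, "2016-09-05 09:20:00"), (2, "2016-09-05 09:24:30")],
)
latest = max_timestamp(conn)

with tempfile.TemporaryDirectory() as tmp:
    props = Path(tmp) / "dataimport.properties"
    write_last_index_time(props, latest)
    print(props.read_text(encoding="utf-8"), end="")
```

Recording the newest row's timestamp instead of a wall-clock time ties the next delta window to the data that was actually present in the source.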
    But now we use SolrCloud, dataimport.properties is stored in ZooKeeper, and we may have
multiple collections for the same core. How can I update the dataimport.properties file for
a collection now? Is there any way to record max(sys_time_stamp) in dataimport.properties,
rather than the time at which the delta-import started to run?

Cheers



> Do delta import will loss some documents, when the documents added in the duration of delta import.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9478
>                 URL: https://issues.apache.org/jira/browse/SOLR-9478
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 5.5, 5.5.2, 6.0
>            Reporter: ted zhu
>              Labels: delta-import, solrcloud
>             Fix For: 5.5.2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

