lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Solr document missing or not getting indexed though we get 200 ok status from server
Date Mon, 05 Sep 2016 02:42:51 GMT
I can't tell anything from the document provided. So, here would be my thoughts:

If what you see is some sort of concurrency issue, the documents
missed/dropped are unlikely to be exactly the same ones each time. So, if
you see the same documents dropped, it is much more likely to be something
to do with the documents, the handler end-points, the sharding, etc.
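
A quick way to tell the two cases apart is to record which ids went missing in each run and see how much they overlap. The sketch below is only an illustration: the ids are made up, the helper names are hypothetical, and the 0.5 threshold is an arbitrary assumption.

```python
def missing_ids(sent, indexed):
    """Ids that were posted but are not found in the index."""
    return set(sent) - set(indexed)


def overlap_ratio(runs):
    """Share of missing ids that are missing in every run (0.0 to 1.0)."""
    runs = [set(r) for r in runs]
    union = set().union(*runs)
    if not union:
        return 0.0
    common = set.intersection(*runs)
    return len(common) / len(union)


def classify_drops(runs, threshold=0.5):
    """Rough hint only: high overlap points at the documents or routing,
    low overlap points at timing or concurrency."""
    if overlap_ratio(runs) >= threshold:
        return "same documents"
    return "different documents"


# Hypothetical ids from two test runs of the same batch.
run1 = missing_ids(["a", "b", "c", "d"], ["a", "d"])
run2 = missing_ids(["a", "b", "c", "d"], ["a", "d"])
print(classify_drops([run1, run2]))
```

If the same ids go missing every time, look at those documents and where they are routed before suspecting thread timing.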

If this is easily reproducible, I would run a network analyzer such as
Wireshark and compare your Admin UI session with your client session
and verify that everything expected is absolutely identical.
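
Once both requests are captured (for example, exported as text from Wireshark), a plain diff usually shows the difference quickly. A minimal sketch follows; the two request strings are invented for illustration and are not taken from your setup.

```python
import difflib


def diff_requests(admin_request, client_request):
    """Return unified-diff lines between two captured HTTP requests."""
    return list(difflib.unified_diff(
        admin_request.splitlines(),
        client_request.splitlines(),
        fromfile="admin_ui",
        tofile="client",
        lineterm="",
    ))


# Invented captures: same body, different Content-Type header.
admin = (
    "POST /solr/mycoll/update HTTP/1.1\n"
    "Content-Type: text/xml\n"
    "\n"
    "<add><doc/></add>"
)
client = (
    "POST /solr/mycoll/update HTTP/1.1\n"
    "Content-Type: application/x-www-form-urlencoded\n"
    "\n"
    "<add><doc/></add>"
)

for line in diff_requests(admin, client):
    print(line)
```

An empty result means the two captures are byte-for-byte the same at the text level, which would push the investigation back toward the server side.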

You could also temporarily turn on Debug via the Admin console (under
Logging). You could set individual elements to Trace to get low-level
information on what's happening.
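
If clicking through the UI is inconvenient, I believe the same change can be made over HTTP through the logging endpoint (`/solr/admin/info/logging` with a `set=<logger>:<LEVEL>` parameter). Treat the sketch below as an assumption to check against your version; the host, port, and logger name are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def logging_url(base_url, logger, level):
    """Build the URL that asks Solr to set one logger to the given level."""
    query = urlencode({"set": f"{logger}:{level}", "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/info/logging?{query}"


def set_log_level(base_url, logger, level):
    """Send the request and return the raw response body as text."""
    with urlopen(logging_url(base_url, logger, level)) as resp:
        return resp.read().decode("utf-8")


# Placeholder host and logger; nothing is sent until set_log_level is called.
print(logging_url("http://localhost:8983/solr", "org.apache.solr.update", "TRACE"))
```

Remember to set the level back once you have captured a failing request, since Trace output grows quickly under 50 threads.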

Finally, I am assuming this is all happening with the latest Solr? If not,
it may be worth trying that and/or checking Jira for bugs. Lots of
things have been fixed/improved in more recent Solr releases related to
multi-threaded, multi-server setups.

Regards,
   Alex.

----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 5 September 2016 at 00:17, Ganesh M <ganesh.sudhakar.m.o@gmail.com> wrote:
> Hi Alex,
> We tried posting the same document manually from the Solr Admin / Documents
> UI, and it was indexed successfully. We are sure that it's not a duplicate
> issue. We are using the default update handler and have not configured a
> custom one. We send the index request as a direct HTTP request using the
> <add> <doc> XML format. We get a 200 OK response, but the document is not
> indexed.
>
> This is the request we sent and got 200 for, but it was not indexed. The
> same request sent via the Solr Admin / Documents UI is indexed
> successfully.
> <add>
> <doc>
> <CT_iscof>false</CT_iscof>
> <CT_ui116_s>55788327</CT_ui116_s>
> <CT_iscod>false</CT_iscod>
> <CT_ui114_s>Factuur _PERF29161663_Voor _Va Bene.pdf</CT_ui114_s>
> <CT_ui68_s>55788327-PERF29161663</CT_ui68_s>
> <CT_ui75_f>3.00</CT_ui75_f>
> <CT_ui48_s>2916847</CT_ui48_s>
> <CT_stsid>STCUA0000021500000011472808279078</CT_stsid>
> <CT_ui6_s>EUR</CT_ui6_s>
> <CT_ui74_f>50.00</CT_ui74_f>
> <CT_ui28_s>VAT</CT_ui28_s>
> <CT_ui82_f>50.00</CT_ui82_f>
> <CT_lsti>UA000002150000001:VB1
> VB1:A000002150:vbgroupnft+1:1472808278137</CT_lsti>
> <CT_pdfid>RA000002150AT009428</CT_pdfid>
> <CT__s_RU_I_UA000002150000001>100000,false</CT__s_RU_I_UA000002150000001>
> <CT_ui30_s>62440101</CT_ui30_s>
> <CT_ui152_s>UNKNOWN</CT_ui152_s>
> <CT_content> RA000002150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#f
> RA000002150AT009425#pdf.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278843.png#f
> 1472808279002
> CLEA0000021509223370564294689844EXCC10000019223370564046496793C1LEA0000021509223370564294752110EXCC2000001
> PERF2020916145437 LEA0000021509223370564294752110EXCC2000001 Va Bene VA
> Beheer B.V. LEA0000021509223370564294689844EXCC1000001 VA Beheer B.V. VA
> Beheer B.V.null null null  2.1null  urn:www.cenbii.eu:
> transaction:biicoretrdm010:ver1.0:#urn:www.peppol.eu:
> bis:peppol4a:ver1.0#urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.xnull
>  urn:www.cenbii.eu:profile:bii04:ver2.0null  PERF20209161454372  null
>  1472754600000null  3806 UNCL1001 null  EUR6 ISO 4217 Alpha null null
>  29168472  null null  pdf.pdf2  null null  RA000002150AT009425#pdf.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278843.png#fpdf.pdf
> application/pdf null null  Factuur _PERF29161663_Voor _Va Bene.pdf2  null
>  PrimaryImagenull null  RA000002150AT009424#Factuur _PERF29161663_Voor _Va
> Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#fFactuur
> _PERF29161663_Voor _Va Bene.pdf application/pdf null null null  62440101ZZZ
> NL:KVK null null  2916847ZZZ NL:VAT null null  VA Beheer B.V.null null
>  Schurinkstraatnull  23null  Ommennull  7731GCnull null  NL6
> ISO3166-1:Alpha2 null null  2916847ZZZ NL:VAT null null  VAT6 UN/ECE 5153
> null null  62440101ZZZ NL:KVK null null null  55788327ZZZ NL:KVK null null
>  55788327ZZZ NL:KVK null null  Va Benenull null  Voorstraatnull  26null
>  Voorschotennull  2251BNnull null  NL6 ISO3166-1:Alpha2 null null
>  2916847ZZZ NL:VAT null null  VAT6 UN/ECE 5153 null null  55788327ZZZ
> NL:KVK null null  1475173800000null null null null  NL6 ISO3166-1:Alpha2
> null null  316 UNCL4461 null  1475087400000null  55788327-PERF29161663null
> null  29168472 IBAN null  UNKNOWNBIC null  Betaling?binnen?14?dagen op
> bankrekening?2916847?onder vermelding van?55788327/PERF29161663null null
>  3.00EUR null null  50.00EUR null  3.00EUR null null  S6 UNCL5305 null
>  6.00null null  VAT6 UN/ECE 5153 null null  50.00EUR null  50.00EUR null
>  53.00EUR null  53.00EUR null null  102  null  5.00BX null  50.00EUR null
> null  PERF2020916145437null  PERF2020916145437null null  12  null null  S6
> UNCL5305 null  6.00null null  VAT6 UN/ECE 5153 null null  10.00EUR null
>  RA000002150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#f
> DM001 XCNIN199751 NL:KVK:62440101 false false false false 10
> UA000002150000001:VB1 VB1:A000002150:vbgroupnft+1:1472808278137 Ontvangen
> 1472808279002 Factuur GLDT9223370666504283001RA000000006DTP2000001 VB1 VB1
> UA000002150000001 RA000002150AT009428 vbgroupnft+1 A000002150 Group
> 55788327 Va Bene XCNL034435 Va Bene
> LEA0000021509223370564294752110EXCC2000001 vbgroupnft+1 A000002150
> PERF2020916145437 Group 62440101 VA Beheer B.V. XCNL034436 VA Beheer B.V.
> LEA0000021509223370564294689844EXCC1000001
> STCUA0000021500000011472808279078 VB1 VB1 VB1 VB1 UA000002150000001 true
> Factuur GLDT9223370666504283001RA000000006DTP2000001 EM0001
> NL:KVK:55788327</CT_content>
> <CT_ranm>vbgroupnft+1</CT_ranm>
> <CT_lstc>10</CT_lstc>
> <CT_rgnm>Va Bene</CT_rgnm>
> <CT_tdr>true</CT_tdr>
> <CT_ui83_f>50.00</CT_ui83_f>
> <CT_ui64_s>NL</CT_ui64_s>
> <CT__s_RU_O_LEA0000021509223370564294689844EXCC1000001>100000,false</CT__s_RU_O_LEA0000021509223370564294689844EXCC1000001>
> <CT_scxkvk>62440101</CT_scxkvk>
> <CT_sgexid>XCNL034436</CT_sgexid>
> <CT_ui67_l>1475087400000</CT_ui67_l>
> <CT_mtnm>Factuur</CT_mtnm>
> <CT_ui8_s>2916847</CT_ui8_s>
> <CT_sunm>VB1 VB1</CT_sunm>
> <CT_ui66_s>31</CT_ui66_s>
> <CT_ui46_s>NL</CT_ui46_s>
> <CT_ui84_f>53.00</CT_ui84_f>
> <CT_lsts>Ontvangen</CT_lsts>
> <CT_ui42_s>26</CT_ui42_s>
> <rowkey>CLEA0000021509223370564294689844EXCC10000019223370564046496793C1LEA0000021509223370564294752110EXCC2000001</rowkey>
> <CT_rgexid>XCNL034435</CT_rgexid>
> <CT_ui80_s>VAT</CT_ui80_s>
> <CT_sgexnm>VA Beheer B.V.</CT_sgexnm>
> <CT_ui16_s>VA Beheer B.V.</CT_ui16_s>
> <CT_ui44_s>2251BN</CT_ui44_s>
> <CT_ui38_s>Va Bene</CT_ui38_s>
> <CT_iscvd>false</CT_iscvd>
> <CT_munm>VB1 VB1</CT_munm>
> <CT_ui52_s>55788327</CT_ui52_s>
> <CT_ui1_s>2.1</CT_ui1_s>
> <CT_ui104_s>PERF2020916145437</CT_ui104_s>
> <CT_ui56_l>1475173800000</CT_ui56_l>
> <CT_tmsg>EM0001</CT_tmsg>
> <CT_sbj>PERF2020916145437</CT_sbj>
> <CT_ui4_s>PERF2020916145437</CT_ui4_s>
> <CT_ui3_s>urn:www.cenbii.eu:profile:bii04:ver2.0</CT_ui3_s>
> <CT_ui98_s>Betaling?binnen?14?dagen op bankrekening?2916847?onder
> vermelding van?55788327/PERF29161663</CT_ui98_s>
> <CT_ui5_l>1472754600000</CT_ui5_l>
> <CT_ui2_s>urn:www.cenbii.eu:
> transaction:biicoretrdm010:ver1.0:#urn:www.peppol.eu:
> bis:peppol4a:ver1.0#urn:www.simplerinvoicing.org:
> si:si-ubl:ver1.1.x</CT_ui2_s>
> <CT_ui88_f>5.00</CT_ui88_f>
> <CT_muid>UA000002150000001</CT_muid>
> <CT_ui36_s>55788327</CT_ui36_s>
> <CT_sby>Group</CT_sby>
> <CT_toid>NL:KVK:55788327</CT_toid>
> <CT_crid>LEA0000021509223370564294752110EXCC2000001</CT_crid>
> <CT_csid>LEA0000021509223370564294689844EXCC1000001</CT_csid>
> <CT_cid>CLEA0000021509223370564294689844EXCC10000019223370564046496793C1LEA0000021509223370564294752110EXCC2000001</CT_cid>
> <CT_fmid>NL:KVK:62440101</CT_fmid>
> <CT_sgnm>VA Beheer B.V.</CT_sgnm>
> <CT_mdt>1472808279002</CT_mdt>
> <CT_ui113_f>10.00</CT_ui113_f>
> <CT_tnm>Factuur</CT_tnm>
> <CT_said>A000002150</CT_said>
> <CT_ui115_s>62440101</CT_ui115_s>
> <CT_suid>UA000002150000001</CT_suid>
> <CT_raid>A000002150</CT_raid>
> <CT_mtid>GLDT9223370666504283001RA000000006DTP2000001</CT_mtid>
> <CT_dmtd>DM001</CT_dmtd>
> <CT_rcxkvk>55788327</CT_rcxkvk>
> <CT_ui111_s>VAT</CT_ui111_s>
> <CT_ui106_s>1</CT_ui106_s>
> <CT_ui50_s>VAT</CT_ui50_s>
> <CT_ui14_s>2916847</CT_ui14_s>
> <CT_exid>XCNIN199751</CT_exid>
> <CT_sdur>VB1 VB1</CT_sdur>
> <CT_ui153_s>PERF2020916145437</CT_ui153_s>
> <CT_ui100_t1>RA000002150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#f
> </CT_ui100_t1>
> <CT_ui21_s>Ommen</CT_ui21_s>
> <CT_ui109_s>6.00</CT_ui109_s>
> <CT_csnm>VA Beheer B.V.</CT_csnm>
> <CT_ui85_f>53.00</CT_ui85_f>
> <CT_rby>Group</CT_rby>
> <CT_tid>GLDT9223370666504283001RA000000006DTP2000001</CT_tid>
> <CT_ui108_s>S</CT_ui108_s>
> <CT_crnm>Va Bene</CT_crnm>
> <CT_ui26_s>2916847</CT_ui26_s>
> <CT_ui20_s>23</CT_ui20_s>
> <CT__s_RU_I_LEA0000021509223370564294752110EXCC2000001>100000,false</CT__s_RU_I_LEA0000021509223370564294752110EXCC2000001>
> <CT_ui101_s>PrimaryImage</CT_ui101_s>
> <CT_ui24_s>NL</CT_ui24_s>
> <CT_ui22_s>7731GC</CT_ui22_s>
> <CT_uctx_UA000002150000001_s1>CLEA0000021509223370564294689844EXCC10000019223370564046496793C1LEA0000021509223370564294752110EXCC2000001
> false</CT_uctx_UA000002150000001_s1>
> <CT_ui43_s>Voorschoten</CT_ui43_s>
> <CT__s_dxat_2>RA000002150AT009425#pdf.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278843.png#f
> </CT__s_dxat_2>
> <CT__s_dxat_1>RA000002150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#f
> </CT__s_dxat_1>
> <CT_uctx_UA000002150000001_LEA0000021509223370564294689844EXCC1000001_s1>CLEA0000021509223370564294689844EXCC10000019223370564046496793C1LEA0000021509223370564294752110EXCC2000001
> false</CT_uctx_UA000002150000001_LEA0000021509223370564294689844EXCC1000001_s1>
> <CT_ui70_s>2916847</CT_ui70_s>
> <CT_cdt>1472808279002</CT_cdt>
> <CT_ui19_s>Schurinkstraat</CT_ui19_s>
> <CT_sgid>LEA0000021509223370564294689844EXCC1000001</CT_sgid>
> <CT_rgexnm>Va Bene</CT_rgexnm>
> <CT_ui72_f>3.00</CT_ui72_f>
> <CT_ui87_s>10</CT_ui87_s>
> <CT__s_RU_O_UA000002150000001>100000,false</CT__s_RU_O_UA000002150000001>
> <CT_ui77_s>S</CT_ui77_s>
> <CT_cnm>PERF2020916145437</CT_cnm>
> <CT_sanm>vbgroupnft+1</CT_sanm>
> <CT_ird>false</CT_ird>
> <CT_ui146_s>380</CT_ui146_s>
> <CT_ui89_f>50.00</CT_ui89_f>
> <CT_ui41_s>Voorstraat</CT_ui41_s>
> <CT_daf>RA000002150AT009424#Factuur _PERF29161663_Voor _Va Bene.pdf#
> http://srv-cbe-col1.everbinding.com/thumbs/2016/9/2/A000002150/UA000002150000001/1472808278632.png#f
> </CT_daf>
> <CT_ui78_s>6.00</CT_ui78_s>
> <CT_rgid>LEA0000021509223370564294752110EXCC2000001</CT_rgid>
> </doc>
> </add>
>
>
> The only difference is that when we post manually via Solr Admin, the
> request is sent when there is no concurrency. Initially, though, there would
> be around 50 threads sending update POST requests, plus a few threads
> sending GET requests to different collections.
> A little more information about the setup:
> We have around 5 collections, and each collection has 2 shards (one shard on
> each node, one shard for the index and the other for the replica), on 2
> nodes in total with a master-master setup.
>
> We get this error only when around 50 threads are sending POST requests to
> various collections at the same time.
>
> The strange thing is that Solr does not return an error when it is not able
> to index the document. If Solr had returned an error, we could have retried
> indexing the document. Is there any way to make Solr return an error instead
> of 200 when it fails to index?
>
> Regards,
> Ganesh
>
> On Sun, Sep 4, 2016 at 10:11 PM Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> Can you identify the specific documents that 'fail'? What happens if
>> you post them manually? Try posting them manually but with one field
>> super-distinct to see whether it made it in. What happens if you post
>> it to an empty index (copy definition and try)?
>>
>> Also, what do your request handler's parameters look like? Perhaps you
>> have a signature processor, in which case it may be triggering
>> duplicate avoidance with a different calculation from just an id.
>>
>> My guess is still that it is some sort of duplicate issue.
>>
>> Regards,
>>    Alex.
>> ----
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 4 September 2016 at 23:10, Ganesh M <mganeshs@live.in> wrote:
>> > Some more information on this: most documents get indexed properly.
>> > A few documents are not getting indexed.
>> >
>> > All document POSTs appear in the localhost_access log with a 200 OK
>> > response. In catalina.out, however, the logs differ. For documents that
>> > are indexed properly, the logs are as follows.
>> >
>> > FINE: PRE_UPDATE add
>> >
>> {,id=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001}
>> >
>> params(crid=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001),defaults(wt=xml)
>> > Sep 01, 2016 7:39:31 AM org.apache.solr.update.TransactionLog <init>
>> > FINE: New TransactionLog
>> file=/ebdata2/solrdata/IOB_shard1_replica1/data/tlog/tlog.0000000000000220856,
>> exists=false, size=0, openExisting=false
>> > Sep 01, 2016 7:39:31 AM org.apache.solr.update.SolrCmdDistributor submit
>> > FINE: sending update to
>> http://xx.xx.xx.xx:7070/solr/IOB_shard1_replica2/ retry:0
>> add{version=1544254202941800448,id=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001}
>> params:update.distrib=FROMLEADER&distrib.from=http%3A%2F%2Fxx.xx.xx.xx%3A7070%2Fsolr%2FIOB_shard1_replica1%2F
>> > Sep 01, 2016 7:39:31 AM
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner run
>> > FINE: starting runner:
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3fb794b2
>> > Sep 01, 2016 7:39:31 AM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> > FINE: PRE_UPDATE FINISH
>> params(crid=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001),defaults(wt=xml)
>> > Sep 01, 2016 7:39:31 AM
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner run
>> > FINE: finished:
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner@3fb794b2
>> > Sep 01, 2016 7:39:31 AM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> > INFO: [IOB_shard1_replica1] webapp=/solr path=/update params=
>> >
>> {crid=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001}
>> >
>> {add=[CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001
>> (1544254202941800448)]}
>> > Sep 01, 2016 7:39:31 AM org.apache.solr.servlet.SolrDispatchFilter
>> doFilter
>> > FINE: Closing out SolrRequest:
>> params(crid=CUA0000004390000019223370564139207241C3LEA0000020769223370567404392838EXCC3000001),defaults(wt=xml)
>> > -------------------------------------------------
>> >
>> > For a document that is not getting indexed, we see only the
>> > following log in catalina.out. We are not sure whether it is being
>> > added to Solr.
>> >
>> >
>> > Sep 01, 2016 7:39:56 AM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> > FINE: PRE_UPDATE FINISH
>> params(crid=CUA0000004390000019223370564139182810C3LEA0000020179223370567061972057EXCC1000002),defaults(wt=xml)
>> > Sep 01, 2016 7:39:56 AM
>> org.apache.solr.update.processor.LogUpdateProcessor finish
>> > INFO: [IOB_shard1_replica1] webapp=/solr path=/update params=
>> >
>> {crid=CUA0000004390000019223370564139182810C3LEA0000020179223370567061972057EXCC1000002}
>> > {} 0 1
>> > Sep 01, 2016 7:39:56 AM org.apache.solr.servlet.SolrDispatchFilter
>> doFilter
>> > FINE: Closing out SolrRequest:
>> params(crid=CUA0000004390000019223370564139182810C3LEA0000020179223370567061972057EXCC1000002),defaults(wt=xml)
>> >
>> > ----------------------
>> >
>> > You can see that in the log above for the missing (not indexed)
>> > documents, we do not see "PRE_UPDATE add" in the catalina log. Is that
>> > the reason the document is not getting indexed?
>> >
>> > We have set autoSoftCommit to 1 second and auto hard commit to 30 seconds.
>> >
>> > We are not getting any errors or exceptions in the log.
>> >
>> > This issue is becoming very critical and is a reliability concern. Though
>> > we get a 200 OK response from Solr for the update HTTP POST request,
>> > nothing happens on the Solr side. If Solr is not able to process the
>> > request, shouldn't we get an error from Solr instead of a 200 OK response?
>> >
>> > Has anybody faced this sort of issue? Any help would be very
>> > much appreciated.
>> >
>> >
>> >
>> >
>> > On Sun, Sep 4, 2016 at 12:59 PM Ganesh M <mganeshs@live.in> wrote:
>> > Nitin, thanks for the reply. Each of our documents has a unique id, which
>> > is its HBase rowkey, so it will always be unique. There is no chance of
>> > duplicate ids being sent.
>> >
>> >
>> >
>> > On Sun 4 Sep, 2016 12:41 pm Nitin Kumar, <nitinkumar.iitm@gmail.com> wrote:
>> > Please check the docs' unique key (id). All keys should be unique;
>> > otherwise docs with the same id will be replaced.
>> >
>> > On 04-Sep-2016 12:13 PM, "Ganesh M" <mganeshs@live.in> wrote:
>> >
>> >> Hi,
>> >> We keep sending documents to Solr from our app server: a single document
>> >> per request, with about 10 parallel requests hitting SolrCloud per second.
>> >>
>> >> We can see our POST (update) requests hitting our Solr 5.4 in the
>> >> localhost_access logs, each with a 200 OK response. Our app servers also
>> >> get an HTTP 200 OK response for the HTTP requests we send to SolrCloud.
>> >>
>> >> But a few documents are not getting indexed. Out of 2000 documents sent,
>> >> 10 go missing, even though there is no error.
>> >>
>> >> We use an autoSoftCommit of 2 secs and an auto hard commit of 30 secs.
>> >>
>> >> Why are those 10 documents not getting indexed, and why is no error
>> >> returned if the server is not able to index them?
>> >>
>> >> Regards,
>> >>
>> >>
>> >>
>> >>
>>
