lucene-solr-user mailing list archives

From Greg Pendlebury <greg.pendleb...@gmail.com>
Subject Re: Batch update, order of evaluation
Date Thu, 09 Sep 2010 10:11:20 GMT
I can't reproduce reliably, so I'm suspecting there are issues in our code.
I'm refactoring to avoid the problem entirely.

Thanks for the response though Erick.

Greg
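
A minimal sketch of how a client-side buffer could sidestep the question by never sending two versions of the same document in one batch. It is not the poster's actual code: the `DedupBuffer` class, its method names, and the use of a string `id` as the unique key are all assumptions made for illustration. It relies on `LinkedHashMap` iterating in insertion order, so removing an id before re-inserting it moves the latest version to the end of the batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ingest buffer; names are illustrative, not from the thread.
class DedupBuffer {
    // LinkedHashMap preserves insertion order, unlike a plain HashMap.
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    // Buffer a document. If the id is already buffered, the older copy is
    // dropped and the new copy takes the last position in the batch.
    void add(String id, Map<String, String> fields) {
        docs.remove(id);
        docs.put(id, new LinkedHashMap<>(fields));
    }

    // Return the buffered documents in order and empty the buffer.
    List<Map<String, String>> drain() {
        List<Map<String, String>> batch = new ArrayList<>(docs.values());
        docs.clear();
        return batch;
    }

    int size() {
        return docs.size();
    }
}
```

With a buffer like this, each chunk sent to the server carries at most one copy of any document, so the result no longer depends on how the server orders duplicates within a batch.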

On 8 September 2010 21:51, Greg Pendlebury <greg.pendlebury@gmail.com> wrote:

> Thanks,
>
> I'll create a deliberate test tomorrow and feed some random data through it
> several times to see what happens.
>
> I'm also working on simply improving the buffer to handle the situation
> internally, but a few hours of testing isn't a big deal.
>
> Ta,
> Greg
>
>
> On 8 September 2010 21:41, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> This would be surprising behavior; if you can reliably reproduce it,
>> it's worth a JIRA.
>>
>> But (and I'm stretching a bit here) are you sure you're committing at the
>> end of the batch AND are you sure you're looking after the commit? Here's
>> the scenario: Your updated document is at positions 1 and 100 in your batch.
>> Somewhere around SOLR processing document 50, an autocommit occurs,
>> and you're looking at your results before SOLR gets around to committing
>> document 100. Like I said, it's a stretch.
>>
>> To test this, you need to be absolutely sure of two things before you
>> search:
>> 1> the batch is finished processing
>> 2> you've issued a commit after the last document in the batch.
>>
>> If you're sure of the above and still see the problem, please let us
>> know...
>>
>> HTH
>> Erick
>>
>> On Tue, Sep 7, 2010 at 10:32 PM, Greg Pendlebury
>> <greg.pendlebury@gmail.com> wrote:
>>
>> > Does anyone know with certainty how (or even if) order is evaluated when
>> > updates are performed by batch?
>> >
>> > Our application internally buffers solr documents for speed of ingest
>> > before sending them to the server in chunks. The XML documents sent to
>> > the solr server contain all documents in the order they arrived without
>> > any settings changed from the defaults (so overwrite = true). We are
>> > careful to avoid things like HashMaps on our side since they'd lose the
>> > order, but I can't be certain what occurs inside Solr.
>> >
>> > Sometimes if an object has been indexed twice for various reasons it
>> > could appear twice in the buffer but the most up-to-date version is
>> > always last. I have however observed instances where the first copy of
>> > the document is indexed and differences in the second copy are missing.
>> > Does this sound likely? And if so are there any obvious settings I can
>> > play with to get the behavior I desire?
>> >
>> > I looked at:
>> > http://wiki.apache.org/solr/UpdateXmlMessages
>> >
>> > but there is no mention of order, just the overwrite flag (which I'm
>> > unsure how it is applied internally to an update message) and the
>> > deprecated duplicates flag (which I have no idea about).
>> >
>> > Would switching to SolrInputDocuments on a CommonsHttpSolrServer help,
>> > as per http://wiki.apache.org/solr/Solrj? There is no mention of order
>> > there either, however.
>> >
>> > Thanks to anyone who took the time to read this.
>> >
>> > Ta,
>> > Greg
>> >
>>
>
>

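The two conditions in Erick's reply (the batch has finished, and a commit follows its last document) can be sketched with the XML update format the original question refers to. This is only an illustration: the `UpdateXml` class and its method names are hypothetical, it only builds the message strings, and actually posting them to a Solr update handler is left out. The idea is to send the `<add>` message, wait for it to return, send `<commit/>`, and only then search.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that builds Solr XML update messages as strings.
class UpdateXml {
    // Escape characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build one <add> message containing the documents in list order.
    static String addMessage(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                sb.append("<field name=\"").append(escape(field.getKey())).append("\">")
                  .append(escape(field.getValue()))
                  .append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    // The explicit commit to send after the <add> request has completed.
    static String commitMessage() {
        return "<commit/>";
    }
}
```

Searching only after the explicit commit has returned rules out the autocommit scenario described above, where results are inspected before the later copy of a document is visible.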