lucene-dev mailing list archives

From Martijn v Groningen <martijn.v.gronin...@gmail.com>
Subject Re: Patch submission for DataImportHandler's FileListEntityProcessor to sort files
Date Tue, 25 Oct 2011 06:31:04 GMT
Hi Gabriel,

I'm not an expert FileListEntityProcessor user, but I'd expect a
consistent processing order. Your code seems "kosher" to me. You use the
last modified date as order, which seems ok to me. So create a Jira
issue and attach your patch!

Martijn

On 24 October 2011 21:49, Gabriel Cooper <gabriel.cooper@jtv.com> wrote:
> Hello,
>
> I noticed what appears to be a bug in DataImportHandler's
> FileListEntityProcessor. Specifically, it relies on Java's File.list()
> method to retrieve a list of files from the configured dataimport directory,
> but list() does not guarantee a sort order. This means that if you have two
> files that update the same record, the results are non-deterministic.
> Typically, list() does in fact return them lexicographically sorted, but this
> is not guaranteed.
>
> An example of how you can get into trouble is to imagine the following:
>
> xyz.xml -- Created one hour ago. Contains updates to records "Foo" and
> "Bar".
> abc.xml -- Created one minute ago. Contains updates to records "Bar" and
> "Baz".
>
> In this case, the newest file, abc.xml, would (likely, but not
> guaranteed) be run first, updating the "Bar" and "Baz" records. Next, the
> older file, xyz.xml, would update "Foo" and overwrite "Bar" with outdated
> changes.
>
> The "HowToContribute" wiki page suggested I send my request here before
> opening an actual bug ticket, so please let me know if there's anything else
> I can or should do to get this patch submitted and approved. I've attached a
> patch of FileListEntityProcessor, along with an updated test. Please let me
> know if it's kosher.
>
> Thank you,
>
> Gabriel.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
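The ordering problem described in the quoted message can be sketched roughly as follows. This is not the attached patch (which is not reproduced in this archive), only a minimal illustration of sorting a directory listing by last-modified time so that older files are processed first. The class name SortedFileLister and the method listOldestFirst are hypothetical; the file name is used as a tie-breaker purely as an assumption, to keep the order deterministic when two files share a timestamp.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration only; not part of DataImportHandler.
public class SortedFileLister {

    // Returns the regular files in dir, oldest first.
    // File.listFiles() makes no ordering guarantee, so we sort explicitly.
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            // dir is not a directory, or an I/O error occurred
            return new File[0];
        }
        // Primary key: last-modified time (ascending).
        // Tie-breaker (an assumption): file name, so equal timestamps
        // still produce a stable, repeatable order.
        Arrays.sort(files, Comparator.comparingLong(File::lastModified)
                .thenComparing(File::getName));
        return files;
    }
}
```

With the xyz.xml / abc.xml example from the quoted message, a listing sorted this way would process xyz.xml (one hour old) before abc.xml (one minute old), so the newer update to "Bar" would be applied last.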



-- 
Met vriendelijke groet,

Martijn van Groningen
