lucene-dev mailing list archives

From Gabriel Cooper <gabriel.coo...@jtv.com>
Subject Patch submission for DataImportHandler's FileListEntityProcessor to sort files
Date Mon, 24 Oct 2011 19:49:35 GMT
Hello,

I noticed what appears to be a bug in DataImportHandler's 
FileListEntityProcessor. Specifically, it relies on Java's File.list() 
method to retrieve a list of files from the configured dataimport 
directory, but list() does not guarantee a sort order. This means that 
if you have two files that update the same record, the results are 
non-deterministic. Typically, list() does in fact return them 
lexicographically sorted, but this is not guaranteed.

An example of how you can get into trouble is to imagine the following:

xyz.xml -- Created one hour ago. Contains updates to records "Foo" and 
"Bar".
abc.xml -- Created one minute ago. Contains updates to records "Bar" and 
"Baz".

In this case, the newer file, abc.xml, would (likely, but not 
guaranteed) be processed first, updating the "Bar" and "Baz" records. 
Next, the older file, xyz.xml, would update "Foo" and overwrite "Bar" 
with outdated changes.
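
To make the ordering problem concrete, here is a minimal sketch of listing a directory in a deterministic order, oldest file first, so that newer updates are applied last. This is only an illustration and not the attached patch: the class name SortedFileListing and the method listOldestFirst are hypothetical, and sorting by last-modified time (with the file name as a tie-breaker) is an assumption about what ordering is wanted.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of DataImportHandler.
public class SortedFileListing {

    // Returns the entries of dir sorted oldest-first by last-modified time.
    // Ties are broken by file name so the result is fully deterministic.
    static List<File> listOldestFirst(File dir) {
        // listFiles() makes no ordering guarantee and returns null
        // if dir is not a directory or an I/O error occurs.
        File[] files = dir.listFiles();
        if (files == null) {
            return new ArrayList<>();
        }
        List<File> sorted = new ArrayList<>(Arrays.asList(files));
        sorted.sort(Comparator.comparingLong(File::lastModified)
                              .thenComparing(File::getName));
        return sorted;
    }
}
```

With the two files from the example above, this ordering would process xyz.xml before abc.xml, so the newer "Bar" update would win.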

The "HowToContribute" wiki page suggested I send my request here before 
opening an actual bug ticket, so please let me know if there's anything 
else I can or should do to get this patch submitted and approved. I've 
attached a patch of FileListEntityProcessor, along with an updated test. 
Please let me know if it's kosher.

Thank you,

Gabriel.
