lucy-user mailing list archives

From Gerald Richter <rich...@ecos.de>
Subject Re: [lucy-user] Strange results when documents gets delete while iterating
Date Thu, 26 Nov 2015 05:38:38 GMT
Thanks for the detailed explanation. Yes, I am using Coro, but in this 
special test case only one Coro thread was running.

After restarting all processes, the issue went away. I still do not
really understand what was going on, but since the restart (a few days
ago) everything has worked as expected.

Regards

Gerald


On 19.11.2015 at 16:03, Marvin Humphrey wrote:
> On Thu, Nov 19, 2015 at 4:39 AM, Gerald Richter - ECOS Technology
> <Gerald.Richter@ecos.de> wrote:
>> Hi,
>>
>> It's a local IndexSearcher.
>>
>> I have done a lot of tests and it's really happening.
>>
>> Let me give you a few more details; maybe this helps:
>>
>> - I call a function that creates a new IndexSearcher and calls $hits = $searcher->hits.
>> - I iterate over the first few entries and return the entries and the $hits.
>> - The documents that were found are deleted from a database, which in turn deletes
>>   them from the Lucy index.
>> - Now I iterate over the next few entries, delete them, and so on.
>>
>> I have made a small test where only two entries are fetched per iteration. The
>> result looks like this:
>>
>>        id  => "8b8bce64e69b52ed244671009c11ee0e",
>>        id  => "8b8bce64e69b52ed244671009c4857e7",
>>        id  => "4a3dcd6c2e9e3074d2d52b8e72584b68",
>>        id  => "8b8bce64e69b52ed244671009c730dc9",
>>        id  => "4a3dcd6c2e9e3074d2d52b8e72584d19",
>>        id  => "8b8bce64e69b52ed244671009c7e3974",
>>        id  => "4a3dcd6c2e9e3074d2d52b8e72585475",
>>        id  => "8b8bce64e69b52ed244671009c7e4788",
>>        id  => "4a3dcd6c2e9e3074d2d52b8e72585dc2",
>>        id  => "8b8bce64e69b52ed244671009c7e2fa6",
>>
>> id is some value I store in the document. The result should only contain ids
>> starting with 8.
>>
>> So you see the first two are correct. After deletion of these two (always in a
>> different process), the first one I get the next time is wrong and the second one
>> is correct...
>>
>> If I do not delete anything, I only get the right entries (I just commented out
>> one line; the rest is still the same).
>>
>> Any clue?
> When documents in an old segment are marked as deleted, that information is
> written to a bitmap deletions file which is written to a new segment.  Old
> readers are not supposed to know about new segments.  So for something to go
> wrong, either 1) information in an old segment would have to be corrupted, 2)
> a reader would have to somehow find out about information in a new segment, or
> 3) something else unrelated.
>
> Indexers write index data (including new deletions data referencing documents
> in old segments) to temp files in a new segment, which are then consolidated
> into a single per-segment "compound file" named "cf.dat".  When a reader
> opens, it mmaps cf.dat for each segment in the snapshot.  Once the reader
> successfully opens all the files it needs, it never goes looking for new
> files.
>
> It's hard to imagine a mechanism that would either cause an existing "cf.dat"
> file to be modified, or persuade a reader to go look at a new "cf.dat"
> file.  So unless my reasoning is wrong, the cause is #3 -- something else
> unrelated.  I really have no idea what that could be, though since you've
> previously asked some questions about Coro/AnyEvent and other concurrency
> stuff the most likely prospect would seem to be something unique to your
> setup.
>
> The next step is probably to take the behavior you've been able to reproduce
> and isolate it in a test case that others can run and analyze.
>
> Marvin Humphrey
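
The snapshot behaviour described in the quoted reply can be illustrated with a toy model. This is only a sketch in Python, not Lucy's code or API: the names `Segment`, `Index`, and `Reader` are hypothetical, and the model assumes nothing beyond what the reply states (deletions are recorded in a new segment, and a reader never looks past the segments it opened).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical immutable segment: its own docs plus deletions against older docs."""
    name: str
    docs: tuple
    deletions: frozenset = frozenset()


class Reader:
    """Hypothetical reader bound to the segments that existed when it was opened."""

    def __init__(self, snapshot):
        self.snapshot = snapshot  # fixed tuple; later commits are never seen

    def live_docs(self):
        # Collect deletions recorded in the snapshot's segments only.
        deleted = set()
        for seg in self.snapshot:
            deleted |= seg.deletions
        return [d for seg in self.snapshot for d in seg.docs if d not in deleted]


class Index:
    """Hypothetical append-only index: every commit adds a new segment."""

    def __init__(self):
        self.segments = []

    def commit(self, docs=(), deletions=()):
        name = f"seg_{len(self.segments) + 1}"
        self.segments.append(Segment(name, tuple(docs), frozenset(deletions)))

    def open_reader(self):
        # Copy the current segment list so the reader's view cannot change.
        return Reader(tuple(self.segments))


index = Index()
index.commit(docs=["8a", "8b", "8c", "8d"])

old_reader = index.open_reader()      # opened before the deletion
index.commit(deletions=["8a", "8b"])  # deletions land in a new segment
new_reader = index.open_reader()      # opened after the deletion

print(old_reader.live_docs())
print(new_reader.live_docs())
```

In this model the old reader keeps returning all four documents while the new reader returns only the last two, which is why a deletion made in another process should not change what an already-open reader sees.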

