lucy-user mailing list archives

From Serkan Mulayim <serkanmula...@gmail.com>
Subject Re: [lucy-user] C library, how to check index is healthy
Date Tue, 28 Feb 2017 19:17:13 GMT
Thanks very much for your comments, guys, and sorry for my late response.

Nick, I have a few follow-up questions regarding your comments.

So, as I understand it:
1- When we index into an existing index, a new segment is created, but it
is not part of the index until it is committed. On commit, the segment is
kept separately and the snapshot.json file is updated to include the new
segment.
2- Lock files are generated and kept separate based on the PID (no shared
FS adjustments).

From the documentation about Indexer: "In general, only one Indexer at a
time may write to an index safely. If a write lock cannot be secured, new()
will throw an exception."

What I would like to do is index thousands of documents in batches with
asynchronous calls to the library. The asynchronous calls would each try to
write to the newly created segment. If the PIDs are the same, it seems the
system will crash because write.lock contains the PID. Do you think there
is a way to make this work with calls from different PIDs, with the
addition of a commit.lock file? I hope this makes sense :( :)

One more question: when I index documents and commit each time (let's say
5000 batches of commits, done synchronously), indexing works fine. How are
the segments handled? I do not see 5000 different segments being created.
Is it because, after a certain number of segments (say 32), the segments
are merged and optimized?

Thanks in advance.
Serkan

On Tue, Feb 14, 2017 at 7:03 AM, Nick Wellnhofer <wellnhofer@aevum.de>
wrote:

> On 13/02/2017 20:44, Serkan Mulayim wrote:
>
>> 1- How do we check that the index is healthy for SEARCHING (e.g. creating
>> a searcher) without a crash? As I see there is no problem in creating a
>> Searcher even if there is a lock (write.lock or merge.lock)
>>
>
> First of all, Lucy should never "crash" in the sense of a segfault. If it
> does, this is a bug that should be reported.
>
> Unless your index is on a shared volume like NFS, it can always be
> searched.
>
>> 2- How do we check that the index is healthy for INDEXING (e.g. creating a
>> new indexer)? I believe if the index is healthy (answer to the first
>> question) and there is no LOCK file (e.g. write.lock or merge.lock), then
>> we can assume that index is healthy and we can create a new indexer, right?
>> (Assuming that there are no write permission issues or disk space issues.)
>>
>
> You can always create a new Indexer. The worst that can happen is that a
> LockErr exception is thrown after the Indexer failed to acquire a lock.
> Note that by default, Indexer retries to get a lock for 1000 ms (one
> second). This can be configured with IndexManager:
>
>     https://lucy.apache.org/docs/c/Lucy/Index/IndexManager.html
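
To illustrate the retry behaviour described above, here is a minimal POSIX
sketch of "keep trying to take the lock until a timeout expires". It is not
Lucy's implementation and does not use the Lucy API: the function name
acquire_lock_with_timeout, the plain O_EXCL lock file, and the retry
interval are all assumptions made for illustration. Only the 1000 ms figure
comes from the message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose nanosleep() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try to create `path` exclusively, retrying every
 * `interval_ms` until `timeout_ms` has been spent waiting.
 * Returns 0 once the lock file was created, -1 on timeout or error. */
int acquire_lock_with_timeout(const char *path, long timeout_ms,
                              long interval_ms) {
    assert(path != NULL && timeout_ms >= 0 && interval_ms > 0);
    long waited_ms = 0;
    for (;;) {
        /* O_EXCL makes creation fail if the file already exists. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            close(fd);
            return 0;
        }
        if (errno != EEXIST) return -1;          /* real error: give up */
        if (waited_ms >= timeout_ms) return -1;  /* still held: time out */

        struct timespec pause = { interval_ms / 1000,
                                  (interval_ms % 1000) * 1000000L };
        nanosleep(&pause, NULL);
        waited_ms += interval_ms;
    }
}
```

A caller would use something like acquire_lock_with_timeout(path, 1000, 100)
and treat -1 the way a failed lock acquisition is treated: report it and try
again later, rather than deleting the lock.
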
>
>> 3- What are the lock types? As far as I see there are only write.lock and
>> merge.lock. Are there any others?
>>
>
> This is explained in the documentation:
>
>     https://lucy.apache.org/docs/c/Lucy/Docs/FileLocking.html
>
>> If we close the application calling Lucy before the indexer is destroyed,
>> is there an index recovery strategy?
>>
>
> Lucy uses an atomic rename operation when committing data so a crashing
> Indexer should never corrupt the index.
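
The general "write to a temporary name, then rename" pattern mentioned here
can be sketched in plain POSIX C. This is only an illustration of the
technique, not Lucy's code: the helper name commit_file_atomically and the
".temp" suffix are assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose fileno()/fsync() under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: publish `contents` at `path` by writing a temporary
 * file first and renaming it over the target. Readers see either the old
 * file or the complete new one, never a half-written file.
 * Returns 0 on success, -1 on failure. */
int commit_file_atomically(const char *path, const char *contents) {
    assert(path != NULL && contents != NULL);

    char tmp_path[4096];
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.temp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path) return -1;

    FILE *fh = fopen(tmp_path, "w");
    if (fh == NULL) return -1;

    size_t len = strlen(contents);
    int ok = fwrite(contents, 1, len, fh) == len;
    /* Flush stdio buffers and ask the OS to persist the data before the
     * new name becomes visible. */
    ok = ok && fflush(fh) == 0 && fsync(fileno(fh)) == 0;
    if (fclose(fh) != 0) ok = 0;

    /* rename() replaces the target in one step on POSIX file systems. */
    if (!ok || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

If the process dies before rename() runs, the target file is untouched and
only a leftover temporary file remains to be cleaned up.
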
>
>> What would the implications of simply deleting write.lock and merge.lock
>> be?
>>
>
> In most cases, this shouldn't be necessary. Lucy stores the PID of the
> process that created a lock and tries to clear stale lock files from
> crashed processes. But this won't work if another process reuses the PID.
> If you're absolutely sure that a lock doesn't belong to an active Indexer,
> you can delete the lock directories manually.
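
A rough sketch of PID-based stale lock detection on POSIX follows. It
assumes a made-up lock format (a file holding only the owner's PID as text)
and hypothetical helper names; Lucy's real lock files and logic may differ.
It also shows why PID reuse defeats the check: kill(pid, 0) only says that
some process with that PID exists, not that it is the one that took the
lock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose kill() under strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock format: the file contains only the owner's PID.
 * Returns 0 if the lock was created, -1 if it exists or on error. */
int write_pid_lock(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;

    char buf[32];
    int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    int ok = n > 0 && write(fd, buf, (size_t)n) == (ssize_t)n;
    close(fd);
    if (!ok) {
        unlink(path);
        return -1;
    }
    return 0;
}

/* Returns 1 if the recorded owner no longer exists (stale lock), 0 if a
 * process with that PID exists, -1 if the lock cannot be read. */
int lock_is_stale(const char *path) {
    assert(path != NULL);
    FILE *fh = fopen(path, "r");
    if (fh == NULL) return -1;

    long pid = 0;
    int got = fscanf(fh, "%ld", &pid);
    fclose(fh);
    if (got != 1 || pid <= 0) return -1;

    /* Signal 0 performs the existence and permission checks only. */
    if (kill((pid_t)pid, 0) == 0) return 0;
    /* EPERM means the process exists but belongs to another user. */
    return errno == ESRCH ? 1 : 0;
}
```
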
>
> Side note: This could be improved by supporting locking mechanisms that
> release locks automatically if a process crashes. But these are
> OS-dependent and aren't guaranteed to work reliably over NFS:
>
> - `fcntl(F_SETLK)` or `lockf` on POSIX (unsuitable for multi-threaded
>   operation).
> - `flock` on BSD, Linux.
> - `CreateFile` with a 0 sharing mode on Windows.
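
As an illustration of the `flock` option in the list above, here is a small
sketch for BSD and Linux. The helper names are made up, and this is not
something Lucy provides. The kernel drops the lock when the descriptor is
closed, which also happens when the process exits or crashes, so no stale
lock is left behind.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* expose flock() under strict -std modes on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive, non-blocking flock on `path`.
 * Returns the open descriptor holding the lock, or -1 if the lock is
 * held elsewhere or the file cannot be opened. */
int try_acquire_flock(const char *path) {
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;

    /* LOCK_NB: fail immediately instead of waiting for the holder. */
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release explicitly; closing the descriptor would also release it. */
void release_flock(int fd) {
    flock(fd, LOCK_UN);
    close(fd);
}
```

The lock file itself can stay on disk between runs: what matters is whether
some open descriptor currently holds the lock, not whether the file exists.
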
>
> Nick
>
>
>
