lucy-user mailing list archives

From "Gupta, Rajiv" <Rajiv.Gu...@netapp.com>
Subject RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119
Date Wed, 04 Jan 2017 14:22:37 GMT
I think you may not have liked the approach :(

However, I tried it and it seems to be working fine. I did 20+ big runs and they all seem
to have gone through.

Just checking: should I use a raw copy, or is there a better way to copy indexes without
losing any in-transit data, such as $indexer->add_index($index)?
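
For reference, the add_index route mentioned above could look roughly like the sketch below. It is not a recommendation from this thread. The helper names (indexer_args, merge_into_shared) and both paths are hypothetical. It assumes the shared index already exists and that both indexes use the same schema, which add_index requires.

```perl
use strict;
use warnings;

# Build the constructor arguments; the manager is passed only when given.
sub indexer_args {
    my ($index, $manager) = @_;
    my %args = ( index => $index );
    $args{manager} = $manager if defined $manager;
    return %args;
}

# Absorb a finished local index (e.g. one under /tmp) into the shared index.
# Lucy is loaded only when this is called.
sub merge_into_shared {
    my ($local_index, $shared_index, $manager) = @_;
    require Lucy::Index::Indexer;
    my $indexer = Lucy::Index::Indexer->new(
        indexer_args($shared_index, $manager) );
    $indexer->add_index($local_index);   # path to the local index
    $indexer->commit;                    # nothing is visible until commit
    return;
}
```

Unlike a raw file copy, this goes through the Indexer, so it takes the same write lock as any other indexing session on the shared index.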

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Monday, January 02, 2017 7:47 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Until now we have been going by this thread:
http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html
so we have been avoiding any kind of parallel indexing.

Let me know your thoughts on this approach: run all indexing in parallel, save the indexes
under /tmp (a local file system location), and periodically copy them to the shared location.
The copy is needed because the servers where I perform searches need access to the indexes.
Insertion will happen from only one server, but searches can be performed from different
servers using the indexed data.

-Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Monday, December 19, 2016 7:09 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are child processes of one process and run
> from the same host. Do you think making the host name unique with some random number
> would help for multiple processes?

If you access an index on a shared volume only from a single host, there's actually no need
to set a hostname at all, although it's good practice. It's all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same machine. This
can lead to stale lock files not being deleted after a crash.
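
A minimal sketch of that setup, following the pattern in Lucy::Docs::FileLocking: every process on a machine passes the same `host` value, with no PID or random suffix. The helper names (lock_host, open_indexer) are hypothetical, the index path is a placeholder, and the index is assumed to exist already.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# One stable identifier per machine. All processes on the same host must
# get the same value, so nothing random is appended.
sub lock_host {
    my $host = hostname() or die "Can't determine hostname";
    return $host;
}

# Open an Indexer whose lock files carry the stable host value.
# Lucy is loaded only when this is called.
sub open_indexer {
    my ($index_path) = @_;
    require Lucy::Index::IndexManager;
    require Lucy::Index::Indexer;
    my $manager = Lucy::Index::IndexManager->new( host => lock_host() );
    return Lucy::Index::Indexer->new(
        index   => $index_path,
        manager => $manager,
    );
}
```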

> Rajiv>>> Going to a local file system is not possible in my case. This is a test
> framework that generates a lot of logs. I'm indexing per test run, and all these logs
> need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of your Lucy index
directory.

> The next thing I'm going to try is to create a watcher per directory and index all files
> under that directory serially. Currently I'm creating watchers on all the files, and sometimes
> multiple files in the same directory may get indexed at the same time. As you
> stated, this might be the issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems, especially if it
all happens on a single machine. The worst that could happen is lock errors, which can
be addressed by changing timeouts or retrying. But without code to reproduce the problem,
I can't tell whether it's a Lucy bug.
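
The retry option could be sketched as below. This is a generic helper, not part of Lucy: with_retries is a hypothetical name, and it retries on any exception rather than only on lock errors. To narrow it, the caught error could be checked against Lucy's lock error class before retrying.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Call $code up to $attempts times, sleeping $delay seconds between tries.
# Returns the scalar result of $code; rethrows the last error if every
# attempt fails.
sub with_retries {
    my ($code, $attempts, $delay) = @_;
    my $error;
    for my $try (1 .. $attempts) {
        my $result;
        return $result if eval { $result = $code->(); 1 };
        $error = $@;
        sleep($delay) if $try < $attempts;
    }
    die $error;
}

# Hypothetical usage, with Lucy installed and $path/$manager defined:
#   my $indexer = with_retries(
#       sub { Lucy::Index::Indexer->new( index => $path, manager => $manager ) },
#       5, 0.5,
#   );
```

The other option, longer timeouts, is configured on the IndexManager rather than in calling code.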

If you can't provide a test case, it's a good idea to test whether the problems are caused
by parallel indexing at all. I'd also try to move your indices to a local file system to see
whether it makes a difference.

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale NFS filehandle".
If you have another way to handle such errors, there might be no need for IndexManagers at
all. Again, see Lucy::Docs::FileLocking.

Nick

