lucy-user mailing list archives

From <rich...@ecos.de>
Subject AW: [lucy-user] Perfomance issue
Date Sat, 08 Aug 2015 14:48:17 GMT
Hi Nick,

Thanks for your feedback.

> > I have two questions:
> >
> > 1.) As far as I see I have to commit and recreate the indexer every time
> > when I have made changes, otherwise the changes will not be seen by the
> > other processes (or even the process itself). On the other side, I have to
> > destroy and recreate the IndexSearcher to see the new documents in the
> > index.
> 
> It's enough to call "commit" on an Indexer to make changes visible. You
> shouldn't recreate an Indexer immediately after committing because every
> new Indexer holds a write lock on the index until it's committed. Just create
> an Indexer before adding documents, then call commit and destroy it.

Yes, I didn't go into much detail in my first mail, but this is exactly how I handle
it.

Since there might be a lot of changes, the question is: does the write lock have any impact
on searching the index?

> 
> Since you mention "destroy" explicitly, are you using the C bindings or the
> Perl bindings?

I use the Perl bindings.

> 
> > While searching itself takes only 10-30 ms. The process of destroy/commit
> > and recreate takes up to 400ms. This makes things slow.
> 
> How many documents do you add in an indexing run? 

About 1-8

> Adding only a few
> small documents should typically be faster than 400ms, but sometimes, it can
> take longer if some larger segments have to be merged. 

At the moment it consistently takes about 800ms...

> See the
> FastUpdates guide in the Lucy cookbook for how to make updates
> consistently fast:
> 
>      https://metacpan.org/pod/Lucy::Docs::Cookbook::FastUpdates

I have tried this, but it makes things behave very badly. Instead of speeding things up,
both indexing and searching get very slow.

> 
> > 2.) From time to time I have to restart the process that heavily uses the
> > IndexSearcher. Searching gets very slow (up to 10-60 seconds, instead of
> > milliseconds). Simply restarting the process fixes this, so it's not an issue of
> > how the index is organized on disk. Any idea how to track this down?
> 
> First, I'd try to find out which call into Lucy takes so long,

As far as I can see, it is the loop that reads the results with $hits->next.

> whether the process
> is consuming CPU the whole time, and what the overall memory behavior of
> the process looks like. If the process is hanging for multiple seconds, you
> could also try to attach a debugger to the running process and see where it
> hangs.
> 

It's a production system (it doesn't happen on the test system) and I do a restart every night
so users don't run into this problem, but from time to time it still happens.

The next time I see it, I will try to investigate more deeply what's going on. Is there some
kind of logging I can turn on to see what Lucy is doing in such a case?

> Does this process only use IndexSearcher or does it also use Indexer? If
> there's an uncommitted Indexer, it might be a locking issue. But you'd
> probably get a lock timeout error in this case.
> 

I have one process that is only using Indexer and multiple other processes that are only using
IndexSearcher.

Just to summarize, there are two (different) issues:

1.) The normal behavior: There are many changes to the index, and after every few changes I
want them to become visible in the other processes. At the moment I commit in the process
that runs the Indexer and need to destroy and recreate the IndexSearcher to see the changes.
This whole process takes 200-400ms even under good conditions, which decreases the performance
of the whole application. So the question is: is there some way to see the changes faster
in other processes?

2.) The second issue is that from time to time the search time increases drastically,
going up to several seconds and more.

Regards

Gerald

P.S. The system is a VM with 16GB RAM and 6 cores, running on an SSD, and is 98% idle most
of the time, so it should not be a performance issue of the host itself.

