lucy-dev mailing list archives

From Nathan Kurz <>
Subject Re: [lucy-dev] Some quick benchmarks
Date Thu, 08 Dec 2011 19:04:27 GMT
On Thu, Dec 8, 2011 at 10:02 AM, Nick Wellnhofer <> wrote:
> On 08/12/2011 01:41, Marvin Humphrey wrote:
> Here is more data from a real world indexing run:
> RT+CF: 139 secs
> ST+N:  112 secs

Hi Nick --

I'm mostly listening in on this conversation because I haven't thought
much about indexing, but the magnitude of improvement here surprises
me: I wouldn't have thought that there would be that much time to
shave off! My presumption was that everything would be dominated by
disk I/O, and that the actual tokenizing time would be tiny. Were
both of these runs working within memory with a pre-warmed cache, so
that no disk reads are involved? Also, have you controlled for whether
the data is synced to disk after the indexing?
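
To make the sync question concrete, here is a minimal sketch in Python (not Lucy's own C or Perl). It times the same write twice, once returning as soon as the data is handed to the OS and once after forcing it to disk with fsync. The names `timed_write` and `compare` are hypothetical, and the plain file write is only a stand-in for a real indexing run, so the numbers it prints say nothing about Lucy itself.

```python
import os
import tempfile
import time


def timed_write(path, payload, sync):
    """Write payload to path and return the elapsed wall-clock seconds.

    When sync is True, the measured time includes os.fsync(), i.e. the
    cost of pushing the data out of the OS page cache to the device.
    """
    start = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(payload)
        fh.flush()  # move data from Python's buffer to the OS
        if sync:
            os.fsync(fh.fileno())  # ask the OS to write it to disk
    return time.perf_counter() - start


def compare(payload_size=4 * 1024 * 1024):
    """Time an unsynced and a synced write of the same payload."""
    payload = b"x" * payload_size
    with tempfile.TemporaryDirectory() as tmp:
        unsynced = timed_write(os.path.join(tmp, "a.bin"), payload, sync=False)
        synced = timed_write(os.path.join(tmp, "b.bin"), payload, sync=True)
    return {"unsynced": unsynced, "synced": synced}


if __name__ == "__main__":
    print(compare())
```

If two benchmark runs differ in whether the final flush falls inside the timed region, the gap between them can reflect disk behavior rather than tokenizing cost, which is the confound being asked about.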

I'm not in a position to do it, but it might be insightful to do a
quick profile of where these two are spending their time.  Are we
gaining because the algorithm is faster, or because we have less
function call overhead, or because of something confounding? OProfile
on Linux is very easy to use once you have it set up. In case you
aren't familiar with it, this is a good intro:


