lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Haxby <>
Subject Re: What is the best file system for Lucene?
Date Tue, 30 Nov 2004 12:57:08 GMT

Sanyi wrote:

>I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
>I'm very surprised by the results: Both systems are searching at about the same speed,
but I'd
>expect (and I really need) to run Lucene a lot faster on my stronger config.
>Config #1 (a notebook):
>WinXP Pro, NTFS, 1.8GHz Pentium-M, 768Megs memory, 7200RPM winchester
>Config #2 (a desktop PC):
>SuSE 9.1 Pro, resiefs, 3.0GHZ P4 HT (virtually two 3.0GHz P4s), 3GByte RAM, 15000RPM U320
>You can see that the hardware of #2 is at least twice better/faster than #1.
>I'm searching the reason and the solution to take advantage of the better hardware compared
to the
>poor notebook.
>Currently #2 can't amazingly outperform the notebook (#1).
How large is the index?   If it's less than a couple of GByte then it 
will be entirely in memory after you've done a few searches on the Linux 
box.  You can force it into memory by cat'ing all the index files on to 
/dev/null a couple of times (cat * > /dev/null).   A 3GHz system should 
now perform dramatically faster than a 1.5GHz system no matter what the 
file system. (And it's still 3GHz whether or not hyperthreading is 
turned on -- hyperthreading simply makes use of some under-used silicon 
to give you somewhere between 1 and 2 CPUs.  In some pathlogical cases 
it can give you less than one CPU, but I don't think lucene falls into 
the category.  And it's going to be a helluva lot faster than any 
Pentium M because it has a nice healthy cache.)

However, I don't believe that the hardware, OS or file system have 
anything to do with it.   Normally if you're seeing similar performance 
on widely differing platforms you're seeing latency somewhere else.   
For example (and this is only an example) looking up a hostname in the 
DNS will take about the same time on almost any machine you can get hold of.

You don't say how you're measuring search performance and you don't say 
what you're seeing.   Also, what's the load on the system while you're 
running the tests?   gkrellm on Linux is very useful as an overall view 
-- are you CPU bound, are you seeing lots of disk traffic?   Is the 
system actually more-or-less idle?


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message