lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Real time indexing and distribution to lucene on separate boxes (long)
Date Fri, 12 Mar 2004 12:26:08 GMT

--- "Kevin A. Burton" <> wrote:
> Otis Gospodnetic wrote:
> >I like option 3.  I've done it before, and it worked well.  I dealt
> >with very small indices, though, and if your indices are several tens
> >or hundred gigs, this may be hard for you.
> >
> >Option 4: search can be performed on an index that is being modified
> >(update, delete, insert, optimize).  You'd just have to make sure not
> >to recreate new IndexSearcher too frequently, if your index is being
> >modified often.  Just change it every X index modification or every X
> >minutes, and you'll be fine.
> >  
> >
> Right now I'm thinking about #4... Disk may be cheap but a fast RAID 10
> array with 100G twice isn't THAT cheap... That's the worst case
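The throttling described in option 4 (keep handing out the same IndexSearcher, and swap in a new one only after X modifications or X minutes) can be sketched roughly as follows. This is a minimal sketch, not Lucene API: `ThrottledRefresher`, `recordModification`, and the injected `factory` and `clock` are hypothetical names, and the factory stands in for whatever code opens a new searcher on the index. It also assumes that a time-based reopen is only worth doing when at least one modification is pending.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hands out a shared resource (for example a searcher) and rebuilds it only
 * after enough modifications have accumulated or enough time has passed.
 * All names here are hypothetical; nothing in this class is Lucene API.
 */
final class ThrottledRefresher<T> {
    private final Supplier<T> factory;   // assumption: opens a fresh searcher
    private final int maxModifications;  // the "every X index modifications" limit
    private final long maxAgeMillis;     // the "every X minutes" limit
    private final LongSupplier clock;    // injectable time source, in milliseconds

    private T current;
    private int pendingModifications;
    private long openedAtMillis;

    ThrottledRefresher(Supplier<T> factory, int maxModifications,
                       long maxAgeMillis, LongSupplier clock) {
        this.factory = factory;
        this.maxModifications = maxModifications;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.current = factory.get();
        this.openedAtMillis = clock.getAsLong();
    }

    /** Called by the indexing side after each insert, delete, or optimize. */
    synchronized void recordModification() {
        pendingModifications++;
    }

    /** Called by the search side; reopens only when a threshold is crossed. */
    synchronized T get() {
        long age = clock.getAsLong() - openedAtMillis;
        boolean tooManyChanges = pendingModifications >= maxModifications;
        // Assumption: an old but unchanged index does not need a new searcher.
        boolean tooOld = pendingModifications > 0 && age >= maxAgeMillis;
        if (tooManyChanges || tooOld) {
            current = factory.get();
            pendingModifications = 0;
            openedAtMillis = clock.getAsLong();
        }
        return current;
    }
}
```

The sketch does not close the searcher it replaces, since searches already in flight may still be using it; how and when to release the old one is left open here.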

Yes, but not everything needs to be on a fast RAID (you probably are
using SCSI disks in RAID, which is what makes it expensive.  RAID
requires only a RAID controller).
You could have a Searcher machine with a set of cheap EIDE disks, and
use those as copy target disks, which are not searched.
Once you transfer your indices there, you copy them on fast SCSI RAID

> Also... since the new indexes are SO small (~100M) the merges would
> probably be easier on the machine than just doing a whole new write.
> Of course it's hard to make that argument with a 100G RAID array but
> we're using rsync to avoid distribution of network IO so the CPU
> computation and network read would slow things down.
> The only way around this is to re-upload the whole 100G index but
> even over gigabit ethernet this will take 15 minutes.  This doesn't
> scale as we add more searchers.
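The 15-minute figure is close to what the raw arithmetic gives: 100 gigabytes is 800 gigabits, which takes about 800 seconds (a little over 13 minutes) on a fully used gigabit link, before any protocol or disk overhead. A rough sketch of that estimate follows. The class and method names are hypothetical, and it assumes "100G" means 100 * 10^9 bytes, that the link sustains its full 1 Gbit/s, and that searchers are fed one after another from a single source.

```java
/** Back-of-the-envelope transfer times; all names here are hypothetical. */
final class TransferEstimate {
    /** Seconds to push sizeBytes over a link, ignoring all overhead. */
    static double seconds(double sizeBytes, double linkBitsPerSecond) {
        return sizeBytes * 8.0 / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        double indexBytes = 100e9; // assumption: "100G" is 100 * 10^9 bytes
        double gigabit = 1e9;      // assumption: the link sustains 1 Gbit/s
        for (int searchers = 1; searchers <= 4; searchers++) {
            // Assumption: copies run sequentially over the same link,
            // so total time grows linearly with the number of searchers.
            double minutes = searchers * seconds(indexBytes, gigabit) / 60.0;
            System.out.printf("%d searcher(s): %.1f minutes%n", searchers, minutes);
        }
    }
}
```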

I wonder what happens if you try compressing the indices before copying
them over the network.
I wonder if it makes a difference whether you use compound vs.
traditional directories.
I wonder what the index size is if you use DbDirectory instead of
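One way to try the compression question on a sample of index files is to measure the gzip ratio directly. The sketch below uses only java.util.zip from the JDK; the `GzipProbe` class and its method names are hypothetical. It works on in-memory byte arrays for brevity, so it suits small samples rather than a whole 100G index, and it says nothing about how well Lucene index files actually compress.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Measures how much a block of bytes shrinks under gzip (hypothetical helper). */
final class GzipProbe {
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return out.toByteArray();
    }

    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Compressed size divided by original size; lower means better compression. */
    static double ratio(byte[] raw) {
        return (double) compress(raw).length / raw.length;
    }
}
```

Whether compressing pays off also depends on the CPU time spent on each side, which this sketch does not measure.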

