lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kevin A. Burton" <>
Subject Re: Real time indexing and distribution to lucene on separate boxes (long)
Date Thu, 11 Mar 2004 19:24:16 GMT
Otis Gospodnetic wrote:

>I like option 3.  I've done it before, and it worked well.  I dealt
>with very small indices, though, and if your indices are several tens
>or hundred gigs, this may be hard for you.
>Option 4: search can be performed on an index that is being modified
>(update, delete, insert, optimize).  You'd just have to make sure not
>to recreate new IndexSearcher too frequently, if your index is being
>modified often.  Just change it every X index modification or every X
>minutes, and you'll be fine.
Right now I'm thinking about #4... Disk may be cheap but a fast RAID 10 
array with 100G twice isn't THAT cheap... That's the worse case scenario 
of course and most modern search clusters use cheap hardware though...

Also... since the new indexes are SO small (~100M) the merges would 
probably be easier on the machine than just doing a whole new write.  Of 
course it's hard to make that argument with a 100G RAID array but we're 
using rysnc to avoid distribution of network IO so the CPU computation 
and network read would slow things down.

The only way around this is the re-upload the whole 100G index but even 
over gigabit ethernet this will take 15 minutes.  This doesn't scale as 
we add more searchers.

Thanks for the feedback!  I think now tha tI know that optmize is safe 
as long as I don't create a new reader... I'll be fine.  I do have think 
about how I'm going to handle search result nav.



Please reply using PGP.    
    NewsMonster -
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web -
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - #infoanarchy | #p2p-hackers | #newsmonster

View raw message