lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Burton <bur...@newsmonster.org>
Subject Internal full content store within Lucene
Date Tue, 18 May 2004 18:43:39 GMT
Per the discussion the other day about storing content external to 
Lucene I think we have an opportunity to improve the lucene core and 
bring a lot of functionality to future developers.

Right now Lucene allows you to have a 'stored' field which keeps the 
content with a segment along with your inverted index.

While this is flexible for small indexes in production environments it 
falls down because index merges take FOREVER.

A thread the other day opened up and suggesting storing just a pointer 
to a file on the filesystem.  This got me thinking about a long term 
mechanism I wanted for our cluster where we store content outside of the 
index in a high performance flat-file database.

The Lucene index would only maintain FILENO-:OFFSET:LENGTH info within 
the index and this would allow us to point to our flat file database. 

This would allow Lucene index merges to be FAST, support native field 
storage, and allow the filesystem optimize contiguous blocks for the 
flat content store.  Everyone wins.

This is what the Internet archive uses:

http://www.archive.org/web/researcher/ArcFileFormat.php

I propose that Lucene support a new form of stored field that allows 
external storage engine to keep the content in a flat text store.

How much interest is there for this?  I have to do this for work and 
will certainly take the extra effort into making this a standard Lucene 
feature. 

I can come up with a requirements doc and a more formal proposal in 
another email if I get enough +1s...

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message