lucene-dev mailing list archives

From "Ard Schrijvers" <>
Subject RE: Ocean Documentation
Date Fri, 11 Jul 2008 13:09:09 GMT
Hello Jason et al,

Indeed, there are plenty of use cases that need searches to reflect
updates instantly, for example the JSR-170 (JCR) compliant Jackrabbit
implementation: it relies heavily on Lucene for searching and hierarchy
resolution, and according to the JSR-170 spec, changes need to be
visible immediately after a save().

Also, I think a very similar solution to yours is implemented there: See
[1] if you like 

Regards Ard


> I started a wiki name at 
> linked 
> from  
> Perhaps I should add some background on the wiki.  I can add 
> a little bit here.  I was an early Solr developer/user at a 
> social networking company when Google's GData came out.  It 
> looked similar to Solr so I took a look at it.  The one thing 
> it had over Solr was realtime updates or the ability to add, 
> delete, or update a document and be able to see the update in 
> search results immediately.  With Solr the company had 
> decided on a 10 minute interval of updating the index with 
> delta updates from an Oracle database.  I wanted to see if it 
> was possible with Lucene to create an approximation of what 
> GData does.  The result is Ocean.
> The use case it was designed for is websites with dynamic 
> data, some of which are social networking, photo sites, 
> discussion boards, blogs, wikis, and such.  More broadly it 
> is possible to use Ocean with any application that requires 
> the database-like feature of immediate updates.  Probably the 
> best example of this is Google's web applications, which, 
> outside of web search, all use a GData interface.  This means the 
> primary datastore is not MySQL or some equivalent; it is a 
> proprietary search-based database.  The best example of this 
> is Gmail.  If I receive an email through Gmail I can also 
> search on it immediately, there is no 10 minute delay.  Also 
> in Gmail I can change labels, a common example being changing 
> unread emails to read in bulk.  Presumably Gmail is not 
> reindexing the entire email for each label change.  
> Most highly trafficked web applications do not use the 
> relational facilities like joins because they are too 
> expensive.  Lucene does not offer joins so this is fine.  The 
> only area Lucene is currently weak in is range queries.  
> MySQL uses a B-tree index whereas Lucene uses the time-
> consuming TermEnum and TermDocs combination.  This is an area 
> Tag Index addresses.  
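
The range-query cost mentioned above can be illustrated with a toy model. The sketch below is not Lucene's API: it assumes a hypothetical sorted term dictionary (a `NavigableMap` from term to document ids) and shows the general pattern of walking every term inside the range and OR-ing its postings into a bit set, so the work grows with the number of distinct terms in the range. The class name `RangeScanSketch`, its methods, and the sample data are made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeScanSketch {

    // Hypothetical stand-in for a term dictionary: sorted term -> ids of documents containing it.
    static NavigableMap<String, int[]> sampleIndex() {
        NavigableMap<String, int[]> index = new TreeMap<>();
        index.put("2008-07-09", new int[] {0});
        index.put("2008-07-10", new int[] {1, 4});
        index.put("2008-07-11", new int[] {2});
        index.put("2008-07-12", new int[] {3});
        return index;
    }

    // Inclusive range scan: visit each term in [lower, upper] and OR its postings into a bit set.
    static BitSet rangeScan(NavigableMap<String, int[]> index, String lower, String upper) {
        BitSet hits = new BitSet();
        for (Map.Entry<String, int[]> entry : index.subMap(lower, true, upper, true).entrySet()) {
            for (int docId : entry.getValue()) {
                hits.set(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet hits = rangeScan(sampleIndex(), "2008-07-10", "2008-07-11");
        System.out.println(hits); // prints {1, 2, 4}
    }
}
```

With a fine-grained field such as a timestamp, nearly every document contributes its own term, so a wide range means one postings lookup per term in it.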
> The way Ocean is designed there should be no limitations to 
> using it compared to using Lucene IndexWriter.  It offers the 
> same functionality.  If one does not want to use the 
> transaction log Ocean offers because one simply wants to 
> index 1 million documents at once, Ocean offers what is 
> called a LargeBatch.  It is a way to perform a large number 
> of updates taking advantage of the new IndexWriter speedup, 
> combined with transactional semantics.  
> Karl, does this answer your question or are there areas that 
> could use more explanation?
> On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin 
> <> wrote:
> 	10 jul 2008 kl. 22.08 skrev Jason Rutherglen:
> 		Is there a good place to put Ocean 
> documentation?  Is there a place on the wiki that is good?
> 	Hi Janson,
> 	the wiki is just fine.
> 	I've been reading the docs and looked at your patch. 
> There is a lot of text about how it does what it does, but it 
> says nothing about the intended use. I honestly 
> don't even know what you mean by "real time search". You will 
> probably get more attention if the documentation starts out 
> with some use cases or thoughts on when and why it might make 
> sense to use your code.
> 	      karl
> ---------------------------------------------------------------------
> 	To unsubscribe, e-mail:
> 	For additional commands, e-mail:
