lucene-dev mailing list archives

From "Ard Schrijvers" <a.schrijv...@onehippo.com>
Subject RE: Ocean Documentation
Date Tue, 15 Jul 2008 10:38:39 GMT

> Jason Rutherglen wrote:
> I took a look at Jackrabbit, which is a very cool animal,

:-)

> and there are similar ideas in the Lucene portion.  I will 
> try to take a look at the source to get a better understanding.  

The Jackrabbit indexing code is pretty much tied to Jackrabbit and
jsr-170, though, so a large portion of it deals with resolving
hierarchies and xpath/sql queries. I suppose Ocean is much more generic
and reusable, but the underlying requirement is the same: changes need
to be reflected in search results instantly.
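
To make "instant reflection" concrete, here is a minimal sketch. It is
not Jackrabbit or Ocean code and it does not use Lucene at all: the
class InstantIndex and its save/delete/search methods are hypothetical
names, and a plain in-memory map stands in for the index. The only
point it shows is that a search issued right after save() already sees
the change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index (hypothetical, for illustration only). */
final class InstantIndex {
    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // id -> stored text, kept so that old terms can be removed on update
    private final Map<Integer, String> docs = new HashMap<>();

    /** Adds or replaces a document; the next search() call sees the change. */
    public synchronized void save(int id, String text) {
        delete(id); // an update is a delete followed by an add
        docs.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    /** Removes a document and all of its postings, if it exists. */
    public synchronized void delete(int id) {
        String old = docs.remove(id);
        if (old == null) {
            return;
        }
        for (String term : tokenize(old)) {
            Set<Integer> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    /** Returns the ids of all documents containing the term, in ascending order. */
    public synchronized List<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    // Lowercases the text and splits it on non-word characters.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        InstantIndex index = new InstantIndex();
        index.save(1, "Ocean realtime search");
        System.out.println(index.search("ocean"));   // searched right after save
        index.save(1, "Jackrabbit index readers");   // update the same document
        System.out.println(index.search("ocean"));
        System.out.println(index.search("jackrabbit"));
    }
}
```

A real implementation has to get the same visibility guarantee while
also persisting the index, which is where the actual work in both
projects lies; this sketch sidesteps that entirely.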

-Ard

> 
> 
> On Fri, Jul 11, 2008 at 9:09 AM, Ard Schrijvers 
> <a.schrijvers@onehippo.com> wrote:
> 
> 
> 	Hello Jason et al,
> 	
> 	Indeed, there are plenty of use cases that need instantly updated
> 	searches, for example the jsr-170 (jcr) compliant Jackrabbit
> 	implementation: it relies heavily on lucene for searching and
> 	hierarchy resolving, and according to the jsr-170 spec, changes
> 	need to be visible instantly after a save().
> 	
> 	Also, I think a solution very similar to yours is implemented
> 	there: see [1] if you like.
> 	
> 	Regards Ard
> 	
> 	[1] http://jackrabbit.apache.org/index-readers.html
> 	
> 
> 
> 
> 	> I started a wiki page at
> 	> http://wiki.apache.org/lucene-java/OceanRealtimeSearch linked
> 	> from http://wiki.apache.org/lucene-java/LuceneResources.
> 	>
> 	> Perhaps I should add some background on the wiki.  I can add
> 	> a little bit here.  I was an early Solr developer/user at a
> 	> social networking company when Google's GData came out.  It
> 	> looked similar to Solr so I took a look at it.  The one thing
> 	> it had over Solr was realtime updates: the ability to add,
> 	> delete, or update a document and see the update in search
> 	> results immediately.  With Solr the company had
> 	> decided on a 10 minute interval of updating the index with
> 	> delta updates from an Oracle database.  I wanted to see if it
> 	> was possible with Lucene to create an approximation of what
> 	> GData does.  The result is Ocean.
> 	>
> 	> The use case it was designed for is websites with dynamic
> 	> data, such as social networking sites, photo sites,
> 	> discussion boards, blogs, and wikis.  More broadly it
> 	> is possible to use Ocean with any application that requires
> 	> the database-like feature of immediate updates.  Probably the
> 	> best example of this is that all of Google's web applications,
> 	> outside of web search, use a GData interface, meaning the
> 	> primary datastore is not mysql or some equivalent but a
> 	> proprietary search-based database.  A concrete case
> 	> is Gmail.  If I receive an email through Gmail I can
> 	> search on it immediately; there is no 10 minute delay.  Also
> 	> in Gmail I can change labels, a common example being changing
> 	> unread emails to read in bulk.  Presumably Gmail is not
> 	> reindexing the entire email for each label change.
> 	>
> 	> Most highly trafficked web applications do not use the
> 	> relational facilities like joins because they are too
> 	> expensive.  Lucene does not offer joins so this is fine.  The
> 	> only area Lucene is currently weak in is range queries.
> 	> Mysql uses a btree index whereas Lucene uses the
> 	> time-consuming TermEnum and TermDocs combination.  This is an area
> 	> Tag Index addresses.
> 	>
> 	> The way Ocean is designed, there should be no limitations to
> 	> using it compared to using Lucene's IndexWriter.  It offers the
> 	> same functionality.  If one does not want to use the
> 	> transaction log Ocean offers because one simply wants to
> 	> index 1 million documents at once, Ocean offers what is
> 	> called a LargeBatch.  It is a way to perform a large number
> 	> of updates taking advantage of the new IndexWriter speedup,
> 	> combined with transactional semantics.
> 	>
> 	> Karl, does this answer your question or are there areas that
> 	> could use more explanation?
> 	>
> 	>
> 	> On Fri, Jul 11, 2008 at 6:20 AM, Karl Wettin
> 	> <karl.wettin@gmail.com> wrote:
> 	>
> 	>
> 	>
> 	>       10 jul 2008 kl. 22.08 skrev Jason Rutherglen:
> 	>
> 	>
> 	>
> 	>               Is there a good place to put Ocean
> 	> (https://issues.apache.org/jira/browse/LUCENE-1313)
> 	> documentation?  Is there a place on the wiki that is good?
> 	>
> 	>
> 	>
> 	>       Hi Jason,
> 	>
> 	>       the wiki is just fine.
> 	>
> 	>       I've been reading the docs and looked at your patch.
> 	> There is a lot of text about how it does what it does, but it
> 	> says nothing about the intended use. I honestly
> 	> don't even know what you mean by "real time search". You will
> 	> probably get more attention if the documentation starts out
> 	> with some use cases or thoughts on when and why it might make
> 	> sense to use your code.
> 	>
> 	>
> 	>             karl
> 	>
> 	>
> 	> ---------------------------------------------------------------------
> 	>       To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> 	>       For additional commands, e-mail: java-dev-help@lucene.apache.org
> 	>
> 	>
> 	>
> 	>
> 	>
> 	
> 	
> 	
> 	
> 
> 
> 
