lucene-dev mailing list archives

From robert engels <>
Subject Re: Various Ideas from ApacheCon
Date Mon, 07 May 2007 22:35:47 GMT
I think the 'updating documents' issue is almost always related to  
unique document updates, where there exists some "primary unique key"  
for the document. Is this true?

If so, maybe a de-facto standard like an indexed/stored/non-tokenized  
field named OID should be used.

In that case, it would be easy to add the following to IndexModifier:

removeDocument(String OID)

and that would probably simplify the life of beginning Lucene users,  
and it mimics the CRUD syntax most people are familiar with.
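
To make the idea concrete, here is a minimal in-memory sketch of what keying documents on an OID field could look like. It is a toy model and not Lucene code: only removeDocument(String OID) and the OID field name come from the suggestion above, while the class OidIndex and the methods addDocument, updateDocument, getDocument and size are hypothetical names made up for illustration. It assumes each document carries exactly one OID value and treats an update as a delete followed by an add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index. Hypothetical; this is NOT Lucene's API. */
class OidIndex {
    /** Assumed name of the unique key field, as suggested in the message. */
    static final String OID_FIELD = "OID";

    // Each document is modelled as a map of field name -> stored value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Returns the document's OID, rejecting documents that lack one. */
    private static String requireOid(Map<String, String> doc) {
        String oid = doc.get(OID_FIELD);
        if (oid == null) {
            throw new IllegalArgumentException("document needs an " + OID_FIELD + " field");
        }
        return oid;
    }

    /** Adds a copy of the document; does not check for an existing OID. */
    void addDocument(Map<String, String> doc) {
        requireOid(doc);
        docs.add(new LinkedHashMap<>(doc));
    }

    /** Removes every document whose OID matches; returns how many were removed. */
    int removeDocument(String oid) {
        int before = docs.size();
        docs.removeIf(d -> oid.equals(d.get(OID_FIELD)));
        return before - docs.size();
    }

    /** Update expressed as delete-then-add on the same OID. */
    void updateDocument(Map<String, String> doc) {
        removeDocument(requireOid(doc));
        addDocument(doc);
    }

    /** Returns the first document with the given OID, or null if there is none. */
    Map<String, String> getDocument(String oid) {
        for (Map<String, String> d : docs) {
            if (oid.equals(d.get(OID_FIELD))) {
                return d;
            }
        }
        return null;
    }

    int size() {
        return docs.size();
    }
}
```

With something like this, a caller who re-indexes a changed record only has to call updateDocument with the same OID; the delete/add pair is hidden behind one call, which is the CRUD-style simplification being proposed.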

On May 7, 2007, at 5:25 PM, Grant Ingersoll wrote:

> Hey Gang,
> Back from ApacheCon in Amsterdam, and thought I would give a bit of  
> a report on a few things that were interesting related to Lucene.
> First off, there was a very high level of interest in Lucene and  
> Solr, which was great to see.
> In doing a training and a talk, there were a couple of things that  
> people seemed to ask about a fair amount.
> 1. Updates and how to do them.  The whole delete/add thing just  
> never sits well with newcomers.  I want to throw out the idea of  
> implementing something like the Layers functionality in photo  
> editing tools like Photoshop (whereby the underlying image is not  
> changed, but the layer adds/deletes/masks it).  I wonder how  
> complicated it would be to mark a document as being updated and  
> then know that we have to look in an alternate place for  
> information concerning that Field/Document such as the "updates"  
> file.  I don't know the details of implementing it, but wanted to  
> see if it makes any sense at all.  Gut reaction is it would be  
> slower for searching, but how much slower not sure.  It could  
> potentially be faster for updating and could allow for per field  
> updates.  Just an idea, feel free to shoot it full of holes.  The  
> other option might be to think about whether a flexible indexing  
> implementation could be optimized for updates instead of  
> searching.  Optimization or merges could then bring the updates  
> back into the fold.
> 2. How does Lucene search compare w/ using built in DB search? Has  
> anyone done a study comparing Lucene performance/quality to the  
> likes of MySQL/Postgres/Oracle?  Related question is always on how  
> to integrate the two.
> 3.  Some questions on the use cases of ParallelReader.  So, if  
> anyone cares to contribute in that arena, please do so, since I  
> haven't used it.
> 4.  As much as we like to ignore file format issues (PDF, etc.) it  
> is one of the big questions people have about using Lucene.  Tika  
> should help in this area, but still seems to be a little way off.   
> Our website could help by giving more concrete advice on how to  
> handle different file formats and maybe even some benchmarks on  
> it.  I think we can maintain Lucene's independence from these  
> libraries while still giving advice on how to handle them.  Maybe a  
> best practices section on the wiki?
> 5. Distributed Searching - Code/demonstration to do search across  
> several indexes on several machines would be useful.
> At any rate, just some random thoughts garnered from ApacheCon.   
> All in all, a good conf. w/ lots of Lucene interest.
> -Grant
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> Read the Lucene Java FAQ at 
> LuceneFAQ
