lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Various Ideas from ApacheCon
Date Mon, 07 May 2007 22:25:58 GMT
Hey Gang,

Back from ApacheCon in Amsterdam, and thought I would give a bit of a  
report on a few things that were interesting related to Lucene.

First off, there was a very high level of interest in Lucene and  
Solr, which was great to see.

While giving a training session and a talk, I noticed a couple of
things that people seemed to ask about a fair amount.

1. Updates and how to do them.  The whole delete/add thing just never  
sits well with newcomers.  I want to throw out the idea of  
implementing something like the Layers functionality in photo editing  
tools like Photoshop (whereby the underlying image is not changed,  
but the layer adds/deletes/masks it).  I wonder how complicated it  
would be to mark a document as being updated and then know that we  
have to look in an alternate place for information concerning that  
Field/Document such as the "updates" file.  I don't know the details  
of implementing it, but I wanted to see if it makes any sense at all.
My gut reaction is that it would be slower for searching, though I'm
not sure how much slower.  It could potentially be faster for updating
and could allow for per-field updates.  Just an idea; feel free to
shoot it full of holes.  The other option might be to think about whether a
flexible indexing implementation could be optimized for updates  
instead of searching.  Optimization or merges could then bring the  
updates back into the fold.
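
To make the layering idea a little more concrete, here is a minimal in-memory sketch in Java. It assumes documents are plain field-to-value maps rather than a real Lucene index, and LayeredStore and its methods are hypothetical names, not anything in Lucene. The base layer is never modified; per-field updates and deletes live in an overlay that every read checks first (that extra lookup is where the search-time cost would come from), and fold() plays the role that optimization or merging would play in bringing the updates back into the base.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "layered" updates; not a Lucene API.
// Documents are modeled as docId -> (field -> value) maps.
class LayeredStore {
    // Base layer: never modified in place, only replaced by fold().
    private Map<Integer, Map<String, String>> base;
    // Overlay of per-field updates: docId -> (field -> new value).
    private final Map<Integer, Map<String, String>> updates = new HashMap<>();
    // Overlay of deletions that mask documents in the base layer.
    private final Set<Integer> deleted = new HashSet<>();

    LayeredStore(Map<Integer, Map<String, String>> base) {
        this.base = base;
    }

    // Record a single-field update without touching the base layer.
    void updateField(int docId, String field, String value) {
        if (!base.containsKey(docId) || deleted.contains(docId)) {
            throw new IllegalArgumentException("no live document " + docId);
        }
        updates.computeIfAbsent(docId, id -> new HashMap<>()).put(field, value);
    }

    // Mask a document; any pending field updates for it are discarded.
    void delete(int docId) {
        deleted.add(docId);
        updates.remove(docId);
    }

    // Read path: consult the overlays first, then fall back to the base layer.
    String get(int docId, String field) {
        if (deleted.contains(docId)) {
            return null;
        }
        Map<String, String> layer = updates.get(docId);
        if (layer != null && layer.containsKey(field)) {
            return layer.get(field);
        }
        Map<String, String> doc = base.get(docId);
        return doc == null ? null : doc.get(field);
    }

    // Fold the overlays into a new base layer, as a merge or optimize might.
    void fold() {
        Map<Integer, Map<String, String>> merged = new HashMap<>();
        for (Map.Entry<Integer, Map<String, String>> e : base.entrySet()) {
            if (deleted.contains(e.getKey())) {
                continue;
            }
            Map<String, String> doc = new HashMap<>(e.getValue());
            Map<String, String> layer = updates.get(e.getKey());
            if (layer != null) {
                doc.putAll(layer);
            }
            merged.put(e.getKey(), Collections.unmodifiableMap(doc));
        }
        base = merged;
        updates.clear();
        deleted.clear();
    }

    // Number of documents with pending updates or deletions.
    int pendingChanges() {
        return updates.size() + deleted.size();
    }
}
```

A real implementation would have to apply the overlay to postings and term statistics, not just to stored values, and that is probably where most of the difficulty would lie.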

2. How does Lucene search compare w/ using built-in DB search?  Has
anyone done a study comparing Lucene performance/quality to the likes
of MySQL/Postgres/Oracle?  A related question that always comes up is
how to integrate the two.

3.  There were some questions about the use cases for ParallelReader.
If anyone cares to contribute in that arena, please do so, since I
haven't used it.
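
For anyone less familiar with it: ParallelReader reads several indexes that each contain the same number of documents, usually with different fields, and presents document n as the union of the fields of document n in every index. The plain-Java model below (ParallelView is a hypothetical name, not Lucene's API) shows that contract with one use case in mind: keeping small, frequently changing fields in a separate index that can be rebuilt without reindexing the large, stable one, as long as document numbers stay aligned. In this sketch, when two indexes supply the same field, the first index added wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of parallel indexes; not Lucene's ParallelReader API.
// Each "index" is a list of documents (field -> value) addressed by document number.
class ParallelView {
    private final List<List<Map<String, String>>> indexes = new ArrayList<>();

    // Every index must hold the same number of documents so that numbers line up.
    void add(List<Map<String, String>> index) {
        if (!indexes.isEmpty() && indexes.get(0).size() != index.size()) {
            throw new IllegalArgumentException(
                "parallel indexes must have the same number of documents");
        }
        indexes.add(index);
    }

    // Document n is the union of the fields of document n in each index;
    // if two indexes supply the same field, the first index added wins.
    Map<String, String> document(int n) {
        Map<String, String> doc = new HashMap<>();
        for (List<Map<String, String>> index : indexes) {
            for (Map.Entry<String, String> field : index.get(n).entrySet()) {
                doc.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return doc;
    }
}
```

The whole approach depends on the rebuilt index assigning exactly the same document numbers as the stable one, which is the main thing to get right when using it.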

4.  As much as we like to ignore file format issues (PDF, etc.) it is  
one of the big questions people have about using Lucene.  Tika should  
help in this area, but it still seems to be a little way off.  Our
website could help by giving more concrete advice on how to handle  
different file formats and maybe even some benchmarks on it.  I think  
we can maintain Lucene's independence from these libraries while  
still giving advice on how to handle them.  Maybe a best practices
section on the wiki?
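
One way to keep Lucene independent of the parsing libraries while still giving concrete advice is to document a small extension point along these lines. TextExtractor and ExtractorRegistry are hypothetical names used only for illustration: the indexing code sees plain strings, and each format-specific library (a PDF parser, for example) would be wrapped behind the interface and registered by file extension. Only a UTF-8 plain-text extractor is included here; real extractors would wrap third-party libraries.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical extension point: indexing code depends only on this interface,
// never directly on a PDF, Word, or HTML parsing library.
interface TextExtractor {
    String extract(byte[] content) throws Exception;
}

// Hypothetical registry mapping file extensions to extractors.
class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    // The one built-in extractor: decode the bytes as UTF-8 plain text.
    static TextExtractor plainText() {
        return content -> new String(content, StandardCharsets.UTF_8);
    }

    // Extensions are stored lower-case so "TXT" and "txt" match.
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Look up an extractor by the file name's extension; null if unsupported.
    TextExtractor forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Dispatching on file extension is the simplest choice; detecting the type from the content itself would be more robust, at the cost of more code.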

5. Distributed Searching - Code/demonstration to do search across  
several indexes on several machines would be useful.
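
A scatter-gather sketch of what such a demonstration might look like, assuming each machine can return its own top-k hits with scores: send the query to every shard concurrently, then merge the partial results into a global top k with a bounded min-heap. Shard, Hit, and ScatterGather are hypothetical names, and the network call is abstracted away behind the Shard interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical hit returned by one shard: which shard, local doc number, score.
class Hit {
    final String shard;
    final int doc;
    final float score;

    Hit(String shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical remote endpoint; in practice this would be an RPC or HTTP call.
interface Shard {
    List<Hit> search(String query, int k) throws Exception;
}

class ScatterGather {
    // Query every shard concurrently, then keep the global top k by score.
    static List<Hit> search(List<Shard> shards, String query, int k) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard s : shards) {
                pending.add(pool.submit(() -> s.search(query, k)));
            }
            // Min-heap bounded at k entries: the root is the weakest hit kept so far.
            PriorityQueue<Hit> top = new PriorityQueue<>(
                Comparator.comparingDouble((Hit h) -> h.score));
            for (Future<List<Hit>> f : pending) {
                for (Hit h : f.get()) {
                    top.offer(h);
                    if (top.size() > k) {
                        top.poll();
                    }
                }
            }
            List<Hit> result = new ArrayList<>(top);
            result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

The merge is only meaningful if scores are comparable across shards; with tf-idf scoring that requires shared document-frequency statistics, since each shard would otherwise compute idf from its own documents only.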

At any rate, just some random thoughts garnered from ApacheCon.  All  
in all, a good conf. w/ lots of Lucene interest.


Grant Ingersoll
Center for Natural Language Processing
