lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Acedemic Question About Indexing
Date Wed, 10 Nov 2004 19:08:27 GMT
Uh, I hate to market it, but.... it's in the book.  But you don't have
to wait for it, as there already is a Lucene demo that does what you
described.  I am not sure if the demo always recreates the index or
whether it deletes and re-adds only the new and modified files, but if
it's the former, you would only need to modify the demo a little bit to
check the timestamps of File objects and compare them to those stored
in the index (if they are being stored - if not, you should add a field
to hold that data)

Otis

--- Luke Shannon <lshannon@hypermedia.com> wrote:

> I am working on debugging an existing Lucene implementation.
> 
> Before I started, I built a demo to understand Lucene. In my demo I
> indexed
> the entire content hierarhcy all at once, and than optimize this
> index and
> used it for queries. It was time consuming but very simply.
> 
> The code I am currently trying to fix indexes the content hierarchy
> by
> folder creating a seperate index for each one. Thus it ends up with a
> bunch
> of indexes. I still don't understand how this works (I am assuming
> they get
> merged someone that I have tracked down yet) but I have noticed it
> doesn't
> always index the right folder. This results in the users reporting
> "inconsistant" behavior in searching after they make a change to a
> document.
> To keep things simiple I would like to remove all the logic that
> figures out
> which folder to index and just do them all (usually less than 1000
> files) so
> I end up with one index.
> 
> Would indexing time be the only area I would be losing out in, or is
> there
> something more to the approach of creating multiple indexes and
> merging
> them.
> 
> What is a good approach I can take to indexing a content hierarchy
> composed
> primarily of pdf, xsl, doc and xml where any of these documents can
> be
> changed several times a day?
> 
> Thanks,
> 
> Luke
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message