lucene-java-user mailing list archives

From Nicolas Lalevée <nicolas.lale...@anyware-tech.com>
Subject Re: Question about basic indexing performance improvements
Date Sun, 18 Feb 2007 10:35:19 GMT
On Sunday 18 February 2007 04:38, Mike O'Leary wrote:
> I am taking a class in which the professor has assigned a project to take a
> question answering application that was submitted by a team of students to
> one of the TREC contests last year and turn it into a teaching tool. One
> thing he wants to have done is add the capability for students to create a
> variety of indexes with different settings in order to observe the ways in
> which selecting a different index can cause the results to vary. The
> application searches over a specified set of just over a million
> XML-formatted documents that doesn't change, so there are no requirements
> at this point for adding and deleting documents. Because the team that
> created the application last year only needed to index it once (after they
> figured out what parameters they wanted), they didn't need to care very
> much that it took around 30 hours to index the documents one by one using a
> single threaded indexing program.
>
> Now we want to be able to index that same set of documents in much less
> time. I am new to Lucene, so I am just going by what I have found so far in
> the Lucene in Action book and on the internet. The section in the book on
> indexing concurrency says that you can share an IndexWriter object among
> several threads and that the calls from these threads will be properly
> synchronized. Will this in itself improve indexing performance very much?
> It seems like the synchronization that is needed for keeping the index from
> being corrupted would limit how much you gain from using several threads.
> In any case, my overall question is, given an indexing task of this kind,
> where you don't have to worry about additions, deletions and updates of the
> documents being indexed, just indexing the whole document database as a
> batch each time a user wants to index it in a different way, what would be
> the fastest way to do it using the various Lucene indexing tools and
> features? Thanks.

In Lucene, only one IndexWriter can be open on a given index at a time. So if
you want several writers working in parallel, use several separate indexes,
each holding a different part of your data. When everything is indexed, you
can merge them all into a single index with
IndexWriter#addIndexes(Directory[] dirs).
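A minimal sketch of this split-then-merge approach follows. It assumes a recent Lucene release (9.x) on the classpath, so the constructors differ from the 2.x API of 2007 (varargs addIndexes still accepts a Directory[]). The class name ParallelIndexing, the in-memory ByteBuffersDirectory, the "id"/"body" fields and the plain-string corpus are all illustrative stand-ins for the real XML documents and on-disk directories.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class: one IndexWriter per partition, merged at the end.
public class ParallelIndexing {

    // Indexes one slice of the corpus into its own Directory with its own writer.
    static Directory indexPartition(List<String> texts, int offset) throws IOException {
        Directory dir = new ByteBuffersDirectory(); // assumption: use FSDirectory.open(path) for real data
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (int i = 0; i < texts.size(); i++) {
                Document doc = new Document();
                doc.add(new StringField("id", Integer.toString(offset + i), Field.Store.YES));
                doc.add(new TextField("body", texts.get(i), Field.Store.NO));
                writer.addDocument(doc);
            }
        } // writer is closed here, so the partition can be merged safely
        return dir;
    }

    // Splits the corpus into slices, indexes them concurrently, then merges
    // every partial index into `target` with IndexWriter#addIndexes.
    static void buildIndex(List<String> corpus, int partitions, Directory target) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Directory>> futures = new ArrayList<>();
            int chunk = (corpus.size() + partitions - 1) / partitions;
            for (int p = 0; p < partitions; p++) {
                int from = Math.min(p * chunk, corpus.size());
                int to = Math.min(from + chunk, corpus.size());
                List<String> slice = corpus.subList(from, to);
                futures.add(pool.submit(() -> indexPartition(slice, from)));
            }
            Directory[] parts = new Directory[partitions];
            for (int p = 0; p < partitions; p++) {
                parts[p] = futures.get(p).get(); // waits for each partition to finish
            }
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            try (IndexWriter merged = new IndexWriter(target, config)) {
                merged.addIndexes(parts);
                merged.forceMerge(1); // optional: one segment; can be slow on large indexes
            }
            for (Directory d : parts) {
                d.close();
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> corpus = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            corpus.add("document number " + i);
        }
        try (Directory target = new ByteBuffersDirectory()) {
            buildIndex(corpus, 4, target);
            try (DirectoryReader reader = DirectoryReader.open(target)) {
                System.out.println(reader.numDocs());
            }
        }
    }
}
```

With 1,000 short documents split into four partitions, main prints 1000. The number of partitions is a tuning knob: matching it to the available cores (and disks) is a reasonable starting point, and the final merge is a single-threaded step whose cost grows with index size.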

cheers,
Nicolas




