lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ard Schrijvers" <>
Subject RE: Inrease the performance of Indexing in Lucene
Date Thu, 19 Jul 2007 09:27:18 GMT

Did take a look at nutch or hadoop or solr? They partially seem to address the things you
describe...About the LSI I am not sure what has been done in those projects

Regards Ard

> Hi, Please help me.
> Its been a month since i am trying lucene.
> My requirements are huge, i have to index and search in TB of data. 
> I have question regarding three topics:
> 1. Problem in Indexing
>   As i need to index TB of data, so by googling and visiting 
> different forum
> i deployed following fashion for indexing:
> 1. First i created an array of RAMDirectory then i added the 
> documents on
> it. 
>    After crossing certain threshold i dumped it into my drive 
> as tempIndex1
> 2. I repeated the same process until all documents are 
> indexed in my drive
> as tempindex1. tempindex2...
> 3. Then finally i loaded the temp directories and merged in 
> as a main full
> indexed directories. 
> 4. I have used threading too for this purpose.
> 5. This some what removed the optimize() overhead of 
> IndexWriter, as i added
> directories together only at the end.
> Am i doing this the right way or not, is there any other 
> solution to boost
> the indexing process. 
> 2. Problem in searching
> As lucene doesnt support LSI and SVD so as to achieve 
> conceptual search, i
> first search the lucene index for the user inputted text then 
> retrieved the
> document and then expanded the query using LSI and SVD and 
> then re-searched
> the index. 
> Now with few words in query doesnt seem to have performance 
> problem but when
> i expand the query i.e. when the query contains ten words 
> ORed together then
> it takes tremendously unacceptable amount of time to get Hits. Is this
> obvious or am i missing something here too.. 
> What are the ways to achieve boost in query performance when the query
> contains many terms and especially they are ORed, as for 
> ANDed query it
> requires less time to produce Hits.
> I have used single Indexsearcher and my index is optimized as well..
> 3. Another Problem
> As i require to dump my database table too inside lucene 
> along with fulltext
> info. 
> What effect will it have on indexing and searching.?
> Also i might need to change the name of Field of Document 
> indexed in lucene,
> will it be possible.?
> I know its not possible to change the value of the field but 
> will it be
> possible to change the name of the field or we have to 
> control externally..? 
> Please shade me some light in these things:
> Your help is highly anticipated
> -- 
> View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message