lucene-java-user mailing list archives

From Romain de Wolff
Subject Lucene & Zend Lucene Search : indexation speed, document parsing
Date Tue, 16 Sep 2008 09:28:09 GMT
Hi all,

Well this is my first post on this list. Nice to meet you all.

I'm currently setting up a system that indexes data with Apache
Lucene (doc, xls, pdf, source code, zip files, ...) and allows
searching with the Zend Lucene Search library (PHP). I'm also planning
to create a front-end view with Adobe Flex, if I have enough time: it
needs to be finished by mid-December 2008. I'm doing this as my
end-of-studies project for the HEIG-VD.

I had some trouble getting this to work at first, but the first tests
are finally okay. I'm using Apache Lucene 2.1.0 and Zend Framework 1.6.0.

I have a few questions, mainly about speed (indexing time) and
document parsing (how to index the most commonly used office
documents). For document parsing, I'm planning to use various
open-source libraries. The company I'm doing this for will be indexing
a few gigabytes of data, around 5 GB I think. Any advice about this
project? Comments and suggestions are welcome.

I'll be glad to provide more information if you want, and I'm
always happy to chat about technologies and possibilities.

@Eric Bowman: How long did it take to create this 15 GB index? And
how much data does it represent?

Best regards,

Romain de Wolff

On 16 Sept. 08 at 11:06, Eric Bowman wrote:

> Hi all,
> We stuck a 60 GB OCZ "Core Series" SSD in a Dell T5400 (dual  
> quadcore, 16GB RAM, SATA II 7200 RPM disk) and did some comparisons  
> between running with our index on disk, vs. on SSD.  I can't really  
> talk about what the app does, but I can share the difference in  
> performance; see enclosed PDF.
> We have a 15GB index and a 20GB bdb, both of which are on the SSD.   
> Pretty amazing performance difference.  "Go buy one now." :)
> (The x-axis is ms/request, the y-axis is percentile.   So, "65% of  
> SSD requests took 120ms or less", for example).
> cheers,
> Eric