lucene-java-user mailing list archives

From Robert Schultz <>
Subject Re: New Site Live Using Lucene
Date Sun, 07 Aug 2005 23:25:40 GMT
Yup, the C/C++ code is parsed using some templates I wrote utilizing 
It would be possible to do the same thing to any other language such as 
Java or PHP or Perl.
Although you'd need an expert understanding of that language's syntax in 
order to parse it correctly :)

Initially Lucene was never part of the site.
I was using MySQL to store the data, and used MySQL's FULLTEXT searching.
However once I reached 25 million+ rows in a single table, MySQL's 
FULLTEXT searching ground to a halt.
When I spoke with the MySQL folks, they told me to use Lucene, as their 
FULLTEXT support doesn't scale well and Lucene is supposed to be one of 
the best engines around for that.

Since I was already several months into the project with the vast 
majority of the website written to use the MySQL database, converting 
entirely over to Lucene would have meant a complete code re-write.

I didn't want to do that, so I combined MySQL and Lucene and use both.

It took over 5 FULL MONTHS of 24/7 100% CPU time to PARSE the C/C++ code 
and insert it into the database.
And I only did 3,200 of the more than 25,000 projects I still need to parse.

In hindsight I might have chosen to house everything in Lucene. However, 
it would be a major re-write at this point and I'm happy enough right 
now with my 'merged' approach of PHP, MySQL and Lucene.
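
For anyone curious what a 'merged' setup like this can look like, here is a 
minimal sketch. It is not the site's actual code. It assumes the text index 
only answers "which row ids match?" and the database supplies the rows to 
display. To keep it runnable without any servers, sqlite3 stands in for 
MySQL and a toy in-memory inverted index stands in for Lucene. The table 
name, column names and sample rows are all hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Stand-in for the MySQL side: sqlite3 is used only so the sketch runs
# without a server. Table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE functions (id INTEGER PRIMARY KEY, project TEXT, body TEXT)"
)
rows = [
    (1, "alpha", "int parse_header(char *buf)"),
    (2, "beta", "void free_buffer(char *buf)"),
    (3, "alpha", "int parse_body(FILE *fp)"),
]
db.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)

# Stand-in for the Lucene side: a toy inverted index mapping each token
# to the set of row ids that contain it. A real setup would query Lucene.
index = defaultdict(set)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


for row_id, _project, body in rows:
    for token in tokenize(body):
        index[token].add(row_id)


def search(query):
    """Return the full database rows that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Step 1: the text index answers "which ids match?"
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    if not ids:
        return []
    # Step 2: the database supplies the rows to display.
    placeholders = ",".join("?" * len(ids))
    sql = (
        "SELECT id, project, body FROM functions "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return db.execute(sql, sorted(ids)).fetchall()


for row in search("parse"):
    print(row)
```

The point of the split is that the existing database-backed display code 
can stay as it is; only the search step changes where it gets its ids from.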

Chris Lu wrote:
> This is cool!
> Seems you parsed the C/C++ code. Is this easy to extend to other
> languages, like Java?
> And you choose to display the data stored in database, any reason for
> that compared to reading it from Lucene index itself?
> I feel using Lucene's highlighter may make it easier to read the search results.
