Date: Thu, 4 Sep 2003 16:35:41 -0400
Subject: Re: Lucene app to index Java code
From: Erik Hatcher <erik@ehatchersolutions.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
In-Reply-To: <3F5776A5.9080207@newsmonster.org>
Message-Id: <59C80DAA-DF17-11D7-9C0C-000393A564E6@ehatchersolutions.com>

On Thursday, September 4, 2003, at 01:30  PM, Kevin A. Burton wrote:
>> - XDoclet could be used to sweep through Java code and build a 
>> text/XML file as richly as you'd like from the information there 
>> (complete with JavaDoc tags, which Zapata will miss :)), and then run 
>> Lucene on the generated files.  On a related note, the XDoclet2 
>> architecture would streamline this even further by eliminating the 
>> middle textual representation (QDox/XJavadoc reads Java as a "meta 
>> data provider" and then a Lucene "plugin" indexes things).  It could 
>> be done without the intermediate text representation even in XDoclet 
>> 1.2, but it would require coding a custom subtask and be slightly out 
>> of the norm for XDoclet subtasks (but would work just fine).
>
> It would be faster to write a native doclet as this would remove the 
> XML parse overhead...  The whole point of this thing is that it needs 
> to be fast!

Do you mean the Ant build file parsing?  That would be the only XML 
parsing in the approach I'm proposing, unless you went the clunkiest 
XDoclet 1.2 route and generated an intermediate XML file.

As for speed: QDox, I've heard, is the fastest option.  javadoc is 
the slowest parser of the three I know of (javadoc, xjavadoc, qdox).

	Erik
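
For illustration only, here is a rough sketch of the "no intermediate 
file" shape discussed above: source text goes in, metadata is extracted 
in memory, and the result is indexed directly.  It uses only the JDK.  
The regexes are a crude, hypothetical stand-in for a real metadata 
provider such as QDox, and the in-memory map is a stand-in for a Lucene 
index; none of the names below (JavaSourceIndexSketch, add, search) are 
QDox, XDoclet, or Lucene API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: index Java source by class name and Javadoc tags
// without writing an intermediate text/XML file.
final class JavaSourceIndexSketch {

    // Crude stand-in for a real parser: the first class/interface
    // declaration that starts a line (so " * class ..." in a comment is skipped).
    private static final Pattern CLASS_DECL = Pattern.compile(
        "^\\s*(?:public\\s+)?(?:(?:final|abstract)\\s+)*(?:class|interface)\\s+(\\w+)",
        Pattern.MULTILINE);

    // Javadoc-style tag with its value on the same line, e.g. "@author Erik Hatcher".
    private static final Pattern DOC_TAG =
        Pattern.compile("@(\\w+)[ \\t]+([^\\r\\n*]+)");

    // Stand-in for the search index: "field:term" -> names of matching classes.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Extract metadata from one source file's text and index it immediately.
    void add(String source) {
        Matcher decl = CLASS_DECL.matcher(source);
        if (!decl.find()) {
            return; // no recognizable type declaration; nothing to index
        }
        String className = decl.group(1);
        post("class", className, className);

        Matcher tag = DOC_TAG.matcher(source);
        while (tag.find()) {
            String field = tag.group(1);
            // Split the tag value into lowercase word terms.
            for (String word : tag.group(2).trim().split("\\W+")) {
                if (!word.isEmpty()) {
                    post(field, word, className);
                }
            }
        }
    }

    // Return the names of classes whose given field contains the term.
    Set<String> search(String field, String term) {
        return postings.getOrDefault(key(field, term), Collections.emptySet());
    }

    private void post(String field, String term, String className) {
        postings.computeIfAbsent(key(field, term), k -> new TreeSet<>())
                .add(className);
    }

    private static String key(String field, String term) {
        return field.toLowerCase() + ":" + term.toLowerCase();
    }
}
```

Feeding it a source string containing "@author Erik Hatcher" above 
"public class Parser" and then calling search("author", "erik") would 
return a set containing "Parser".  A real implementation would swap the 
regexes for an actual Java parser and the map for an actual index.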