lucene-dev mailing list archives

From Les Hughes
Subject RE: idea: lucene doclet for indexing javadoc better
Date Thu, 14 Mar 2002 14:39:54 GMT

I have an app running on my box that does exactly this. Besides a
bog-standard JSP UI, it also has a funky IE toolbar (like the Google bar) to
perform the searches, plus it serves up the Java source if you click through
from the results page.

The indexer is indeed run as a doclet via an Ant script. The index is then
packaged up into a WAR and deployed to Tomcat 4. A WARdirectory explodes the
index into either RAM or the file system, depending on the deployment
descriptor, since not all appservers expand WARs (WebLogic for one).

I'm indexing class and method names, modifiers (public, abstract, etc.),
parameters, imports and some other bits, as well as the free text of the
source.
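
To give a rough idea of what those per-class fields look like, here is a minimal sketch. It is not the actual indexer: it uses plain reflection as a stand-in for the doclet API, and a Map as a stand-in for a Lucene Document. The class name ApiFieldSketch, the field names ("class", "modifiers", "method", "params") and the Sample class are all made up for illustration. Reflection cannot see imports or source text, so those fields are left out here.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: gathers searchable fields for one class.
// Reflection replaces the doclet API; the Map replaces a Lucene Document.
class ApiFieldSketch {

    // Returns field name -> values, e.g. "params" -> [String[], int].
    static Map<String, List<String>> fieldsFor(Class<?> type) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        add(fields, "class", type.getSimpleName());
        add(fields, "modifiers", Modifier.toString(type.getModifiers()));
        for (Method m : type.getDeclaredMethods()) {
            add(fields, "method", m.getName());
            for (Class<?> p : m.getParameterTypes()) {
                // getSimpleName() keeps array brackets, e.g. "String[]"
                add(fields, "params", p.getSimpleName());
            }
        }
        return fields;
    }

    private static void add(Map<String, List<String>> fields, String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Made-up sample type, only here so the sketch has something to inspect.
    static abstract class Sample {
        public abstract void join(String[] parts, int count);
    }

    public static void main(String[] args) {
        fieldsFor(Sample.class).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real indexer each entry would become a field on the document for that class, with the source text added as one more field.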

Cool eh?  Since I'm not much of a COM programmer, the IE bar is taking a bit
longer than I wanted (I've also lost my MSDN library CD, which doesn't help
:-( ), but if anyone's interested in how things are at the moment, let me
know. Once I've put some polish on it, I was going to perhaps try to write a
JavaWorld article and of course donate the code. But for now - it works for
me :-)

I could do with a hand writing a decent QueryParser (JavaCC is not something
I want to dig into) as the standard one has its limitations, especially when
you want to search for arrays (as in params:String[])
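
One possible workaround, short of writing a new parser, is to escape the brackets before the query string reaches the parser. The sketch below is only that: the class QueryEscapeSketch, its methods and the set of characters it treats as special are assumptions for illustration, and whether the parser in use honours backslash escapes needs checking. It also only helps if the params field is indexed without an analyzer stripping the brackets.

```java
// Hypothetical sketch: backslash-escapes characters that a query parser
// might treat as syntax, so that "String[]" is read as a literal term.
class QueryEscapeSketch {

    // Assumed set of special characters, chosen for this example only.
    private static final String SPECIAL = "[]{}()";

    static String escapeTerm(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            // Escape the backslash itself too, so the result stays unambiguous.
            if (c == '\\' || SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Builds a field:term query string with the term escaped.
    static String fieldQuery(String field, String term) {
        return field + ":" + escapeTerm(term);
    }

    public static void main(String[] args) {
        System.out.println(fieldQuery("params", "String[]"));
    }
}
```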

Hope this helps,


-----Original Message-----
From: Spencer, Dave
Sent: 13/03/02 02:28
Subject: idea: lucene doclet for indexing javadoc better

One hassle/problem is that if a search engine (say...Lucene...)
is indexing javadoc (HTML generated from *.java),
it has to wade through all kinds of junk to get at what's interesting.
And if you try to summarize the document by taking the
first "n" words (after ignoring tags) you get something like
"Overview Package Class Use Deprecated Index PREV CLASS NEXT CLASS ..."

I've done a proof of concept of using the javadoc doclet api and having
an indexer keyed off of that to create a javadoc index, instead of 
spidering the output.
It's very preliminary.
I was just wondering if this has been done before, or been discussed.

I guess the general principle is that it's always better to index the
source of the info and not the generated HTML. This is why Lucene is much
nicer than other engines (say, htdig), as the other engines seem to only be able to

To unsubscribe, e-mail:
For additional commands, e-mail:
