httpd-docs mailing list archives

From chris <>
Subject Using Solr to index and search the Apache HTTPD Documents
Date Mon, 08 Oct 2007 04:09:05 GMT

I have been messing with Solr and the Apache HTTPD documents over the 
past few months and may have finally produced something that might be 
of use.  You can view the not-entirely-ripe fruits of my labor here:

The work is done with a Perl script that runs the documents through an
XSLT stylesheet and then pushes the transformed XML into Lucene via
Solr.  I have done a bit of tuning in Solr to get the results decent,
but much more work is needed to get things perfect.  I ended up breaking
each HTTPD document into many smaller documents.  Each directive or
section became its own Solr sub-document and is linked back to the main
document via the common portion of the URL.
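
As a rough illustration of the splitting step described above (not the
actual Perl/XSLT pipeline), the following Python sketch turns one source
file into a Solr <add> message with one <doc> per directive.  The sample
XML is a trimmed, hypothetical stand-in for a manual source file, and the
field names (id, parent_url, title, description) are assumptions, not
the schema used here.  Each sub-document keeps the shared page URL and
adds a fragment, which is one way to link it back to the main document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for one httpd manual source file.
SOURCE = """
<modulesynopsis>
  <name>core</name>
  <directivesynopsis>
    <name>AcceptFilter</name>
    <description>Configures optimizations for a Protocol's Listener Sockets</description>
  </directivesynopsis>
  <directivesynopsis>
    <name>AllowOverride</name>
    <description>Types of directives that are allowed in .htaccess files</description>
  </directivesynopsis>
</modulesynopsis>
"""


def split_into_solr_docs(source_xml, base_url):
    """Return a Solr <add> element holding one <doc> per directive."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for directive in root.iter("directivesynopsis"):
        name = directive.findtext("name", default="").strip()
        doc = ET.SubElement(add, "doc")
        # Field names below are assumptions made for this sketch.
        fields = {
            "id": f"{base_url}#{name.lower()}",  # unique per sub-document
            "parent_url": base_url,              # common portion of the URL
            "title": name,
            "description": directive.findtext("description", default="").strip(),
        }
        for field_name, value in fields.items():
            field = ET.SubElement(doc, "field", name=field_name)
            field.text = value
    return add


add_element = split_into_solr_docs(SOURCE, "mod/core.html")
payload = ET.tostring(add_element, encoding="unicode")
print(payload)
```

The resulting payload is the kind of XML message that would then be
POSTed to Solr's update handler, followed by a commit.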

Right now the simple web search only returns the URL and a Description 
or Title depending on the type of result.  Much more could be returned 
though as I tried to match elements 1:1 from the httpd docs to the Solr 
formatted documents.  The potential is there to do things like a Context 
only search, or to just search all of the Examples.  It is also very 
easy to return whatever matching elements you wish (context, examples,
usage, summary, notes, etc.) from the Solr Documents that match your
query.
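
A field-restricted search of the kind mentioned above could be expressed
with Solr's standard q (query) and fl (field list) parameters.  The
sketch below only builds the request URL.  The host, port, and the field
names (example, url, title) are assumptions for illustration, not the
actual setup or schema.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust for a real deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_search_url(terms, field=None, return_fields=("url", "title")):
    """Build a Solr select URL, optionally restricting the search to one field."""
    # "field:(terms)" limits matching to that field; a bare query uses the default field.
    query = f"{field}:({terms})" if field else terms
    params = {"q": query, "fl": ",".join(return_fields), "rows": 10}
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Search only the (hypothetical) "example" field and return the matching examples.
examples_only = build_search_url(
    "RewriteRule", field="example", return_fields=("url", "example")
)
print(examples_only)
```

Changing return_fields is all it takes to pull back a different set of
elements for each hit.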

The results are fed through another XSLT stylesheet using Solr's
built-in response writer to generate the XHTML that makes up the results
page.
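
To show roughly what that stylesheet has to do, here is the same
transformation written in plain Python instead of XSLT (a swapped-in
technique, since the standard library has no XSLT processor).  The
response below is a hypothetical sample in the shape of Solr's XML
response format, and the url and title field names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical Solr XML response; field names "url" and "title" are assumptions.
RESPONSE = """
<response>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="url">mod/core.html#acceptfilter</str>
      <str name="title">AcceptFilter</str>
    </doc>
    <doc>
      <str name="url">mod/core.html#allowoverride</str>
      <str name="title">AllowOverride</str>
    </doc>
  </result>
</response>
"""


def render_results(response_xml):
    """Turn each <doc> in a Solr response into an XHTML list item with a link."""
    root = ET.fromstring(response_xml)
    ul = ET.Element("ul")
    for doc in root.iter("doc"):
        # Collect the string fields of this hit, keyed by field name.
        fields = {s.get("name"): (s.text or "") for s in doc.findall("str")}
        li = ET.SubElement(ul, "li")
        link = ET.SubElement(li, "a", href=fields.get("url", ""))
        link.text = fields.get("title", "")
    return ET.tostring(ul, encoding="unicode")


results_html = render_results(RESPONSE)
print(results_html)
```

In Solr itself the equivalent work is done server-side by a stylesheet
applied to the response, so no client code like this is needed.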

If anyone is interested in the details/scripts/Solr schema and config 
files I used to get this far, let me know and I will make them available 
somewhere.  Just be nice when critiquing: I could barely spell XSLT when
I started this project, and I still get it wrong now and then.

If you guys see any value in this, I will be happy to keep plugging away
at it.  It has been a great learning experience so far.  I am at the
point, though, where I need some direction/guidance/testers to continue.


chris rhodes
