httpd-docs mailing list archives

From Tony Stevenson <t...@pc-tony.com>
Subject Update on Solr & Lucene search for HTTPd docs
Date Sat, 03 Nov 2007 03:54:25 GMT
Good evening, or rather morning I guess.

I have been working with Chris (#apache - arryeder) on setting up a test 
environment on httpd.zones.apache.org to allow us to use Solr & Lucene 
as the HTTPd docs search engine, with a view to possibly replacing the 
current Google implementation.

We have got this working, using the following components:

Java JDK - 1.5 or higher
Nightly snapshot of Solr, currently using the snapshot from Nov 2nd 2007
Perl 5.8.8
XML::Parser   (XML-Parser-2.34)
    CPAN -> XML::XPath
    CPAN -> File::Find
    CPAN -> Cwd
expat-2.0.1 (http://sourceforge.net/projects/expat/)
svn (Only the client is needed)

We now have an index of the 2.2.x documents, and these can be queried 
using fajita (the #apache bot). 
We don't have a web form ready yet, but if someone wants to help and 
contribute one, it will be gratefully received, I can assure you  :-)

< pctony> fajita:  newds  mod_rewrite
< fajita> [33] (1)http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
          (2)http://httpd.apache.org/docs/2.2/misc/rewriteguide.html#ToC1
          (3)http://httpd.apache.org/docs/2.2/howto/access.html#rewrite
          (4)http://httpd.apache.org/docs/2.2/rewrite/index.html
          (5)http://httpd.apache.org/docs/2.2/vhosts/mass.html#homepages.rewrite

This is an example query in IRC.  [33] means there are 33 related 
results, but for the purposes of IRC we only return the top 5 results, 
sorted by relevance.  With a web form this won't be necessary.
I am currently in the process of documenting this 'docsearch' tool.  I 
already have a partial runbook for infra@.
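
For anyone thinking about the web form, here is a rough sketch of the 
query and reply formatting described above. It is not the code fajita 
actually runs. The function names and the base URL are hypothetical; 
the q and rows parameters are standard Solr select parameters, and the 
total would come from the numFound value in Solr's response.

```python
from urllib.parse import urlencode


def build_query_url(base_url, terms, rows=5):
    """Build a Solr select URL asking for the top `rows` matches.

    base_url is an assumption, e.g. a local Solr select handler.
    """
    return base_url + "?" + urlencode({"q": terms, "rows": rows})


def format_irc_reply(total, urls, limit=5):
    """Format '[total] (1)url (2)url ...' from relevance-sorted URLs.

    total is the overall match count; only the first `limit` URLs
    are included in the reply.
    """
    shown = urls[:limit]
    parts = ["(%d)%s" % (i, u) for i, u in enumerate(shown, start=1)]
    return "[%d] %s" % (total, " ".join(parts))
```

A web form could reuse build_query_url with a larger rows value and 
skip the truncation in format_irc_reply altogether.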

The index is built from the latest svn checkout of the docs, so it can 
be maintained much more easily.  All that is needed for the index to 
update is for the latest .xml files to be checked out or updated and the 
re-index to be run.  That's all.
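
To give an idea of what the re-index step involves, here is a minimal 
sketch, written in Python for brevity rather than the Perl we actually 
use. It walks a checkout, reads each .xml file, and builds a Solr XML 
<add> message. The field names (id, title, text) and the use of a 
<title> element are assumptions for illustration, not our real schema, 
and the sketch assumes each file parses as standalone XML with no 
external entities.

```python
import os
import xml.etree.ElementTree as ET


def find_xml_files(root):
    """Yield paths of .xml files below root, skipping .svn directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune svn metadata directories in place so os.walk skips them.
        dirnames[:] = sorted(d for d in dirnames if d != ".svn")
        for name in sorted(filenames):
            if name.endswith(".xml"):
                yield os.path.join(dirpath, name)


def to_solr_doc(path, root):
    """Turn one source file into a Solr <doc> element (hypothetical fields)."""
    top = ET.parse(path).getroot()
    title = top.findtext(".//title", default="")
    # Collapse all text content into one whitespace-normalised string.
    text = " ".join(" ".join(top.itertext()).split())
    doc = ET.Element("doc")
    fields = (("id", os.path.relpath(path, root)),
              ("title", title),
              ("text", text))
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return doc


def build_add_message(root):
    """Return an <add> message covering every .xml file below root."""
    add = ET.Element("add")
    for path in find_xml_files(root):
        add.append(to_solr_doc(path, root))
    return ET.tostring(add, encoding="unicode")
```

The resulting string would then be POSTed to Solr's update handler and 
followed by a <commit/> message so the changes become searchable.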

The next version will index all versions of the docs, with multi-language 
support hot on its heels.




Cheers,
Tony


