lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Lucene and Struts
Date Fri, 12 Sep 2003 20:43:44 GMT

On Friday, September 12, 2003, at 03:33  PM, Jeff Linwood wrote:
> Crawling's a good solution because it's so easy to map the content 
> back to a
> URL.

:)

I smile because my blog is powered by Lucene, and I'm very particular 
about URLs and how they map to content.  For example:

	http://www.blogscene.org/erik/Computers/Articles/javanet_jul03.html

This page is dynamically generated (despite the .html extension).  I 
have a servlet mapped to /erik/*.  My blog entries are literally on the 
file system as .txt files (blosxom-style: first line is the title, the 
rest is the body).  The Ant <index> task indexes them based on the 
relative path from the blog root; each directory is a category and each 
text file a blog entry.  You'd find this entry by searching like this:

	http://www.blogscene.org/erik?q=%22lucene+intro%22
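To make the file layout concrete, here is a rough sketch of how one entry file could be turned into indexable fields.  It is not the blog's actual code: the BlogEntry class, its field names, and the choice to drop the .txt extension from the stored path are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical value class for one blosxom-style entry; not the blog's real code. */
final class BlogEntry {
    final String path;      // relative to the blog root, extension removed (assumption)
    final String category;  // directory part of the path, "" for the root
    final String title;     // first line of the file
    final String body;      // everything after the first line

    private BlogEntry(String path, String category, String title, String body) {
        this.path = path;
        this.category = category;
        this.title = title;
        this.body = body;
    }

    /** Builds an entry from a '/'-separated relative path and the file's lines. */
    static BlogEntry fromLines(String relativePath, List<String> lines) {
        String path = relativePath.endsWith(".txt")
                ? relativePath.substring(0, relativePath.length() - 4)
                : relativePath;
        int slash = path.lastIndexOf('/');
        String category = slash < 0 ? "" : path.substring(0, slash);
        String title = lines.isEmpty() ? "" : lines.get(0).trim();
        String body = lines.size() < 2
                ? ""
                : String.join("\n", lines.subList(1, lines.size()));
        return new BlogEntry(path, category, title, body);
    }

    /** Reads a file below blogRoot and derives the relative path from its location. */
    static BlogEntry read(Path blogRoot, Path file) throws IOException {
        String relative = blogRoot.relativize(file).toString().replace('\\', '/');
        return fromLines(relative, Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Each of the four fields would then become a field of the indexed document, so that the path can drive URL lookups and the title can be searched on its own.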

Tacking ?flav=rss onto a search URL will give you an RSS feed of that 
query!  You could also see the entry as text here:

	http://www.blogscene.org/erik/Computers/Articles/javanet_jul03.txt

(the extension is used to determine the "flavor" presented - text 
format is generated using lynx).

All blog entries in the /Computers/Articles category as an RSS feed:

	http://www.blogscene.org/erik/Computers/Articles?flav=rss
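The flavor handling in the URLs above could be sketched roughly as follows.  This is only an illustration under assumptions: the FlavorRequest class is hypothetical, "html" as the default flavor is a guess, and letting an explicit flav parameter win over the extension is a guess too.  The pathInfo argument stands for whatever follows /erik in the request.

```java
/** Hypothetical helper that splits a request into an index path and a flavor. */
final class FlavorRequest {
    final String path;    // e.g. "Computers/Articles/javanet_jul03"
    final String flavor;  // e.g. "html", "txt", "rss"

    private FlavorRequest(String path, String flavor) {
        this.path = path;
        this.flavor = flavor;
    }

    /**
     * pathInfo is the part of the URI after the servlet mapping (may be null);
     * flavParam is the value of the flav query parameter (may be null).
     */
    static FlavorRequest parse(String pathInfo, String flavParam) {
        String path = pathInfo == null ? "" : pathInfo;
        while (path.startsWith("/")) {
            path = path.substring(1);           // index paths are stored without a leading slash
        }
        String flavor = "html";                 // assumed default flavor
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash) {                      // only treat a dot in the last segment as an extension
            flavor = path.substring(dot + 1);
            path = path.substring(0, dot);
        }
        if (flavParam != null && !flavParam.isEmpty()) {
            flavor = flavParam;                 // assumption: explicit parameter overrides the extension
        }
        return new FlavorRequest(path, flavor);
    }
}
```

With this, /Computers/Articles/javanet_jul03.txt yields the same path as the .html request but the "txt" flavor, and /Computers/Articles with flav=rss yields the category path with the "rss" flavor.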

All of the above requests are handled dynamically using a Lucene 
PrefixQuery based on the URI, and if you use the search box it ANDs 
that with a QueryParser-parsed query, allowing queries restricted to a 
category and below.  Selecting a blog entry by title is like this:

	http://www.blogscene.org/erik?q=title:intro
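The "prefix AND search" combination can be illustrated without Lucene at all.  The sketch below is a plain-Java stand-in, not the real thing: startsWith plays the role of the PrefixQuery on the path, a case-insensitive substring test plays the role of the parsed user query, and both must hold for a hit.  The CategorySearch and Doc names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java stand-in for "PrefixQuery on the path AND the user's query". */
final class CategorySearch {

    /** Hypothetical minimal document: an index path plus its searchable text. */
    static final class Doc {
        final String path;
        final String text;

        Doc(String path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    /**
     * Returns the paths of all docs at or below pathPrefix whose text contains userTerm.
     * A null or empty userTerm means "no search box input", so only the prefix applies.
     */
    static List<String> search(List<Doc> docs, String pathPrefix, String userTerm) {
        String needle = userTerm == null ? "" : userTerm.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            boolean inCategory = d.path.startsWith(pathPrefix);   // stands in for the prefix clause
            boolean matches = needle.isEmpty()
                    || d.text.toLowerCase(Locale.ROOT).contains(needle); // stands in for the parsed query
            if (inCategory && matches) {                          // both clauses are required (AND)
                hits.add(d.path);
            }
        }
        return hits;
    }
}
```

In Lucene itself the two clauses would be combined as required clauses of a boolean query; the point here is only that the URI narrows the candidate set and the search box narrows it further.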

> Of course, this all depends on what is content in your system, like you
> said. The advantage of crawling is that anything on the web page ends 
> up in
> the search engine.  That's also one of the disadvantages.

I agree that crawling can get the job done, but I'd prefer to integrate 
Lucene at a lower level, with more metadata than just a URL and HTML.  
Lucene *is* the content repository in my blog.

	Erik

