forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: more meta thoughts
Date Tue, 12 Aug 2003 12:40:15 GMT
On Tue, Aug 12, 2003 at 11:20:15AM +0100, g4 wrote:
> 
> On Monday, Aug 11, 2003, at 17:38 Europe/London, Juan Jose Pablos wrote:
> 
> >Jason,
> >I had a look at this software, but I was not able to see the demo.
> 
> If you go to the "try it" page you should see it.
> 
> >
> >I have read the white paper. This software processes documents and 
> >creates the meta information.
> >
> >What exactly do you want to incorporate into forrest even on the 
> >simplest level?
> 
> Sorry I don't know why I phrased it that way.
> 
> >
> >Adding something like this is trivial:
> 
> Yes, I have this already working, although I have to write a dc2html 
> XSL. I don't know if you've seen my other post regarding metadata; it 
> has examples of what I've been working on.
> 
> The only thing that's missing from this is keyword generation.  I have 
> written a gawk script that does a regexp strip of the XML and then a 
> frequency count of words. Next I need to set up a list of common words 
> to further filter potential keywords. I was thinking this could be 
> used at, say, ./forrest webapp to generate keywords from each page.
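
For readers following the thread, here is a minimal sketch of the pipeline described in the quoted paragraph above: strip the markup with a regexp, count word frequencies, and filter out common words. It is written in Python rather than gawk and is not the script from the post. The function name extract_keywords, the stop-word list, and the min_length cutoff are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it",
    "that", "this", "for", "on", "with",
}

TAG_RE = re.compile(r"<[^>]+>")   # crude tag stripper, not a real XML parser
WORD_RE = re.compile(r"[a-z]+")   # runs of ASCII letters only


def extract_keywords(xml_text, top_n=5, min_length=3):
    """Return the top_n most frequent non-stop-words in xml_text."""
    # Replace every tag with a space so adjacent words do not fuse.
    text = TAG_RE.sub(" ", xml_text)
    words = WORD_RE.findall(text.lower())
    # Drop very short words and anything on the stop-word list.
    counts = Counter(
        w for w in words if len(w) >= min_length and w not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(top_n)]


sample = (
    "<document><title>Forrest metadata</title>"
    "<p>Forrest generates metadata. Metadata helps search.</p></document>"
)
print(extract_keywords(sample, top_n=2))
```

The regexp approach ignores entities, CDATA sections, and attribute text, so a real XML parser would be the safer choice for anything beyond a quick experiment.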

FTR, that's what Klarity does -- some kind of clever semantic analysis of
the site's contents, to generate a list of keywords for each page.  Then
the site can be indexed using something like isite[1], and search results
will be much more relevant.


--Jeff

[1] http://www.cnidr.org/isite.html

> I'm doing this part really for my own curiosity. If you think it could 
> be useful to generate keywords from content text, I'm working on it ;)
> 
> 
