incubator-lucy-dev mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-dev] Generalize Tutorial for multiple host languages
Date Sun, 07 Nov 2010 19:16:02 GMT
Marvin Humphrey wrote on 11/7/10 10:06 AM:

> One thing I'm realizing is that I really don't want to contribute or maintain
> C sample code which operates in a web context.  C is too prone to security
> vulnerabilities, its string handling sucks so you need waaaaay more code, and
> things like URI escaping and HTML tag stripping aren't offered by the standard
> library and aren't easy to fake up.  It's the wrong language for a quickie CGI
> app.

Agreed.

> 
> I think it makes more sense for the C tutorial to operate in a command-line
> context, even if the tutorials for other host language bindings target the
> web.  But then we have a problem: the current HTML format of our sample corpus
> isn't suitable.  The solution, I think, is to change all those docs to plain
> text, with the title on the first line: 
> 
>     Amendment XIII 
> 
>     1. Neither slavery nor involuntary servitude, except as a punishment for
>     crime whereof the party shall have been duly convicted, shall exist within
>     the United States, or any place subject to their jurisdiction.
> 
>     2. Congress shall have power to enforce this article by appropriate
>     legislation.
> 
> Plain text will work for either web or command-line context, and as a bonus,
> for web-context tutorials we no longer have to either pull in an HTML parsing
> dependency or do something hackish with regexes.
> 

Agreed.
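
To illustrate the title-on-the-first-line format proposed above, here is a minimal C sketch of how a command-line tutorial might split such a document into a title and a body. The names `PlainDoc` and `split_title_body` are hypothetical and are not part of Lucy or libswish3. The sketch assumes the whole file has already been read into a writable, NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the two fields of a plain-text sample doc. */
typedef struct {
    char *title;  /* first line of the file, trailing whitespace trimmed */
    char *body;   /* everything after the title and any blank lines */
} PlainDoc;

/* Split `text` in place: the first line becomes the title, the rest the body.
 * The buffer is modified (a NUL is written at the end of the title), so the
 * caller must pass writable memory and keep it alive while using the result. */
PlainDoc split_title_body(char *text) {
    PlainDoc doc;
    char *eol;
    size_t len;

    assert(text != NULL);

    doc.title = text;
    eol = strchr(text, '\n');
    if (eol == NULL) {
        /* No newline at all: the whole text is the title, the body is empty. */
        doc.body = text + strlen(text);
    } else {
        *eol = '\0';
        doc.body = eol + 1;
        /* Skip the blank line(s) separating the title from the body. */
        while (*doc.body == '\n' || *doc.body == '\r') {
            doc.body++;
        }
    }

    /* Trim trailing spaces, tabs, and carriage returns from the title. */
    len = strlen(doc.title);
    while (len > 0 && (doc.title[len - 1] == ' ' ||
                       doc.title[len - 1] == '\t' ||
                       doc.title[len - 1] == '\r')) {
        doc.title[--len] = '\0';
    }
    return doc;
}
```

Because the split happens in place, no allocation or HTML parsing is needed. The two resulting strings could then be handed to whatever indexing call the eventual C API provides.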

For what it's worth, once we have a working C API, I intend to include in
libswish3 a "swish_lucy.c" example of using Lucy with libswish3. That example
*does* do all the HTML/XML parsing.

See, for example:

 http://dev.swish-e.org/browser/libswish3/trunk/src/swish_lint.c
 http://dev.swish-e.org/browser/libswish3/trunk/src/xapian/swish_xapian.cpp

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
