From Peter Karman
Subject Re: [lucy-dev] Generalize Tutorial for multiple host languages
Date Sun, 07 Nov 2010 19:16:02 GMT
Marvin Humphrey wrote on 11/7/10 10:06 AM:

> One thing I'm realizing is that I really don't want to contribute or maintain
> C sample code which operates in a web context.  C is too prone to security
> vulnerabilities, its string handling sucks so you need waaaaay more code, and
> things like URI escaping and HTML tag stripping aren't offered by the standard
> library and aren't easy to fake up.  It's the wrong language for a quickie CGI
> app.


> I think it makes more sense for the C tutorial to operate in a command-line
> context, even if the tutorials for other host language bindings target the
> web.  But then we have a problem: the current HTML format of our sample corpus
> isn't suitable.  The solution, I think, is to change all those docs to plain
> text, with the title on the first line: 
>     Amendment XIII 
>     1. Neither slavery nor involuntary servitude, except as a punishment for
>     crime whereof the party shall have been duly convicted, shall exist within
>     the United States, or any place subject to their jurisdiction.
>     2. Congress shall have power to enforce this article by appropriate
>     legislation.
> Plain text will work for either web or command-line context, and as a bonus,
> for web-context tutorials we no longer have to either pull in an HTML parsing
> dependency or do something hackish with regexes.
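
To make the proposed format concrete, here is a minimal sketch of how a command-line C program might split such a file into title and body. It assumes the whole document is already in a writable, NUL-terminated buffer, and that the title is simply everything before the first newline. The name split_title_body is hypothetical; it is not part of Lucy or libswish3.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lucy or libswish3.
 * Splits a NUL-terminated plain-text document in place:
 * the first line becomes the title, the rest becomes the body.
 * The buffer is modified; *title and *body point into it. */
static void
split_title_body(char *doc, char **title, char **body) {
    char *eol = strchr(doc, '\n');
    char *end;

    *title = doc;
    if (eol == NULL) {
        /* Single-line document: title only, empty body. */
        end   = doc + strlen(doc);
        *body = end;
    }
    else {
        end   = eol;
        *body = eol + 1;
        *eol  = '\0';   /* terminate the title at the newline */
    }

    /* Trim trailing whitespace (including '\r') from the title. */
    while (end > doc && isspace((unsigned char)end[-1])) {
        *--end = '\0';
    }
}
```

A caller would read one corpus file into a buffer, call split_title_body(buf, &title, &body), and pass the two strings on to whatever indexing calls the eventual C API provides.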


For what it's worth, once we have a working C API, I intend to include a
"swish_lucy.c" example in libswish3 showing how to use Lucy together with
libswish3, which *does* do all the HTML/XML parsing.

See, for example:

Peter Karman

