incubator-lucy-dev mailing list archives

From Marvin Humphrey <>
Subject Re: PHP blinding for Lucy?
Date Wed, 28 Feb 2007 04:30:19 GMT

On Feb 27, 2007, at 6:24 PM, David Balmain wrote:

> How about just using doxygen. I don't have much experience with it but
> I'm pretty sure there would be a way to tag particular functions that
> are public so that when you generate the documentation you can
> generate only the public methods.

I don't know it well either, but I'm sure you're right and it will  
allow us to put in a public/non-public tag.

It would be even better if we could export at least some of the  
documentation -- particularly method descriptions.  I'd really like  
to be able to sync up the Perl binding docs by running a script
rather than via copy-and-paste.

> Of course you could also have public and private include files.

Hmm, can you elaborate?  I'd basically given up hope that we'd be  
able to maintain tight control over symbol export, and was expecting  
to define the API via documentation only.

>> I'm thinking we need shared
>> documentation.  XML, maybe?  Then each binding would require an
>> appropriate XML-to-whatever translation utility.
> I'm not entirely sure I'm on the same wavelength as you today. By
> 'whatever' do you mean the specific language's documentation format?

Yes, that was what I was thinking.  But perhaps not quite as
ambitious as it may have come across.

> If
> that is the case then I don't see this working as the ruby API for
> Lucy will probably be quite different to the PHP API.

If we're reasonably careful about how we word things, many method  
descriptions could be reused across all bindings.  And one of the  
things about the naming convention we've settled on for method  
invocations is that you can derive either lowerCamelCase or  
separated_by_underscores method names with a simple transform:

    Sim_Length_Norm => lengthNorm
    Sim_Length_Norm => length_norm
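To make that transform concrete, here is a minimal Python sketch. It
assumes the first underscore-separated token is always the class nick
(e.g. "Sim", "IxWriter") and that every remaining token is one
capitalized word of the method name. The function names are
hypothetical, not part of any Lucy API.

```python
def split_method_name(full_name: str) -> tuple[str, list[str]]:
    """Split "Sim_Length_Norm" into the nick "Sim" and ["Length", "Norm"]."""
    nick, *words = full_name.split("_")
    if not words:
        raise ValueError(f"expected Nick_Method_Name, got {full_name!r}")
    return nick, words


def to_lower_camel(full_name: str) -> str:
    """Sim_Length_Norm -> lengthNorm (the class nick is dropped)."""
    _, words = split_method_name(full_name)
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def to_snake(full_name: str) -> str:
    """Sim_Length_Norm -> length_norm (the class nick is dropped)."""
    _, words = split_method_name(full_name)
    return "_".join(w.lower() for w in words)


if __name__ == "__main__":
    for name in ("Sim_Length_Norm", "IxWriter_Add_Document"):
        print(name, "=>", to_lower_camel(name), "/", to_snake(name))
```

Because the nick is discarded, a subclass that overrides a method ends
up with the same binding-level name as the parent's method, which is
what you'd want.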

If we tag every last thing, enough so that we could actually  
generate, say, both POD and javadoc without intervention, then sure,  
XML is wayyyy too verbose.  Anything would be, really, because  
language syntaxes are too distinct.  But if we set our sights a  
little lower, and just try to share method names, method  
descriptions, and public/non-public access control, that's doable --  
and it's a whole lot of savings.  (Maybe parameter lists and return  
values, too, but that's a little harder.)

       Computes the normalization value for a field given the total
       number of terms contained in a field. These values, together with
       field boosts, are stored in an index and multiplied into scores for
       hits on each field by the search code.

       Matches in longer fields are less precise, so implementations of
       this method usually return smaller values when numTokens is large,
       and larger values when numTokens is small.

       Note that these values are computed under IxWriter_Add_Document and
       then stored using Sim_Encode_Norm. Thus they have limited precision,
       and documents must be re-indexed if this method is altered.

Note the use of "IxWriter_Add_Document" and "Sim_Encode_Norm" within  
the description.  Those method names are identifiable patterns,  
matchable with this regex:

   # $1 is class nick, $2 is short method name

It's easy to sub out IxWriter_Add_Document for this, which will  
generate a nicely formatted link...
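The regex itself didn't survive in this message, so what follows is
only a guess at its shape: a hypothetical Python pattern that captures
the class nick and short method name as groups 1 and 2 (as in the
comment above), plus a substitution that rewrites each match as a POD
L<...> link. The nick-to-class table and the Perl package names in it
are placeholders, not real Lucy packages.

```python
import re

# Hypothetical pattern: group 1 is the class nick ("Sim", "IxWriter"),
# group 2 is the short method name ("Encode_Norm", "Add_Document").
METHOD_RE = re.compile(
    r"\b([A-Z][A-Za-z0-9]*)_([A-Z][A-Za-z0-9]*(?:_[A-Z][A-Za-z0-9]*)*)\b"
)

# Placeholder nick -> Perl class table; a real tool would derive it
# from the header files rather than hard-coding it.
CLASS_FOR_NICK = {
    "Sim": "Lucy::Search::Similarity",
    "IxWriter": "Lucy::Index::IndexWriter",
}


def pod_link(match: re.Match) -> str:
    """Turn one Nick_Method_Name match into a POD link."""
    nick, method = match.group(1), match.group(2)
    cls = CLASS_FOR_NICK.get(nick)
    if cls is None:
        return match.group(0)  # unknown nick: leave the text as it was
    perl_name = method.lower()  # Add_Document -> add_document
    return f"L<{perl_name}()|{cls}/{perl_name}>"


def linkify(description: str) -> str:
    """Replace every recognized Nick_Method_Name in a description."""
    return METHOD_RE.sub(pod_link, description)


if __name__ == "__main__":
    print(linkify("computed under IxWriter_Add_Document and then "
                  "stored using Sim_Encode_Norm."))
```

Unknown nicks pass through untouched, so a stray capitalized
identifier in prose won't turn into a broken link.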


Now, returning to your point about Doxygen... With XML, we'd have to  
maintain separate files for the documentation, which would suck.  So  
I'm all for using Doxygen, especially if we can rig things up so that  
the description can be isolated and parsed out reliably.

I might go write an extractor tool which parses our header files and  
generates intermediate XML.  Then bindings authors could write their  
own final translation utilities in their language of choice, and use  
as much or as little as they wish.

Hopefully they'd use more rather than less.  It's to the user's  
benefit for various bindings to present reasonably consistent APIs  
while still being idiomatic, because it makes it easier to apply what  
you learned about one of them to another.

Marvin Humphrey
Rectangular Research
