lucy-dev mailing list archives

From Andi Vajda <>
Subject Re: [lucy-dev] RegexTokenizer
Date Wed, 09 Mar 2011 04:15:53 GMT

On Mar 8, 2011, at 19:56, Marvin Humphrey <> wrote:

> On Tue, Mar 08, 2011 at 12:25:49PM -0800, David E. Wheeler wrote:
>> Yeah. It just drives me nuts to see the namespacing conventions of one
>> language forced on another. 
> Like how we're sneaking namespaces into Lucy's C code via prefixes? :)  Or how
> we jammed namespaces into JavaScript via objects back in our OpenJSAN 
> days?  :)
>> Each language should have names that make sense by the conventions of that
>> language IMHO.
> I don't think it's a good idea for Lucy's class hierarchy to be organized
> differently for each host language binding.  (That seems like the logical
> extrapolation of your remark, though I believe you intended to express an
> ideal rather than make a concrete recommendation.)
> Since the class hierarchy must be shared, its design has to balance many
> competing interests and work well across the gamut of hosts.  What we have now
> doesn't violate anybody's language rules or conventions to the best of my
> knowledge.  It's internally consistent, and works OK for our C code.

With JCC I made the conscious decision a long time ago to not carry over the Java package
structure into Python modules but use a flat namespace instead, generating one Python module
for an entire class tree.

Name collisions are surprisingly rare. When they occur, the conflicts can usually be resolved
with a --rename.
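
To make the flat-namespace idea concrete, here is a minimal sketch. It is not JCC's actual code: the flatten() helper, the rename mapping, and the org.example class names are all hypothetical, chosen only to show how dropping the package path can produce a collision and how a rename resolves it.

```python
def flatten(qualified_names, renames=None):
    """Map fully qualified class names onto one flat namespace.

    qualified_names: iterable of dotted names (hypothetical examples below).
    renames: optional dict of qualified name -> replacement short name,
             playing the role of a rename option (an assumption, not JCC's API).
    Returns a dict of short name -> qualified name.
    Raises ValueError when two classes claim the same short name.
    """
    renames = renames or {}
    flat = {}
    for qname in qualified_names:
        # Default short name is the last dotted component, unless renamed.
        short = renames.get(qname, qname.rsplit(".", 1)[-1])
        if short in flat:
            raise ValueError(
                f"{short!r} is claimed by both {flat[short]} and {qname}"
            )
        flat[short] = qname
    return flat


# Hypothetical class tree with one collision on "Query".
classes = [
    "org.example.analysis.Tokenizer",
    "org.example.search.Query",
    "org.example.util.Query",
]
```

Calling flatten(classes) raises ValueError because of the two Query classes; calling flatten(classes, {"org.example.util.Query": "UtilQuery"}) succeeds and exposes Tokenizer, Query and UtilQuery side by side.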

In the C++ layer, I keep the Java package structure because it's free, but in the Python layer
it's in the way.

It seems that for the API entry points that matter, people tend to pick unique class names anyway.


>>> If someone is willing to work up a patch which makes "Lucy::Tokenizer::Regex"
>>> possible, then we can consider it.  Until then, it has to be ruled out for
>>> technical reasons.
>> Probably not too difficult.
> It's technically doable.  
> Still, Lucy's namespacing scheme and class hierarchy have been mulled
> over very hard over a very long time.  When renaming Lucy::Analysis::Tokenizer
> to something else, we should strive to operate within the existing
> conventions.
> Upending the existing hierarchy and changing the rules would be a much larger
> undertaking.  It's not even worth contemplating without someone willing to do
> the work -- and I rather suspect that such a volunteer would become frustrated
> quickly by all the concerns I'd raise as someone who works on Lucy's C code.
>>> FWIW, "Lucy::Tokenizer::Regex" implies that we would have a Lucy::Tokenizer
>>> class, which would break another convention -- we no longer have any classes
>>> which live directly under Lucy.
>> Now that's a shame. Seems like a waste of namespace hierarchy.
> There are two or three hundred classes in Lucy, and there will likely be
> hundreds more in time.  I think we should be conservative about what we put at
> the second level of the hierarchy, so that scanning any one directory with the
> naked eye produces sensible results.
> We inherited all the dirs under Lucy except for Lucy/Plan and Lucy/Object from
> Lucene.  IMO the organization has served us pretty well.
>> But I'm very late to this discussion, so feel free to ignore my ignorant
>> harping. :-)
> I see your smiley, but I'll emphasize this anyway: we're definitely not
> ignoring your suggestion even if we don't adopt it.
> Cheers,
> Marvin Humphrey
