openoffice-dev mailing list archives

From Rob Weir <>
Subject Re: Hunspell dictionaries are not just words lists (+ other matters)
Date Tue, 08 Nov 2011 00:58:58 GMT
On Mon, Nov 7, 2011 at 7:29 PM, Christian Lohmaier <> wrote:
> Hi Rob,
> On Tue, Nov 8, 2011 at 12:34 AM, Rob Weir <> wrote:
>> The complexity of the language is irrelevant.  The point is that the
>> complexity is not created or invented by the person who compiles the
>> dictionary.  The complexity is not the creative expression of an
>> author.  The compiler of a spell checking dictionary is just recording
>> facts about the language.
> This is complete <censored/>.
> Why don't you just admit that you have absolutely no clue about how
> the dictionaries (or for that matter hunspell/affix compression as a
> whole) works?

Actually, I know quite a lot about spell checking and dictionaries.
And copyright.  How a work is created is totally irrelevant. A
painting is not copyrightable because of what paint the artist uses or
how they hold the brush.

Spell checking dictionaries are just compilations of facts that are
constrained by the preexisting external facts of the language.  The
compiler of the dictionary does not create these facts.  He merely
encodes them.  The particular dictionary might be copyrightable as a
specific selection, coordination and arrangement of these facts, but
fair use would allow me to extract the same facts from the
dictionaries, via reverse engineering, and make my own selection,
coordination and arrangement of these same facts and distribute them
as my own dictionary.  In other words, you might be able to protect
the compilation of facts, but you cannot protect the underlying facts,
or prevent people from copying your encoding of these facts and
distributing a different arrangement of them.  Copyright protection on
a compilation of facts is extremely thin.  It is that simple.

Now Apache might decide to honor the dictionary compiler's wishes
despite the above, but that is by Apache policy, not because of any

This should not be hard to understand.  Free software advocates argue
all the time that software cannot be patented because it is "just
math".  Books have been written about it.  So why is it so hard to
understand that linguistic facts cannot be copyrighted?  And why
should you be offended by me pointing out that your work as a
dictionary compiler enlarges an intellectual commons? That is nothing
to be offended by.  Do you think your work is
only valuable if you can draw a box around it and enforce exclusionary

In any case, these are important concepts to understand.  If it is
still not clear after reading this response, then try doing a Google query on
terms such as copyright, compilation and facts.


