incubator-ooo-dev mailing list archives

From Christian Lohmaier <cl...@openoffice.org>
Subject Re: Hunspell dictionaries are not just words lists (+ other matters)
Date Tue, 08 Nov 2011 10:59:49 GMT
Hi Rob, *,

On Tue, Nov 8, 2011 at 1:58 AM, Rob Weir <robweir@apache.org> wrote:
> On Mon, Nov 7, 2011 at 7:29 PM, Christian Lohmaier <cloph@openoffice.org> wrote:
>> On Tue, Nov 8, 2011 at 12:34 AM, Rob Weir <robweir@apache.org> wrote:
>> [...]
>> Why don't you just admit that you have absolutely no clue about how
>> the dictionaries (or for that matter hunspell/affix compression as a
>> whole) works?
>
> Actually, I know quite a lot about spell checking and dictionaries.
> And copyright.  How a work is created is totally irrelevant. A
> painting is not copyrightable because of what paint the artist uses or
> how they hold the brush.

Yes, and nobody claims that the words a dictionary represents, or
whatever fragment of grammar it entails, would be copyrightable.

> Spell checking dictionaries are just compilations of facts

As is any other software. Following that logic, you could not put a
copyright on *anything*, as the laws of physics or math are the same
for everyone.

> that are
> constrained by the preexisting external facts of the language.  The
> compiler of the dictionary does not create these facts.

No computer dictionary in the world is a perfect representation of the
"facts" that make up a language. There is no dictionary with 100%
accuracy. There is no way to take a dictionary and reverse-engineer the
language from it.

Unmunge a hunspell dictionary, especially one with compounds enabled,
and you will get gigabyte after gigabyte of "valid" (as by the rules of
the dictionary, not by the rules of the language) words.
Claiming that a dictionary represents external *facts* of the language
just doesn't make any sense.
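To make the "unmunge" point concrete, here is a minimal toy sketch, assuming a heavily simplified model of Hunspell's affix compression: each stem carries flags, each suffix flag maps to (strip, add, condition) rules loosely modeled on the `SFX` lines of a `.aff` file, and a hypothetical flag `C` stands in for `COMPOUNDFLAG`. The words, flags and the `max_parts` limit (a stand-in for something like `COMPOUNDWORDMAX`) are made up for illustration; real tools such as Hunspell's `unmunch` implement far more rules than this.

```python
import re
from itertools import product

# Hypothetical suffix rules: flag -> list of (strip, add, condition).
# "0" means "strip nothing"; conditions are matched at the end of the stem.
SUFFIX_RULES = {
    "S": [("0", "s", "[^sxzhy]$"), ("y", "ies", "[^aeiou]y$")],
    "G": [("0", "ing", "[^e]$"), ("e", "ing", "e$")],
}
COMPOUND_FLAG = "C"  # toy stand-in for Hunspell's COMPOUNDFLAG

# Hypothetical .dic content: stem -> flags.
DICTIONARY = {"house": "SC", "boat": "SGC", "fly": "SGC", "make": "G"}


def apply_suffixes(stem, flags):
    """Return the stem plus every form its suffix flags generate."""
    forms = {stem}
    for flag in flags:
        for strip, add, cond in SUFFIX_RULES.get(flag, []):
            if re.search(cond, stem):
                base = stem if strip == "0" else stem[: -len(strip)]
                forms.add(base + ("" if add == "0" else add))
    return forms


def unmunge(dictionary, max_parts):
    """Expand all affixed forms, then every compound of up to max_parts stems."""
    words = set()
    for stem, flags in dictionary.items():
        words |= apply_suffixes(stem, flags)
    compound_stems = sorted(s for s, f in dictionary.items() if COMPOUND_FLAG in f)
    # Toy compounding: any sequence of 2..max_parts compound-enabled stems.
    for parts in range(2, max_parts + 1):
        for combo in product(compound_stems, repeat=parts):
            words.add("".join(combo))
    return words


if __name__ == "__main__":
    for max_parts in (1, 2, 3, 4):
        print(max_parts, len(unmunge(DICTIONARY, max_parts)))
```

Four stems already yield 10 affixed forms, and with only three compound-enabled stems the word count grows as 10, 19, 46, 127 for compounds of up to 1 to 4 parts. Every one of them is "valid" by the dictionary's rules ("flyhouseboat"), whether or not the language actually has it.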

> He  merely
> encodes them.

No, this is not true. If it were merely an encoding of the facts, you
could create a perfect dictionary. But which affix transformations are
created depends on the creator of the dictionary, on the stems that are
included in the dictionary, and on what level of accuracy is targeted.
The existing affix rules affect other rules in a complex way. These are
not just "outside facts".
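A small sketch of why the rules are a design choice rather than a transcription of facts, assuming a simplified Hunspell-like model where each suffix rule is a (strip, add, condition) tuple with the condition matched at the end of the stem. The two rule sets below are hypothetical encodings of English plurals written by two different "authors"; both are plausible, and they disagree on which words are accepted.

```python
import re

# Two hypothetical encodings of the English plural, as (strip, add, condition).
COARSE = [("0", "s", ".$")]  # just append "s" to everything
FINE = [
    ("0", "s", "[^sxzhy]$"),   # cat -> cats
    ("0", "es", "[sxzh]$"),    # box -> boxes
    ("y", "ies", "[^aeiou]y$"),  # city -> cities
    ("0", "s", "[aeiou]y$"),   # day -> days
]

STEMS = ["cat", "box", "city", "day"]


def expand(stems, rules):
    """Return the stems plus every form the rules generate from them."""
    forms = set(stems)
    for stem in stems:
        for strip, add, cond in rules:
            if re.search(cond, stem):
                base = stem if strip == "0" else stem[: -len(strip)]
                forms.add(base + add)
    return forms


if __name__ == "__main__":
    print(sorted(expand(STEMS, COARSE) - expand(STEMS, FINE)))
    print(sorted(expand(STEMS, FINE) - expand(STEMS, COARSE)))
```

With the same four stems, the coarse rule set accepts "boxs" and "citys" and rejects "boxes" and "cities", while the finer set does the opposite. Adding a rule, or changing one condition, shifts what every other stem carrying that flag produces, which is the interaction between rules described above.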

>  The particular dictionary might be copyrightable as a
> specific selection, coordination and arrangement of these facts, but
> fair use would allow me to extract the  same facts from the
> dictionaries, via reverse engineering, and make my own selection,
> coordination and arrangement of these same facts and distribute them
> as my own dictionary.

Of course you are free to create your own dictionary. But once again
your conclusion is silly to the point where I cannot take you
seriously.

Your logic really means that you cannot copyright any kind of software,
because you are still able to write your own copy that does the same:
the fundamental math that makes up software is the same, and you're
just rearranging some keywords, the fundamental facts of the
programming language.

This is stupid.

While it is true that you can rewrite software to do the same, and
copyright does not hinder you from doing so (as Tor implied, there might
be other means like patents or other things that have nothing to do with
copyright), you all still put copyright statements in source code.

>  In other words, you might be able to protect
> the compilation of facts, but you cannot protect the underlying facts,

Yes, with that I agree (everyone does, I guess), except for the
"compilation of facts" part. It is not a compilation of facts. It is
"guesswork", closing the gaps to the actual facts.

You might be able to do a "just a compilation of facts" style
dictionary for an artificial language, but not for a language that
people are actually using in real life.

> or prevent people from copying your encoding of these facts and
> distributing a different arrangement of them.

Here I (and others) strongly disagree with you. Copying the encoding
of the facts and just altering it is no different from taking the
source code of any software, putting your nametag on it and shuffling
things around.

The important point here is *different arrangement*. Once again: You
are free to (attempt to) create a dictionary for the same language by
yourself. Language itself of course is not protected. But you will not
end up with the same "encoding" (approximation) of the language, since
it is not just a matter of collecting facts. It is a creative process.
And once again I challenge your knowledge about the dictionaries. I
just cannot otherwise explain how you can claim it is just a
collection of facts with no creative effort behind it.

Once again: Following your line of thought, you could not put a
copyright on any software, as the rules of math are the same for
everyone, and you're just applying those rules to create the same
result. And this is nonsense.

Copyright is not to prevent others from creating stuff that does or
behaves the same.

Copyright covers the actual way it is done; it applies to the
concrete solution to the given problem.

> This should not be hard to understand.  Free software advocates argue
> all the time that software cannot be patented because it is "just
> math".  Books have been written about it.  So why is it so hard to
> understand that linguistic facts cannot be copyrighted?

Because you don't understand that a dictionary doesn't represent
linguistic facts. There is no such thing as a linguistic fact.
If there were, you could create a perfect dictionary.
There is an approximation at best. A computer dictionary is not a list
of words, and it is not a list of hard facts.

> In any case, these are important concepts to understand.  If it is not
> clearer, after reading this response, then try doing a Google query on
> terms such as copyright, compilation and facts.

No, it is pointless to search for those terms when the basic assumption
that a dictionary is a mere compilation of facts is already wrong.

ciao
Christian
