Reply-To: harmony-dev@incubator.apache.org
Message-ID: <3c7e1c080611160845t53955990y535e76334e564564@mail.gmail.com>
Date: Thu, 16 Nov 2006 19:45:38 +0300
From: "Andrey Yakushev" <andrey.yakushev@gmail.com>
To: harmony-dev@incubator.apache.org
Subject: Re: [doc] What should be improved in DRLVM Doxygen documentation?
In-Reply-To: <uvelr64xz.fsf@gmail.com>
References: <8E389A5F2FEABA4CB1DEC35A25CB39CE695B49@mssmsx411>
	 <uvelr64xz.fsf@gmail.com>

Alexei's metric is interesting, but it sometimes gives strange results
for pretty good docs. For example, these well-commented files score 0:

0 verifier_8h.html
0 structVerifier_1_1vf__Graph.html

It seems this metric favors long narrative text.
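
One possible explanation, as a rough sketch only: if the metric follows
the steps Alexei listed (quoted further down in this thread), a page made
up mostly of identifiers earns no credit, because identifiers are dropped
along with misspelled words. The function name rare_word_scores, the
tokenizer regex, and the known_words set standing in for a spell checker
are all hypothetical; this is not Alexei's actual script.

```python
import re
from collections import Counter


def rare_word_scores(pages, known_words):
    """Score pages by their share of rare, correctly spelled words.

    pages:       dict mapping page name -> plain text of the page
    known_words: set of correctly spelled words (assumed spell-check stand-in)
    """
    # Step 1: a word list per page (identifiers kept whole via the underscore).
    tokenized = {
        name: re.findall(r"[a-z_]+", text.lower())
        for name, text in pages.items()
    }
    # Step 2: one frequency dictionary across all pages.
    freq = Counter(w for words in tokenized.values() for w in words)
    # Step 3: drop the most frequently used half of the dictionary.
    ranked = [w for w, _ in freq.most_common()]
    rare = set(ranked[len(ranked) // 2:])
    # Step 4: drop misspelled words, which includes identifiers.
    rare &= known_words
    # Steps 5-6: +1 per rare word occurrence, divided by the page word count.
    return {
        name: (sum(1 for w in words if w in rare) / len(words)) if words else 0.0
        for name, words in tokenized.items()
    }
```

Under these assumptions, a page whose text consists only of names such as
vf_Graph scores 0 regardless of how thoroughly it is documented, while a
page of ordinary prose scores above 0.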


I also agree that comment quality could be estimated from Doxygen
warnings. I tried using such a metric for DRLVM.

I used the generated DoxygenDrlvmLog.txt file and the following one-liner to parse it:

---8<------------------------------------------------------
# Count warnings per file, highest count first.
grep Warning DoxygenDrlvmLog.txt | awk -F ":" '{print $1}' |
  sort | uniq -c | sort -n -r
---8<------------------------------------------------------
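
For comparison, roughly the same count can be sketched in Python. This
assumes each warning line has the shape <path>:<line>: Warning: <message>,
as in the log excerpt Alexei posted (quoted below); the function name
warning_counts is made up for illustration.

```python
import re
from collections import Counter


def warning_counts(log_lines):
    """Count Doxygen warnings per source file, most-warned file first."""
    counts = Counter()
    for line in log_lines:
        # Assumed line shape: <path>:<line number>: Warning: <message>
        match = re.match(r"(.+?):\d+: Warning:", line)
        if match:
            counts[match.group(1)] += 1
    # List of (path, count) pairs sorted by count, descending.
    return counts.most_common()
```

It could be fed the log directly, for example
warning_counts(open("DoxygenDrlvmLog.txt")). Continuation lines such as
"  parameter trace" do not match the pattern and are skipped.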

The results are posted at
http://wiki.apache.org/harmony/DRLVM_Documentation_Quality_Doxygen_Warning_Rating

If it suits our needs, we can think about running this check regularly.

Thanks,
Andrey


On 07 Nov 2006 14:23:20 +0600, Egor Pasko <egor.pasko@gmail.com> wrote:
> On the 0x216 day of Apache Harmony Alexei A. Fedotov wrote:
> > Nadya wrote,
> > > we could check for required Doxygen tags in certain elements.
> >
> > I wasn't asked, but cannot resist, sorry. You may achieve this right now
> > without additional coding. Doxygen warns about many problems you
> > describe, when you have the following option set.
> >
> > # If WARN_IF_UNDOCUMENTED is set to YES, then doxygen will generate warnings
> > # for undocumented members. If EXTRACT_ALL is set to YES then this flag will
> > # automatically be disabled.
> > WARN_IF_UNDOCUMENTED   = YES
> >
> > The resulting log consists of warning messages about different problems.
> > My DoxygenDrlvmLog.txt, for example, contains the following one:
> >
> > drlvm/trunk/vm/MMTk/ext/vm/HarmonyDRLVM/org/apache/HarmonyDRLVM/mm/mmtk/Scanning.java:47:
> > Warning: The following parameters of
> > org::apache::HarmonyDRLVM::mm::mmtk::Scanning::precopyChildren(TraceLocal trace, ObjectReference object)
> > are not documented:
> >   parameter trace
>
> Does it make sense to convert the warnings into quality metrics and publish
> them on harmonytest.org (or even the Wiki) regularly? It should encourage
> people (like me) to document sources better. Or is it too much effort?
>
> > With best regards,
> > Alexei Fedotov,
> > Intel Java & XML Engineering
> >
> > >-----Original Message-----
> > >From: Morozova, Nadezhda [mailto:nadezhda.morozova@intel.com]
> > >Sent: Friday, November 03, 2006 6:26 PM
> > >To: harmony-dev@incubator.apache.org
> > >Subject: RE: Re: [doc] What should be improved in DRLVM Doxygen
> > >documentation?
> > >
> > >Egor,
> > >I agree with you on the idea of simplicity: documented vs.
> > >non-documented.
> > >An additional point: do you think we need/want to evaluate the quality of
> > >comments? We could check for required Doxygen tags in certain elements.
> > >For example, a function is almost certain to include @param and
> > >@return.
> > >Surely, this is a heuristic and does not solve all our problems. But the
> > >Doxygen quality check sometimes shows that the file does have comments,
> > >but they are not processed properly by Doxygen - which results in a low
> > >rating for an html file. Maybe this is a crazy idea - I'd be glad to
> > >know your opinion.
> > >
> > >Thank you,
> > >Nadya Morozova
> > >
> > >
> > >-----Original Message-----
> > >From: news [mailto:news@sea.gmane.org] On Behalf Of Egor Pasko
> > >Sent: Friday, November 03, 2006 12:18 PM
> > >To: harmony-dev@incubator.apache.org
> > >Subject: Re: [doc] What should be improved in DRLVM Doxygen
> > >documentation?
> > >
> > >On the 0x216 day of Apache Harmony Alexei Fedotov wrote:
> > >> Egor,
> > >>
> > >> Thank you for your interest.
> > >
> > >We definitely need to improve our documentation. Necessity is not a
> > >real interest :)
> > >
> > >> Here is an algorithm:
> > >>
> > >> 1. Create a list of words from HTML files.
> > >> 2. Merge a dictionary of all words used in documentation.
> > >> 3. Remove a half of the most frequently used words from the dictionary
> > >> - I believe they do not add much sense.
> > >> 4. Remove misspelled words (including identifiers) from the dictionary.
> > >> 5. Give a page +1 for each rare, correctly spelled word according to
> > >> the dictionary.
> > >> 6. Divide by the total number of words on the page.
> > >
> > >Hm, strange heuristic. Having more unique, correctly spelled words is
> > >beneficial. It does not give a clue about the overall quality of the
> > >documentation, which is rather confusing.
> > >
> > >I thought of something more natural. Number of documented items
> > >vs. number of non-documented. Plus a penalty to the relative number of
> > >misspelled words.
> > >
> > >> I've collected nice RFEs from your letter. Most of them make me think
> > >> and I like them.
> > >> a. Update an ASF block comment
> > >> b. Improve readability. Some things are really easy - like removing
> > >> awk and rewriting most things in perl. Others are a bit more complex -
> > >> I targeted script performance when I created the auto-generated perl
> > >> script. Also, the initial algorithm was a bit more complex - different
> > >> words had a different cost based on their popularity.
> > >> c. Use junit test output format to integrate with
> > >> http://harmonytest.org. I believe I need a feature request for that
> > >> site as well - we need some way to import performance-like rankings to
> > >> the site.
> > >
> > >Yes, I thought of the RFE to harmonytest. At least, put the doc items
> > >on a separate page from the build items.
> > >
> > >> d. I will think of parsing sources. But I don't think we need to
> > >> maintain both scripts. The generic rule is simple - improve your .h
> > >> and .java files - .cpp files don't count. I suggest better to link
> > >> .html files to contributors.
> > >
> > >Can you calculate a list of relevant filenames from a doc page? Give a
> > >filename +1 for each documented item and -1 for each undocumented one,
> > >then divide by the number of items. Is it easy to implement?  Maybe
> > >doxygen has some features to assist with this?
> > >
> > >> Thank you for the ideas. I will certainly update the script. I just want
> > >> to wait a bit - many scripts die just because people are not
> > >> interested in running them a second time. Also, if anyone suggests any
> > >> changes to the algorithm or any other RFEs, I want to implement them all
> > >> at once.
> > >>
> > >> Nadya, could you please point us to a good documentation file so we can
> > >> use it as a pattern?
> > >
> > >--
> > >Egor Pasko
> >
>
> --
> Egor Pasko
>
>