commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <brit...@apache.org>
Subject Re: [TEXT] Review of the code base
Date Thu, 30 Apr 2015 06:59:26 GMT
2015-04-27 8:10 GMT+02:00 Benedikt Ritter <britter@apache.org>:

> Hello Phil,
>
> 2015-04-27 4:56 GMT+02:00 Phil Steitz <phil.steitz@gmail.com>:
>
>> On 4/26/15 3:14 AM, Benedikt Ritter wrote:
>> > 2015-04-26 11:52 GMT+02:00 Benedikt Ritter <britter@apache.org>:
>> >
>> >>
>> >> 2015-04-19 17:29 GMT+02:00 Benedikt Ritter <britter@apache.org>:
>> >>
>> >>>
>> >>> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <britter@apache.org>:
>> >>>
>> >>>> Hello Bruno,
>> >>>>
>> >>>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <
>> >>>> brunodepaulak@yahoo.com.br>:
>> >>>>
>> >>>>> Thanks for the thorough code review Benedikt! Comments inline.
>> >>>>>
>> >>>>>> 1. IP clearance! There are several comments talking about
code
>> >>>>> adapted from
>> >>>>>> PHP libraries. This needs to be listed in NOTICE.txt. For
an
>> example
>> >>>>> see
>> >>>>>> the NOTICE.txt in [lang].
>> >>>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497
>> >>>>> And first try to fix it:
>> >>>>>
>> >>>>>
>> >>>>>
>> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497
>> >>>>> Does that look correct? I also checked with the author before
>> porting
>> >>>>> it to the initial Java version, and told him about the [text]
>> component -
>> >>>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11
>> >>>>
>> >>>> I've adjusted the changes a bit and merged it into master.
>> >>>>
>> >>>>
>> >>>>>> 2. The names package needs work.
>> >>>>> Completely agree. We can try to make it thread safe too. This
will
>> take
>> >>>>> a little longer.
>> >>>>> File new issue for tracking it
>> >>>>> https://issues.apache.org/jira/browse/SANDBOX-498
>> >>>>
>> >>>> I've pushed a new branch containing a proposal how to fix this.
>> Please
>> >>>> review.
>> >>>>
>> >>>>
>> >>>>>
>> >>>>>> 3. There are a some useless JavaDocs, which can be dropped.
For
>> >>>>> example:
>> >>>>>> (...)> Better rename the variable to middleName and drop
the
>> comment.
>> >>>>> What about renaming the variables but leaving the comments?
>> >>>>>
>> >>>> For I've corrected this for the names package in the SANDBOX-498
>> branch.
>> >>>>
>> >>>>
>> >>>>>> 4. Several classes have no tests. The overall test coverage
looks
>> >>>>> good, so
>> >>>>>> this may not be a problem.
>> >>>>> Before the initial release I can start a development cycle just
to
>> add
>> >>>>> new tests for parts that seem important - like code with somewhat
>> high [+5]
>> >>>>> cyclomatic complexity and uncovered branches. Just to make sure
>> that the
>> >>>>> initial release will have an acceptable quality and less hidden
>> bugs :-)
>> >>>>>
>> >>>> Nice!
>> >>>>
>> >>>>
>> >>>>>> 5. I don't think the shade configuration is correct. To
me it looks
>> >>>>> like
>> >>>>>> commons-text depends on [lang] and also shades some classes.
Do we
>> >>>>> really
>> >>>>>> need the explicit dependency? Furthermore, why do we need
to shade
>> so
>> >>>>> many
>> >>>>>> classes? Why do we need anything beside StringUtils?
>> >>>>> My mistake. I copied the [functor] configuration without much
>> thinking.
>> >>>>> IIRC, this dependency was added because of the names package.
But it
>> >>>>> shouldn't be too complicate to replace it. It is mainly used
for
>> checking
>> >>>>> empty/blank values.
>> >>>>>
>> >>>>> So let's drop the [lang] dependency and use something like `$VAR
!=
>> >>>>> null && ! "".equals($VAR)` instead?
>> >>>>>
>> >>>> I think we also need StringUtils in the similarity package. We should
>> >>>> handle this after we have merged the changes made to the names
>> package.
>> >>>>
>> >>>>
>> >>>>> Thanks again for reviewing the code!
>> >>>>>
>> >>>> No problem, thanks for pushing this forward.
>> >>>>
>> >>>> Here are some more things we need to take care of before 1.0:
>> >>>> - JIRA project
>> >>>>
>> >>> Requested a JIRA project:
>> >>>
>> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477
>> >>>
>> >> Here is the JIRA project [1]. I'll find out, how we can move the
>> Commons
>> >> Text issues from the Sandbox project to the new project.
>> >>
>> > I've migrated all issues from the Sandbox project to the new Commons
>> Text
>> > project. I've also updated the project documentation.
>>
>> Did I miss the promotion VOTE?
>>
>
> No, but I think the component should be in a state to be released before
> starting a promotion vote. We have examples of components which have been
> promoted to early and I don't want this to happen with [text].
>

I hope we haven't stepped on anybody's toes or raised the impression we're
doing stuff without properly informing the dev ML about our plans. I will
write an informational e-mail together illustrating what we have
implemented, what our plans are and what the road map for promotion is.

Thank you!
Benedikt


>
> Benedikt
>
>
>>
>> Phil
>> >
>> >
>> >> Benedikt
>> >>
>> >> [1] https://issues.apache.org/jira/browse/TEXT
>> >>
>> >>
>> >>>
>> >>>> - Rename similarity package to distance package? We have five
>> Distance
>> >>>> implementations but only one Similarity. Furthermore we have
>> FuzzyScore. Is
>> >>>> it a distance or a similarity? It should be renamed accordingly.
>> >>>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence,
>> Integer>.
>> >>>> Does it make sense to introduce a new class called Vector?
>> >>>>
>> >>>> Benedikt
>> >>>>
>> >>>>
>> >>>>> All the best,Bruno
>> >>>>>
>> >>>>>
>> >>>>>       From: Benedikt Ritter <britter@apache.org>
>> >>>>>  To: Commons Developers List <dev@commons.apache.org>
>> >>>>>  Sent: Sunday, April 19, 2015 9:01 PM
>> >>>>>  Subject: [TEXT] Review of the code base
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I've looked through the code base of [text] and I already did
some
>> clean
>> >>>>> up. However there are a few things, that need more work IMHO:
>> >>>>>
>> >>>>> 1. IP clearance! There are several comments talking about code
>> adapted
>> >>>>> from
>> >>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an
>> example see
>> >>>>> the NOTICE.txt in [lang].
>> >>>>>
>> >>>>> 2. The names package needs work. Currently the HumanNameParser
>> takes a
>> >>>>> Name
>> >>>>> object (which is just a wrapper around a String) as parameter.
The
>> parse
>> >>>>> result is then saved in fields (e.g. firstName). This makes
the
>> parser
>> >>>>> unusable after it has parsed a name. I think the API should
be
>> reworked
>> >>>>> such as:
>> >>>>> - The constructor of the parser takes configuration options
which
>> can be
>> >>>>> reused for several names to parse
>> >>>>> - the parse method takes a string as parameter, containing a
name
>> >>>>> - the parse method returns an immutable Name objects which has
>> getters
>> >>>>> for
>> >>>>> firstName, lastName etc.
>> >>>>>
>> >>>>> 3. There are a some useless JavaDocs, which can be dropped.
For
>> example:
>> >>>>> /**
>> >>>>>  * Middle name.
>> >>>>>  */
>> >>>>> private String middle;
>> >>>>>
>> >>>>> Better rename the variable to middleName and drop the comment.
>> >>>>>
>> >>>>> 4. Several classes have no tests. The overall test coverage
looks
>> good,
>> >>>>> so
>> >>>>> this may not be a problem.
>> >>>>>
>> >>>>> 5. I don't think the shade configuration is correct. To me it
looks
>> like
>> >>>>> commons-text depends on [lang] and also shades some classes.
Do we
>> >>>>> really
>> >>>>> need the explicit dependency? Furthermore, why do we need to
shade
>> so
>> >>>>> many
>> >>>>> classes? Why do we need anything beside StringUtils?
>> >>>>>
>> >>>>> Nevertheless, this library looks very good. Kudos to Bruno for
>> pushing
>> >>>>> this
>> >>>>> forward!
>> >>>>>
>> >>>>> Regards,
>> >>>>> Benedikt
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> http://people.apache.org/~britter/
>> >>>>> http://www.systemoutprintln.de/
>> >>>>> http://twitter.com/BenediktRitter
>> >>>>> http://github.com/britter
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> http://people.apache.org/~britter/
>> >>>> http://www.systemoutprintln.de/
>> >>>> http://twitter.com/BenediktRitter
>> >>>> http://github.com/britter
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> http://people.apache.org/~britter/
>> >>> http://www.systemoutprintln.de/
>> >>> http://twitter.com/BenediktRitter
>> >>> http://github.com/britter
>> >>>
>> >>
>> >>
>> >> --
>> >> http://people.apache.org/~britter/
>> >> http://www.systemoutprintln.de/
>> >> http://twitter.com/BenediktRitter
>> >> http://github.com/britter
>> >>
>> >
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message