commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <brit...@apache.org>
Subject Re: [TEXT] Review of the code base
Date Mon, 27 Apr 2015 06:10:34 GMT
Hello Phil,

2015-04-27 4:56 GMT+02:00 Phil Steitz <phil.steitz@gmail.com>:

> On 4/26/15 3:14 AM, Benedikt Ritter wrote:
> > 2015-04-26 11:52 GMT+02:00 Benedikt Ritter <britter@apache.org>:
> >
> >>
> >> 2015-04-19 17:29 GMT+02:00 Benedikt Ritter <britter@apache.org>:
> >>
> >>>
> >>> 2015-04-19 17:24 GMT+02:00 Benedikt Ritter <britter@apache.org>:
> >>>
> >>>> Hello Bruno,
> >>>>
> >>>> 2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <
> >>>> brunodepaulak@yahoo.com.br>:
> >>>>
> >>>>> Thanks for the thorough code review Benedikt! Comments inline.
> >>>>>
> >>>>>> 1. IP clearance! There are several comments talking about code
> >>>>> adapted from
> >>>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an
example
> >>>>> see
> >>>>>> the NOTICE.txt in [lang].
> >>>>> Filed https://issues.apache.org/jira/browse/SANDBOX-497
> >>>>> And first try to fix it:
> >>>>>
> >>>>>
> >>>>>
> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497
> >>>>> Does that look correct? I also checked with the author before porting
> >>>>> it to the initial Java version, and told him about the [text]
> component -
> >>>>> https://github.com/jasonpriem/HumanNameParser.php/issues/11
> >>>>
> >>>> I've adjusted the changes a bit and merged it into master.
> >>>>
> >>>>
> >>>>>> 2. The names package needs work.
> >>>>> Completely agree. We can try to make it thread safe too. This will
> take
> >>>>> a little longer.
> >>>>> File new issue for tracking it
> >>>>> https://issues.apache.org/jira/browse/SANDBOX-498
> >>>>
> >>>> I've pushed a new branch containing a proposal how to fix this. Please
> >>>> review.
> >>>>
> >>>>
> >>>>>
> >>>>>> 3. There are a some useless JavaDocs, which can be dropped.
For
> >>>>> example:
> >>>>>> (...)> Better rename the variable to middleName and drop
the
> comment.
> >>>>> What about renaming the variables but leaving the comments?
> >>>>>
> >>>> For I've corrected this for the names package in the SANDBOX-498
> branch.
> >>>>
> >>>>
> >>>>>> 4. Several classes have no tests. The overall test coverage
looks
> >>>>> good, so
> >>>>>> this may not be a problem.
> >>>>> Before the initial release I can start a development cycle just
to
> add
> >>>>> new tests for parts that seem important - like code with somewhat
> high [+5]
> >>>>> cyclomatic complexity and uncovered branches. Just to make sure
that
> the
> >>>>> initial release will have an acceptable quality and less hidden
bugs
> :-)
> >>>>>
> >>>> Nice!
> >>>>
> >>>>
> >>>>>> 5. I don't think the shade configuration is correct. To me it
looks
> >>>>> like
> >>>>>> commons-text depends on [lang] and also shades some classes.
Do we
> >>>>> really
> >>>>>> need the explicit dependency? Furthermore, why do we need to
shade
> so
> >>>>> many
> >>>>>> classes? Why do we need anything beside StringUtils?
> >>>>> My mistake. I copied the [functor] configuration without much
> thinking.
> >>>>> IIRC, this dependency was added because of the names package. But
it
> >>>>> shouldn't be too complicate to replace it. It is mainly used for
> checking
> >>>>> empty/blank values.
> >>>>>
> >>>>> So let's drop the [lang] dependency and use something like `$VAR
!=
> >>>>> null && ! "".equals($VAR)` instead?
> >>>>>
> >>>> I think we also need StringUtils in the similarity package. We should
> >>>> handle this after we have merged the changes made to the names
> package.
> >>>>
> >>>>
> >>>>> Thanks again for reviewing the code!
> >>>>>
> >>>> No problem, thanks for pushing this forward.
> >>>>
> >>>> Here are some more things we need to take care of before 1.0:
> >>>> - JIRA project
> >>>>
> >>> Requested a JIRA project:
> >>>
> https://issues.apache.org/jira/servicedesk/customer/portal/1/INFRA-9477
> >>>
> >> Here is the JIRA project [1]. I'll find out, how we can move the Commons
> >> Text issues from the Sandbox project to the new project.
> >>
> > I've migrated all issues from the Sandbox project to the new Commons Text
> > project. I've also updated the project documentation.
>
> Did I miss the promotion VOTE?
>

No, but I think the component should be in a state to be released before
starting a promotion vote. We have examples of components which have been
promoted to early and I don't want this to happen with [text].

Benedikt


>
> Phil
> >
> >
> >> Benedikt
> >>
> >> [1] https://issues.apache.org/jira/browse/TEXT
> >>
> >>
> >>>
> >>>> - Rename similarity package to distance package? We have five Distance
> >>>> implementations but only one Similarity. Furthermore we have
> FuzzyScore. Is
> >>>> it a distance or a similarity? It should be renamed accordingly.
> >>>> - ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>.
> >>>> Does it make sense to introduce a new class called Vector?
> >>>>
> >>>> Benedikt
> >>>>
> >>>>
> >>>>> All the best,Bruno
> >>>>>
> >>>>>
> >>>>>       From: Benedikt Ritter <britter@apache.org>
> >>>>>  To: Commons Developers List <dev@commons.apache.org>
> >>>>>  Sent: Sunday, April 19, 2015 9:01 PM
> >>>>>  Subject: [TEXT] Review of the code base
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I've looked through the code base of [text] and I already did some
> clean
> >>>>> up. However there are a few things, that need more work IMHO:
> >>>>>
> >>>>> 1. IP clearance! There are several comments talking about code
> adapted
> >>>>> from
> >>>>> PHP libraries. This needs to be listed in NOTICE.txt. For an example
> see
> >>>>> the NOTICE.txt in [lang].
> >>>>>
> >>>>> 2. The names package needs work. Currently the HumanNameParser takes
> a
> >>>>> Name
> >>>>> object (which is just a wrapper around a String) as parameter. The
> parse
> >>>>> result is then saved in fields (e.g. firstName). This makes the
> parser
> >>>>> unusable after it has parsed a name. I think the API should be
> reworked
> >>>>> such as:
> >>>>> - The constructor of the parser takes configuration options which
> can be
> >>>>> reused for several names to parse
> >>>>> - the parse method takes a string as parameter, containing a name
> >>>>> - the parse method returns an immutable Name objects which has
> getters
> >>>>> for
> >>>>> firstName, lastName etc.
> >>>>>
> >>>>> 3. There are a some useless JavaDocs, which can be dropped. For
> example:
> >>>>> /**
> >>>>>  * Middle name.
> >>>>>  */
> >>>>> private String middle;
> >>>>>
> >>>>> Better rename the variable to middleName and drop the comment.
> >>>>>
> >>>>> 4. Several classes have no tests. The overall test coverage looks
> good,
> >>>>> so
> >>>>> this may not be a problem.
> >>>>>
> >>>>> 5. I don't think the shade configuration is correct. To me it looks
> like
> >>>>> commons-text depends on [lang] and also shades some classes. Do
we
> >>>>> really
> >>>>> need the explicit dependency? Furthermore, why do we need to shade
so
> >>>>> many
> >>>>> classes? Why do we need anything beside StringUtils?
> >>>>>
> >>>>> Nevertheless, this library looks very good. Kudos to Bruno for
> pushing
> >>>>> this
> >>>>> forward!
> >>>>>
> >>>>> Regards,
> >>>>> Benedikt
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> http://people.apache.org/~britter/
> >>>>> http://www.systemoutprintln.de/
> >>>>> http://twitter.com/BenediktRitter
> >>>>> http://github.com/britter
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> http://people.apache.org/~britter/
> >>>> http://www.systemoutprintln.de/
> >>>> http://twitter.com/BenediktRitter
> >>>> http://github.com/britter
> >>>>
> >>>
> >>>
> >>> --
> >>> http://people.apache.org/~britter/
> >>> http://www.systemoutprintln.de/
> >>> http://twitter.com/BenediktRitter
> >>> http://github.com/britter
> >>>
> >>
> >>
> >> --
> >> http://people.apache.org/~britter/
> >> http://www.systemoutprintln.de/
> >> http://twitter.com/BenediktRitter
> >> http://github.com/britter
> >>
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message