commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedikt Ritter <brit...@apache.org>
Subject Re: [TEXT] Review of the code base
Date Sun, 19 Apr 2015 15:24:53 GMT
Hello Bruno,

2015-04-19 12:15 GMT+02:00 Bruno P. Kinoshita <brunodepaulak@yahoo.com.br>:

> Thanks for the thorough code review Benedikt! Comments inline.
>
> > 1. IP clearance! There are several comments talking about code adapted
> from
> >PHP libraries. This needs to be listed in NOTICE.txt. For an example see
> >the NOTICE.txt in [lang].
> Filed https://issues.apache.org/jira/browse/SANDBOX-497
> And first try to fix it:
>
>
> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=commitdiff;h=refs/heads/SANDBOX-497
> Does that look correct? I also checked with the author before porting it
> to the initial Java version, and told him about the [text] component -
> https://github.com/jasonpriem/HumanNameParser.php/issues/11


I've adjusted the changes a bit and merged it into master.


>
> > 2. The names package needs work.
> Completely agree. We can try to make it thread safe too. This will take a
> little longer.
> File new issue for tracking it
> https://issues.apache.org/jira/browse/SANDBOX-498


I've pushed a new branch containing a proposal how to fix this. Please
review.


>
>
> > 3. There are a some useless JavaDocs, which can be dropped. For example:
> > (...)> Better rename the variable to middleName and drop the comment.
> What about renaming the variables but leaving the comments?
>

For I've corrected this for the names package in the SANDBOX-498 branch.


>
> > 4. Several classes have no tests. The overall test coverage looks good,
> so
> >this may not be a problem.
> Before the initial release I can start a development cycle just to add new
> tests for parts that seem important - like code with somewhat high [+5]
> cyclomatic complexity and uncovered branches. Just to make sure that the
> initial release will have an acceptable quality and less hidden bugs :-)
>

Nice!


> > 5. I don't think the shade configuration is correct. To me it looks like
> >commons-text depends on [lang] and also shades some classes. Do we really
> >need the explicit dependency? Furthermore, why do we need to shade so many
> >classes? Why do we need anything beside StringUtils?
> My mistake. I copied the [functor] configuration without much thinking.
> IIRC, this dependency was added because of the names package. But it
> shouldn't be too complicate to replace it. It is mainly used for checking
> empty/blank values.
>
> So let's drop the [lang] dependency and use something like `$VAR != null
> && ! "".equals($VAR)` instead?
>

I think we also need StringUtils in the similarity package. We should
handle this after we have merged the changes made to the names package.


> Thanks again for reviewing the code!
>

No problem, thanks for pushing this forward.

Here are some more things we need to take care of before 1.0:
- JIRA project
- Rename similarity package to distance package? We have five Distance
implementations but only one Similarity. Furthermore we have FuzzyScore. Is
it a distance or a similarity? It should be renamed accordingly.
- ConsineSimilariy uses Vectors modeled as Map<CharSequence, Integer>. Does
it make sense to introduce a new class called Vector?

Benedikt


>
> All the best,Bruno
>
>
>       From: Benedikt Ritter <britter@apache.org>
>  To: Commons Developers List <dev@commons.apache.org>
>  Sent: Sunday, April 19, 2015 9:01 PM
>  Subject: [TEXT] Review of the code base
>
> Hi all,
>
> I've looked through the code base of [text] and I already did some clean
> up. However there are a few things, that need more work IMHO:
>
> 1. IP clearance! There are several comments talking about code adapted from
> PHP libraries. This needs to be listed in NOTICE.txt. For an example see
> the NOTICE.txt in [lang].
>
> 2. The names package needs work. Currently the HumanNameParser takes a Name
> object (which is just a wrapper around a String) as parameter. The parse
> result is then saved in fields (e.g. firstName). This makes the parser
> unusable after it has parsed a name. I think the API should be reworked
> such as:
> - The constructor of the parser takes configuration options which can be
> reused for several names to parse
> - the parse method takes a string as parameter, containing a name
> - the parse method returns an immutable Name objects which has getters for
> firstName, lastName etc.
>
> 3. There are a some useless JavaDocs, which can be dropped. For example:
> /**
>  * Middle name.
>  */
> private String middle;
>
> Better rename the variable to middleName and drop the comment.
>
> 4. Several classes have no tests. The overall test coverage looks good, so
> this may not be a problem.
>
> 5. I don't think the shade configuration is correct. To me it looks like
> commons-text depends on [lang] and also shades some classes. Do we really
> need the explicit dependency? Furthermore, why do we need to shade so many
> classes? Why do we need anything beside StringUtils?
>
> Nevertheless, this library looks very good. Kudos to Bruno for pushing this
> forward!
>
> Regards,
> Benedikt
>
>
>
>
>
>
> --
> http://people.apache.org/~britter/
> http://www.systemoutprintln.de/
> http://twitter.com/BenediktRitter
> http://github.com/britter
>
>
>
>
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message