devicemap-commits mailing list archives

From "Reza Naghibi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DMAP-107) Performance optimizations for DeviceMapClient.classify()
Date Mon, 13 Jul 2015 16:10:05 GMT

    [ https://issues.apache.org/jira/browse/DMAP-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624864#comment-14624864 ]

Reza Naghibi commented on DMAP-107:
-----------------------------------

Regarding removing the 2 test strings: it looks like "HTC One X" and "HTC One X+" end up
matching identically, since 1.0 normalizes out the regex. Otherwise, I'm guessing the + is an
error, because it's likely meant to be a literal \+ and not a regex quantifier. Since the
patterns are identical, the device chosen is the first one reached during iteration. The patch
changes this because the data structure changes from a HashMap to a HashSet, so the iteration
order is different.
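
To illustrate the + point, here is a minimal sketch using plain java.util.regex. It is not
the DeviceMap matcher, and the class and method names are made up for the example. It only
shows why an unescaped + would not match a literal plus sign:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex.Pattern, not DeviceMap's Pattern.
class PlusQuantifierDemo {

    // Returns true when the whole input matches the given regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String unescaped = "HTC One X+";   // '+' is a quantifier: one or more 'X'
        String escaped = "HTC One X\\+";   // '\+' is a literal plus sign

        System.out.println(fullMatch(unescaped, "HTC One X"));  // true
        System.out.println(fullMatch(unescaped, "HTC One X+")); // false
        System.out.println(fullMatch(escaped, "HTC One X+"));   // true
        System.out.println(fullMatch(escaped, "HTC One X"));    // false
    }
}
```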

I could add logic to the ranking function to fix this, but at this point there is little use.
Getting matching to work properly on the ODDR data will never be perfect because the data
has many errors like the one above. So, as always, the solution here is to fix the bad pattern.

Also, there are some problems with your split() method. It shouldn't be static, and you can
remove the reference to Apache Commons by using String.isEmpty(). I'm not sure we need the null
check, but null is allowed in normalize(), so it's best to err on the side of safety. Below is
the corrected version:

{code:java}
private List<String> split(String text) {
    List<String> nonemptyParts = new ArrayList<String>();

    String[] parts = TEXT_SPLIT_PATTERN.split(text);

    for (String part : parts) {
        String normalizedPart = Pattern.normalize(part);

        if (normalizedPart != null && !normalizedPart.isEmpty()) {
            nonemptyParts.add(normalizedPart);
        }
    }

    return nonemptyParts;
}
{code}
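
For anyone who wants to try the method outside the client, here is a rough standalone sketch.
The split pattern and the normalize() stand-in below are assumptions made up for the example,
not the real TEXT_SPLIT_PATTERN or Pattern.normalize() from the client:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical standalone harness around the split() method above.
class SplitSketch {

    // Assumption: split on whitespace; the real client pattern may differ.
    private static final Pattern TEXT_SPLIT_PATTERN = Pattern.compile("\\s+");

    // Hypothetical stand-in for normalize(): lowercases and keeps only [a-z0-9].
    // Like the real method is said to, it accepts null.
    static String normalize(String part) {
        if (part == null) {
            return null;
        }
        return part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Instance method (not static), using String.isEmpty() instead of Apache Commons.
    List<String> split(String text) {
        List<String> nonemptyParts = new ArrayList<String>();

        for (String part : TEXT_SPLIT_PATTERN.split(text)) {
            String normalizedPart = normalize(part);

            if (normalizedPart != null && !normalizedPart.isEmpty()) {
                nonemptyParts.add(normalizedPart);
            }
        }

        return nonemptyParts;
    }

    public static void main(String[] args) {
        // "++" normalizes to an empty string and is dropped.
        System.out.println(new SplitSketch().split("HTC ++ One X+")); // [htc, one, x]
    }
}
```

Note that with this stand-in, "X" and "X+" normalize to the same token, which is the same kind
of collision as the HTC One X case above.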

Also, the style of the 1.0 Java client is to be explicit with imports and not use the wildcard.
Just a small style nitpick. So if you can correct the above split() function (and fix the
imports), your patch should be good to go with the HTC One X tests removed.

> Performance optimizations for DeviceMapClient.classify()
> --------------------------------------------------------
>
>                 Key: DMAP-107
>                 URL: https://issues.apache.org/jira/browse/DMAP-107
>             Project: DeviceMap
>          Issue Type: Improvement
>          Components: Java Client
>    Affects Versions: 1.1.0 Java
>            Reporter: Volkan Yazıcı
>             Fix For: 1.1.1 Java
>
>         Attachments: classify.diff
>
>
> This patch removes redundant {{DeviceType}} checks and User-Agent string normalization
> calls. Performance gain is more than 2x. Check out the devicemap-client-benchmark (introduced
> in issue DMAP-106) output:
> {code}
> $ export userAgentFile=/path/to/user-agents.txt
> $ wc -l $userAgentFile
> 195325
> $ java \
>     -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
>     -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
>     -wi 5 -i 5 -bm avgt -tu ms -f 3 \
>     ".*DeviceMapClientBenchmark.*"
> # Using the most recent trunk.
> Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
>   Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
>   Confidence interval (99.9%): [10838.781, 13320.036]
> # Using the enhanced classify().
> Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
>   Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
>   Confidence interval (99.9%): [5063.607, 5947.103]
> {code}
> JVM version: 1.8.0_25
> OS: OS X 10.9.5
> Processor: 2.6 GHz Intel Core i5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)