devicemap-dev mailing list archives

From Werner Keil <werner.k...@gmail.com>
Subject Re: 2x Performance Increase in classify()
Date Wed, 10 Dec 2014 17:41:24 GMT
Well, it's not "legacy"; it's simply the W3C-compliant version, while the
new one deviates from that.

It doesn't recognize the OS on the Samsung Galaxy 10.1 N upgraded to
Android 4.1 or 4.2 either; it still says 4.0.4 (which is wrong but seems to
differ from the XML file, so the classifier tries "something", just not
exactly the right thing).
Nor does it recognize Android 5 on the Nexus 7. There it bluntly returns
what's in the XML data file, "4.1", instead of the correct 5, which is also
what the UA says.
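
To illustrate what "matching the UA" means here, a minimal sketch of reading the Android version straight from the User-Agent string. The class and method names are hypothetical and not part of either DeviceMap client, and the UA in main() is only an illustrative example of the Nexus 7 format:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AndroidVersionSketch {
    // Hypothetical helper, not taken from the W3C client or the new classifier.
    // Captures the dotted version number that follows the "Android" token.
    private static final Pattern ANDROID_VERSION =
            Pattern.compile("Android[ /](\\d+(?:\\.\\d+)*)");

    static Optional<String> androidVersion(String userAgent) {
        Matcher m = ANDROID_VERSION.matcher(userAgent);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative UA only; a real test should use the string the tablet sends.
        String ua = "Mozilla/5.0 (Linux; Android 5.0; Nexus 7 Build/LRX21P) "
                + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.59 Safari/537.36";
        System.out.println(androidVersion(ua).orElse("unknown")); // prints 5.0
    }
}
```

A version obtained this way could then be compared with the value from the XML data file to spot cases like the "4.1" one above.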

As the W3C client isn't on the VM, it is not so easy to test it against
actual tablets, but supplying a real UA from these tablets by hand should
work.

For Nexus especially there seems to be a bug in the data files. Someone
invented "genericGoogle", which is a loose end: neither the W3C client nor
the new parser would find anything, as the parent doesn't seem to be in any
of the files ;-O
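
A loose end like this could be caught with a simple consistency check over the device data. The sketch below assumes the data has already been loaded into a map from device id to parent id (null for no parent); the loading step, the class name, and the sample ids other than "genericGoogle" are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DanglingParentSketch {
    // Returns every parent id that is referenced by some device
    // but is not itself defined as a device id.
    static Set<String> danglingParents(Map<String, String> parentById) {
        Set<String> missing = new TreeSet<>();
        for (String parent : parentById.values()) {
            if (parent != null && !parentById.containsKey(parent)) {
                missing.add(parent);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not copied from the DeviceMap XML files.
        Map<String, String> parentById = new HashMap<>();
        parentById.put("genericAndroid", null);
        parentById.put("SampleTablet", "genericAndroid");
        parentById.put("Nexus 7", "genericGoogle"); // parent never defined
        System.out.println(danglingParents(parentById)); // prints [genericGoogle]
    }
}
```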


On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> >> currently provide better recognition of say an update to Android 4 or 5
>
> Hmm... can you explain this in more detail?
>
> From my work on the legacy client, it does not do anything more than
> matching builder strings against user agents. The legacy client had a more
> brute force algorithm which would have to pick a particular builder to use,
> which was error prone. The new classifier client attempts to match all
> builders at once and then chooses the highest ranking match, thus
> increasing the accuracy. So I am not aware of any reason that one client
> can recognize a pattern better than the other, especially if they are
> working off of the same data. Only the opposite is possible, missing a
> pattern match.
>
>       From: Werner Keil <werner.keil@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <reza.naghibi@yahoo.com>
>  Sent: Wednesday, December 10, 2014 12:13 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> Volkan/Reza,
>
> Let's keep in mind, the W3C DDR implementation has specialized recognition
> classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
> subclasses that analyze the UserAgent more thoroughly, and currently
> provide better recognition of say an update to Android 4 or 5.
>
> Werner
>
>
>
>
> On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
> reza.naghibi@yahoo.com.invalid> wrote:
>
> > Volkan,
> >
> > Thanks for the performance patch. I reviewed it and it looks pretty good.
> > Pre-patch, we were running each ngram set thru some raw string processing
> > normalizations. Your patch does a good job moving that to the beginning
> > and optimizing the regex. Good job :)
> >
> > As for pattern matching, if you look at the normalization method, we only
> > look at alphanumerics. This was done for simplicity's sake. The downside
> > here is that we weaken any pattern which contains non-alphanumerics.
> > There are several ways to address and fix this, but since DeviceMap has
> > control over its own data, I prefer fixing the patterns and keeping the
> > matching engine simple. The thing to remember is that our data came from
> > OpenDDR, which had a more complex classification algorithm and heuristics,
> > so we kind of have a bit of legacy baggage to sort thru as this project
> > evolves.
> >
> > Regarding our next release, I already have the Java client 1.1.0 ready to
> > go. I would like to get your patch in on the next release, 1.1.1.
> >
> > Reza
> >
> >
> >      From: Volkan YAZICI <volkan.yazici@gmail.com>
> >  To: "devicemap-dev@incubator.apache.org" <
> > devicemap-dev@incubator.apache.org>
> >  Sent: Wednesday, December 10, 2014 9:32 AM
> >  Subject: 2x Performance Increase in classify()
> >
> > Good news everyone!
> >
> > Here is the patch that introduces JMH-based benchmarks for Java client:
> > DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> >
> > And here is the patch that introduces >2x performance gain: DMAP-107
> > <https://issues.apache.org/jira/browse/DMAP-107>
> >
> > *Sample output:*
> >
> > $ export userAgentFile=/path/to/user-agents.txt
> > $ wc -l $userAgentFile
> > 195325
> > $ java \
> >    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> >    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> >    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> >    ".*DeviceMapClientBenchmark.*"
> >
> > # Using the most recent trunk.
> > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> >  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
> >  Confidence interval (99.9%): [10838.781, 13320.036]
> >
> > # Using the enhanced classify().
> > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> >  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
> >  Confidence interval (99.9%): [5063.607, 5947.103]
> >
> >
> > Cheers!
> >
> >
> >
>
>
>
