devicemap-dev mailing list archives

From Werner Keil <werner.k...@gmail.com>
Subject Re: 2x Performance Increase in classify()
Date Wed, 10 Dec 2014 19:00:41 GMT
Just for Android, take a look at:
https://svn.apache.org/repos/asf/devicemap/trunk/devicemap/java/simpleddr/src/main/java/org/apache/devicemap/simpleddr/builder/device/AndroidDeviceBuilder.java

I believe there are occasionally rudimentary regex patterns in the XML, but
at least for some of the more popular platforms these builders add power
that the current "light" generic parser lacks.

Werner



On Wed, Dec 10, 2014 at 7:29 PM, Reza Naghibi <
reza.naghibi@yahoo.com.invalid> wrote:

> Can you show me these regex patterns? Are these patterns used for parsing
> or identification? Do they only exist in code or are they in the DDR?
>
>       From: Werner Keil <werner.keil@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <reza.naghibi@yahoo.com>
>  Sent: Wednesday, December 10, 2014 1:23 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> It does not parse the user agent; it only uses more sophisticated (and, see
> Android etc., tailor-made) regex patterns than the current large XML parser
> does ;-)
>
>
>
>
>
>
> On Wed, Dec 10, 2014 at 7:15 PM, Reza Naghibi
> <reza.naghibi@yahoo.com.invalid> wrote:
>
> If you are saying that the OpenDDR client parses the user agent string,
> then that is something we need to avoid at all costs. I honestly was not
> aware that OpenDDR did parsing like that. Parsing the user agent has a
> whole lot of problems associated with it. The best approach, and the
> approach the current client uses, is to use pattern matching on device,
> browser, and OS signatures and use that to target specific devices,
> browsers, operating systems, and their versions.
>
>       From: Werner Keil <werner.keil@gmail.com>
>  To: dev@devicemap.apache.org; Reza Naghibi <reza.naghibi@yahoo.com>
>  Sent: Wednesday, December 10, 2014 12:41 PM
>  Subject: Re: 2x Performance Increase in classify()
>
> Well, it's not "legacy"; it's simply the W3C-compliant version, while the
> new one deviates from that.
>
> It recognizes the OS neither on the Samsung Galaxy 10.1 N upgraded to
> Android 4.1 or 4.2 (it still says 4.0.4, which is wrong but seems to
> differ from the XML file, so the classifier tries "something" but not
> exactly the right thing) nor on the Nexus 7 with Android 5. There it
> bluntly returns what's in the XML data file, "4.1", instead of the
> correct 5, which also matches the UA.
>
> As the W3C client isn't on the VM it is not so easy to test it against
> actual tablets, but providing an actual UA like those from these tablets by
> hand should work.
>
> For Nexus especially there seems to be a bug in the data files. Someone
> invented "genericGoogle", which is a loose end: neither the W3C client nor
> the new parser would find anything, as the parent doesn't seem to be in any
> of the files ;-O
>
>
>
>
> On Wed, Dec 10, 2014 at 6:23 PM, Reza Naghibi <
> reza.naghibi@yahoo.com.invalid> wrote:
>
> > >> currently provide better recognition of say an update to Android 4 or 5
> >
> > Hmm... can you explain this in more detail?
> >
> > From my work on the legacy client, it does not do anything more than
> > matching builder strings against user agents. The legacy client had a more
> > brute-force algorithm which would have to pick a particular builder to use,
> > which was error prone. The new classifier client attempts to match all
> > builders at once and then chooses the highest ranking match, thus
> > increasing the accuracy. So I am not aware of any reason that one client
> > can recognize a pattern better than the other, especially if they are
> > working off of the same data. Only the opposite is possible, missing a
> > pattern match.
> >
> >      From: Werner Keil <werner.keil@gmail.com>
> >  To: dev@devicemap.apache.org; Reza Naghibi <reza.naghibi@yahoo.com>
> >  Sent: Wednesday, December 10, 2014 12:13 PM
> >  Subject: Re: 2x Performance Increase in classify()
> >
> > Volkan/Reza,
> >
> > Let's keep in mind, the W3C DDR implementation has specialized recognition
> > classes like OrderedTokenDeviceBuilder or TwoStepDeviceBuilder and
> > subclasses that analyze the UserAgent more thoroughly, and currently
> > provide better recognition of say an update to Android 4 or 5.
> >
> > Werner
> >
> >
> >
> >
> > On Wed, Dec 10, 2014 at 5:43 PM, Reza Naghibi <
> > reza.naghibi@yahoo.com.invalid> wrote:
> >
> > > Volkan,
> > >
> > > Thanks for the performance patch. I reviewed it and it looks pretty good.
> > > Pre patch, we were running each ngram set thru some raw string processing
> > > normalizations. Your patch does a good job moving that to the beginning and
> > > optimizing the regex. Good job :)
> > >
> > > As for pattern matching, if you look at the normalization method, we only
> > > look at alphanumerics. This was done for simplicity's sake. The downside
> > > here is that we weaken any pattern which contains non-alphanumerics. There
> > > are several ways to address and fix this, but since DeviceMap has control
> > > over its own data, I prefer fixing the patterns and keeping the matching
> > > engine simple. The thing to remember is that our data came from OpenDDR
> > > which had a more complex classification algorithm and heuristics, so we
> > > kind of have a bit of legacy baggage to sort thru as this project evolves.
> > >
> > > Regarding our next release, I already have the Java client 1.1.0 ready to
> > > go. I would like to get your patch in on the next release, 1.1.1.
> > >
> > > Reza
> > >
> > >
> > >      From: Volkan YAZICI <volkan.yazici@gmail.com>
> > >  To: "devicemap-dev@incubator.apache.org" <devicemap-dev@incubator.apache.org>
> > >  Sent: Wednesday, December 10, 2014 9:32 AM
> > >  Subject: 2x Performance Increase in classify()
> > >
> > > Good news everyone!
> > >
> > > Here is the patch that introduces JMH-based benchmarks for Java client:
> > > DMAP-106 <https://issues.apache.org/jira/browse/DMAP-106>
> > >
> > > And here is the patch that introduces >2x performance gain: DMAP-107
> > > <https://issues.apache.org/jira/browse/DMAP-107>
> > >
> > > *Sample output:*
> > >
> > > $ export userAgentFile=/path/to/user-agents.txt
> > > $ wc -l $userAgentFile
> > > 195325
> > > > $ java \
> > > >    -jar devicemap/java/classifier-benchmark/target/devicemap-client-benchmark.jar \
> > > >    -jvmArgsAppend "-server -XX:+TieredCompilation -XX:+AggressiveOpts -Xms1024m -Xmx4096m -DuserAgentFile=$userAgentFile" \
> > > >    -wi 5 -i 5 -bm avgt -tu ms -f 3 \
> > > >    ".*DeviceMapClientBenchmark.*"
> > >
> > > # Using the most recent trunk.
> > > Result: 12079.408 ±(99.9%) 1240.628 ms/op [Average]
> > > >  Statistics: (min, avg, max) = (11232.424, 12079.408, 16011.000), stdev = 1160.484
> > >  Confidence interval (99.9%): [10838.781, 13320.036]
> > >
> > > # Using the enhanced classify().
> > > Result: 5505.355 ±(99.9%) 441.748 ms/op [Average]
> > > >  Statistics: (min, avg, max) = (5060.269, 5505.355, 6508.699), stdev = 413.211
> > >  Confidence interval (99.9%): [5063.607, 5947.103]
> > >
> > >
> > > Cheers!
> > >
> > >
> > >
> >
> >
> >
>
>
>
>
>
>
>
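
Editorial sketch of two points from the thread: normalization that keeps only alphanumerics, and matching every pattern before picking the highest-ranked hit. This is not the DeviceMap client code. The class name, the method names, the rank values, and the sample user agent string are all assumptions made up for illustration. It shows why a pattern containing non-alphanumerics is weakened: two different pattern strings collapse to the same normalized form.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names come from DeviceMap.
public class NormalizedMatchSketch {

    // Lower-case the input and drop everything except ASCII letters and digits.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Test every pattern against the normalized user agent and return the
    // matching pattern with the highest rank, or null if nothing matches.
    static String bestMatch(String userAgent, Map<String, Integer> rankedPatterns) {
        String ua = normalize(userAgent);
        String best = null;
        int bestRank = Integer.MIN_VALUE;
        for (Map.Entry<String, Integer> e : rankedPatterns.entrySet()) {
            boolean hit = ua.contains(normalize(e.getKey()));
            if (hit && e.getValue() > bestRank) {
                best = e.getKey();
                bestRank = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two distinct pattern strings become indistinguishable after normalization.
        System.out.println(normalize("GT-N8010").equals(normalize("GTN 8010")));

        // Assumed ranks: a device signature outranks a generic OS token.
        Map<String, Integer> patterns = new LinkedHashMap<>();
        patterns.put("Android", 1);
        patterns.put("GT-N8010", 10);

        // Made-up user agent string, for illustration only.
        String ua = "Mozilla/5.0 (Linux; Android 4.0.4; GT-N8010 Build/IMM76D)";
        System.out.println(bestMatch(ua, patterns));
    }
}
```

Because the hyphen and the space are both discarded, the matcher cannot tell "GT-N8010" from "GTN 8010". That is the trade-off described in the thread: the engine stays simple, and any ambiguity has to be fixed in the pattern data.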
