devicemap-dev mailing list archives

From Werner Keil <werner.k...@gmail.com>
Subject Re: Handling Bots and HTTP Clients
Date Tue, 09 Dec 2014 17:11:08 GMT
A new benchmark would of course be great.

For now, in the absence of other performance tests, I had to present the
figures from the W3C DDR implementation. Should there be others (I believe
Eberhard or other contributors once or twice mentioned blazing-fast
performance, but so far there has been no reproducible benchmark for others
to run and measure themselves ;-), it would benefit more than just events
like ApacheCon.

Werner


On Tue, Dec 9, 2014 at 5:59 PM, Volkan YAZICI <volkan.yazici@gmail.com>
wrote:

> The model I proposed will not buy us a significant performance gain, which
> was also not my major motivation. (That being said, I also second the idea
> of implementing a benchmark.) Instead, I wanted to address the issue of
> separating the concerns of handling bots and regular devices.
>
> Maybe I should rephrase my starting point: How can we add new bot
> and HTTP client footprints to the existing DDR?
>
> On Tue Dec 09 2014 at 2:31:24 PM Reza Naghibi
> <reza.naghibi@yahoo.com.invalid> wrote:
>
> > So let me explain some of the issues with this. Regardless, I would still
> > like you to benchmark said patch and share the results. This will help
> > drive the direction of future work on the clients.
> >
> > 1) I'm almost certain isBot(ua) will perform worse than classify(ua),
> > defeating the whole purpose of short-circuiting classify. How do you plan
> > on implementing isBot()? If that algorithm performs better than
> > classify(), we might as well use it to match the entire DDR. No?
> >
> > 2) Under no circumstances should we implement DDR logic in code. The code
> > should remain as generic as possible. This means that it's just a plain
> > old ngram matcher. This kind of logic belongs in the DDR definition.
> > Right now this allows for patterns and ranking. So maybe what you are
> > asking is that high-ranking patterns be checked first in a very quick
> > way? Well, why are bots so high ranking? In normal traffic, bots make up
> > a very small percentage. So wouldn't it make more sense to check for
> > Samsung and Apple products first?
> >
> > Once again, if possible, please benchmark some before and afters so we
> > can get a better idea of what we are working with here. Even though I'm
> > leaning towards saying this is a bad idea, I think it is a good exercise.
> >
> >
> >       From: Volkan YAZICI <volkan.yazici@gmail.com>
> >  To: "devicemap-dev@incubator.apache.org" <devicemap-dev@incubator.
> > apache.org>
> >  Sent: Tuesday, December 9, 2014 7:34 AM
> >  Subject: Handling Bots and HTTP Clients
> >
> > Hello,
> >
> > In the context of the discussion "how do we handle HTTP clients", I would
> > like to vote for treating them as bots. Further, I want to propose adding
> > a thin layer above DeviceMapClient.classify() as a shortcut for handling
> > bots, as follows.
> >
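> > private final static Map<String, String> botAttributes =
> >     Collections.singletonMap("is_bot", "true");
> >
> > public Map<String, String> classify(String userAgent) {
> >     if (isBot(userAgent)) return botAttributes;
> >     // Not a bot: fall through to the existing DDR lookup
> >     // ("client" stands for the wrapped DeviceMapClient instance).
> >     return client.classify(userAgent);
> > }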
> >
> > The motivation for this change is as follows:
> >
> >   - Almost all of the attributes make no sense for a bot, and we are
> >   losing time matching it against the whole DDR.
> >   - The bot database will be able to evolve independently.
> >   - We can come up with a single compiled j.u.regex.Pattern to check
> >   bots. (I am pretty sure Reza knows much better-performing approaches,
> >   but maybe for a future release.)
> >
> > If the development team is ok with that, I want to implement this
> > feature.
> >
> > Best.
> >
> >
> >
>
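
For readers following the thread, the shortcut proposed in the original message could look roughly like the sketch below: a single compiled java.util.regex.Pattern that is checked before the regular lookup. This is only an illustration under stated assumptions. The class name BotShortcut, the BOT_TOKENS list, and the delegate function standing in for DeviceMapClient.classify() are all hypothetical and are not part of the DeviceMap code base.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BotShortcut {
    // Hypothetical bot footprints; a real list would live in data, not in code.
    private static final List<String> BOT_TOKENS =
            Arrays.asList("Googlebot", "bingbot", "curl", "Wget");

    // One case-insensitive alternation, compiled once.
    // Pattern.quote keeps every token a literal string.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            BOT_TOKENS.stream().map(Pattern::quote).collect(Collectors.joining("|")),
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // Assumption: stands in for the existing classifier, whatever its real API is.
    private final Function<String, Map<String, String>> delegate;

    BotShortcut(Function<String, Map<String, String>> delegate) {
        this.delegate = delegate;
    }

    static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    Map<String, String> classify(String userAgent) {
        if (isBot(userAgent)) {
            return BOT_ATTRIBUTES;          // short circuit: skip the full lookup
        }
        return delegate.apply(userAgent);   // regular devices take the normal path
    }

    public static void main(String[] args) {
        // Dummy delegate used only for this demonstration.
        BotShortcut shortcut =
                new BotShortcut(ua -> Collections.singletonMap("vendor", "Example"));
        System.out.println(shortcut.classify("curl/7.38.0"));
        System.out.println(shortcut.classify("Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X)"));
    }
}
```

Whether such a pre-check actually beats the full match is exactly what the benchmark requested in this thread would have to show; the sketch makes no performance claim.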
