devicemap-dev mailing list archives

From Reza Naghibi <reza.nagh...@yahoo.com.INVALID>
Subject Re: Handling Bots and HTTP Clients
Date Tue, 09 Dec 2014 13:27:41 GMT
So let me explain some of the issues with this. Regardless, I would still like you to benchmark
said patch and share the results. This will help drive the direction of future work on the
clients.

1) I'm almost certain isBot(ua) will perform worse than classify(ua), defeating the whole purpose
of short-circuiting classify(). How do you plan on implementing isBot()? If that algorithm performs
better than classify(), we might as well use it to match the entire DDR. No?

2) Under no circumstances should we implement DDR logic in code. The code should remain as
generic as possible. This means that it's just a plain old ngram matcher. This kind of logic
belongs in the DDR definition. Right now this allows for patterns and ranking. So maybe what
you are asking is that high-ranking patterns be checked first in a very quick way? Well, why
would bots rank so high? In normal traffic, bots make up a very small percentage. So wouldn't
it make more sense to check for Samsung and Apple products first?

Once again, if possible, please benchmark some before and afters so we can get a better idea
of what we are working with here. Even though I'm leaning towards saying this is a bad idea,
I think it is a good exercise.
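
For the before/after comparison, a rough timing harness along these lines might be enough to start with. This is only a sketch: the class and method names are hypothetical, the classifier is passed in as a plain function rather than a real DeviceMapClient, and a proper harness such as JMH would give more trustworthy numbers than hand-rolled System.nanoTime() loops.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical helper, not part of DeviceMap: times any String -> result function.
final class ClassifyBenchmark {

    // Returns the average time per call in nanoseconds over rounds * userAgents.size() calls.
    static long averageNanos(Function<String, ?> classifier, List<String> userAgents, int rounds) {
        if (userAgents.isEmpty() || rounds <= 0) {
            throw new IllegalArgumentException("need at least one user agent and one round");
        }
        long sink = 0;
        // Warm-up pass so the JIT has a chance to compile the hot path before timing.
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += Objects.hashCode(classifier.apply(ua));
            }
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String ua : userAgents) {
                sink += Objects.hashCode(classifier.apply(ua));
            }
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the calls above cannot be optimized away.
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink);
        }
        return elapsed / ((long) rounds * userAgents.size());
    }
}
```

Running it once with the existing classify() and once with the patched version, over the same list of user agents with a realistic share of bots, would give the two numbers to compare.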


From: Volkan YAZICI <volkan.yazici@gmail.com>
To: "devicemap-dev@incubator.apache.org" <devicemap-dev@incubator.apache.org>
Sent: Tuesday, December 9, 2014 7:34 AM
Subject: Handling Bots and HTTP Clients

Hello,

In the context of the discussion "how do we handle HTTP clients", I would like
to vote for treating them as bots. Further, I want to propose adding a thin
layer above DeviceMapClient.classify() that short-circuits the handling of
bots, as follows.

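private static final Map<String, String> botAttributes =
        Collections.singletonMap("is_bot", "true");

public Map<String, String> classify(String userAgent) {
    // Short-circuit: skip the full DDR match for known bots.
    if (isBot(userAgent)) return botAttributes;
    // Otherwise delegate to the existing classifier. "client" is assumed
    // to be the wrapped DeviceMapClient instance.
    return client.classify(userAgent);
}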

The motivation for this change is as follows:

  - Almost none of the attributes make sense for a bot, and we lose time
  matching it against the whole DDR.
  - The bot database will be able to evolve independently.
  - We can come up with a single compiled java.util.regex.Pattern to check
  for bots. (I am pretty sure Reza knows much better-performing approaches,
  but maybe for a future release.)
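
To make the idea concrete, here is a minimal sketch of the regex-based check together with the short-circuit. The bot tokens are hypothetical placeholders rather than a real bot database, the class name is made up for illustration, and the delegate function merely stands in for DeviceMapClient.classify().

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Illustrative only: not part of the DeviceMap code base.
final class BotShortCircuit {

    // Hypothetical bot tokens; a real list would come from the bot database.
    // Compiled once and reused, since Pattern instances are immutable and thread-safe.
    private static final Pattern BOT_PATTERN = Pattern.compile(
            "googlebot|bingbot|crawler|spider|curl|wget",
            Pattern.CASE_INSENSITIVE);

    private static final Map<String, String> BOT_ATTRIBUTES =
            Collections.singletonMap("is_bot", "true");

    // True if any bot token occurs anywhere in the User-Agent string.
    static boolean isBot(String userAgent) {
        return userAgent != null && BOT_PATTERN.matcher(userAgent).find();
    }

    // Returns the fixed bot attributes for bots, otherwise whatever the delegate returns.
    static Map<String, String> classify(String userAgent,
                                        Function<String, Map<String, String>> delegate) {
        return isBot(userAgent) ? BOT_ATTRIBUTES : delegate.apply(userAgent);
    }
}
```

Whether a single alternation like this is actually faster than the full DDR match is exactly what would need to be measured.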

If the development team is ok with that, I want to implement this feature.

Best.

