db-derby-user mailing list archives

From Michael Segel <mse...@segel.com>
Subject Re: I need advice for Table schema Design.
Date Mon, 28 Nov 2005 14:11:06 GMT
On Monday 28 November 2005 03:01, Oyvind.Bakksjo@sun.com wrote:
> Legolas Woodland wrote:
> > Hi
> > Thank you for reading my post.
> > I need your advice for a design/query problem.
> > Scenario :
> > Imagine that I should design a system which should be able to log and
> > process views/hits on banners.
> > So, in some portion of time there is a high rate of clicks/views on
> > one banner. This could be fake, because the web site owner can produce
> > fake traffic on his/her site to earn more money.
> I think this is a rather complex problem to solve, as this kind of fraud
> has become more sophisticated. There was a presentation about this at
> the VLDB conference this year; I have included links to the paper and
> the presentation. Hope that helps a little.
> http://www.vldb2005.org/program/paper/tue/p169-metwally.pdf
> http://www.vldb2005.org/program/slides/tue/s169-metwally.ppt

Uhm, this isn't rocket science...

It's not that complex an issue.
Perform a structural decomposition of the problem.

1) Rephrase the problem:
You maintain a website that carries commercial advertisements. Your goal is to 
identify potentially fraudulent actions. 

2) Determine the constraints, and the data available for capture:
	1) Location of the PC, or source of the click-through (IP address, 
potentially the MAC address of the PC too).
	2) ID/tag of the advertisement being clicked on.
	3) Timestamp of the click.
	4) Don't rely on cookies.

3) Simple base table creation.
4) Summary table creation (also known as roll-ups).
5) JMS vs. MQ type of transaction delivery.
(I don't believe that JMS guarantees delivery. MQ may be too slow.)
[Suggestion: tee off the feed and write to a flat file (rotating on a 
daily basis, with a one-week retention).]
Then use JMS to deliver each transaction record to the DB.
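The flat-file tee in step 5 can be sketched in Java. The field names (adId, clientIp, clientMac) and the pipe-delimited line format are my own illustration, not a fixed schema:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;

// Sketch of the "tee": append each raw click to a daily flat file
// before handing it to JMS for DB insertion.
public class ClickTee {

    // One pipe-delimited line per click: adId|ip|mac|epochMillis
    static String toLine(String adId, String ip, String mac, long ts) {
        return adId + "|" + ip + "|" + mac + "|" + ts;
    }

    // Rotation is done by embedding the date in the file name; a
    // separate job would delete files older than one week.
    static Path fileFor(Path dir, LocalDate day) {
        return dir.resolve("clicks-" + day + ".log");
    }

    static void tee(Path dir, LocalDate day, String line) throws IOException {
        Files.writeString(fileFor(dir, day), line + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // ...then publish the same record via JMS to the DB (not shown).
    }
}
```

The flat file is the source of truth for the end-of-day check; JMS is only the delivery path to the database.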
At the end of the day, perform a row count on all the transactions that have a 
timestamp of today and check it against the flat file. If the row counts don't 
match, find out why, then sync up. (First potential source of error...)
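A minimal sketch of that end-of-day reconciliation, assuming the DB row count is obtained separately (e.g. from a SELECT COUNT(*) over today's timestamps):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Compare the flat file's line count for today against the DB's row
// count for today. A mismatch means a dropped or duplicated message
// and must be investigated before the roll-ups run.
public class DailyReconcile {

    static boolean reconcile(Path flatFile, long dbRowCount) throws IOException {
        long fileCount;
        try (var lines = Files.lines(flatFile)) {
            fileCount = lines.count();
        }
        return fileCount == dbRowCount;
    }
}
```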

After the row count, create the entries in your daily summary table. Depending 
on the day of the week, you will want to do the same and update your weekly and 
then monthly summary tables.
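The daily roll-up can be sketched as an in-memory aggregation; in the database it would be a GROUP BY feeding the summary table, and the record fields here are illustrative:

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Collapse raw clicks into (advertisement, day) -> count: one row per
// key in the daily summary table. Weekly/monthly roll-ups would
// aggregate these rows again at a coarser grain.
public class DailyRollup {

    record Click(String adId, String clientIp, LocalDate day) {}

    static Map<String, Long> rollUp(List<Click> clicks) {
        Map<String, Long> out = new TreeMap<>();
        for (Click c : clicks) {
            out.merge(c.adId() + "|" + c.day(), 1L, Long::sum);
        }
        return out;
    }
}
```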

Every night, you'll want to run your statistical analysis and generate reports 
looking for trends.
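One hypothetical example of such a trend check: flag source IPs whose daily click count sits far above the mean (mean + k standard deviations). The threshold k is an arbitrary illustration; real fraud scoring is considerably more involved:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Flag IPs whose click count exceeds mean + k * stddev of the
// per-IP counts for the day. A simple outlier test, nothing more.
public class ClickAnomaly {

    static List<String> flagHeavyHitters(Map<String, Long> clicksPerIp, double k) {
        double mean = clicksPerIp.values().stream()
                .mapToLong(Long::longValue).average().orElse(0);
        double var = clicksPerIp.values().stream()
                .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
        double cutoff = mean + k * Math.sqrt(var);
        List<String> flagged = new ArrayList<>();
        for (var e : clicksPerIp.entrySet()) {
            if (e.getValue() > cutoff) flagged.add(e.getKey());
        }
        return flagged;
    }
}
```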

Now note, if you want to get more involved, you can track your sources by the 
country or location of the PCs hitting your site. You may even be able to 
track the city.

You'll then want to look out for trends...

Notice that what I propose is the collection of the raw click-through data. 
You may also want to capture more data, like the page that contained the ad, 
for instance. I did not go into the type of analysis, which is where some of 
this can get complicated. (PID anyone?)

Note: No unique indexes. You'll want one index to be (advertisement_id, 
client_ip, client_mac, timestamp). This should get you started.

Depending on the amount of traffic your site gets, you'll then want to rename 
the data-source table (daily/weekly/monthly) and retain it for your records.
(This will help to keep the actual daily table size to a minimum.)

Note: This will help you to identify the sources; however, there is a way that 
you can still get scammed. If someone has created a zombie army of PCs, it 
would be trivial for them to create a small program that connects to your 
server and performs at least one click-through. You'll then have the problem 
of separating the backscatter from your true "customers".

Note: This is not the complete solution. Just enough to point you in the 
correct direction.


But hey, what do I know? 
It's not like I've done this or similar things before, and I just thought this 
out while I was still drinking my first cup o' joe. ;-)


Michael Segel
312 952- 8175 [M]
