forrest-dev mailing list archives

From "Steven Noels" <stev...@outerthought.org>
Subject RE: Graph data
Date Thu, 21 Feb 2002 12:35:07 GMT
David Crossley wrote:

> I was trying to follow what you guys have been up to
> with this Graph stuff. but have found myself getting a
> little lost. It seems like an awfully complicated process
> just to do some data manipulation. Especially when
> we are in control of both ends of the process and
> everything in-between.

The grouping stylesheet is kind of complicated, but it seems like the right
balance between overdoing it using XSLT and writing PNGs using Perl :-) I like
the idea of having fairly small daily data files (generated with Sam's Perl
script), so that we can produce drilled-down versions, too, when necessary.
Because the data is harvested from the raw webserver logs, there will always be
a need to filter the information before publishing it.

And personally, I prefer XSLT when feasible, especially since we plan to run from
inside Cocoon, where one transformation more or less isn't exactly a
problem, and everybody is able to tweak the stylesheet/pipeline (I, for one,
wouldn't like to mess around with Perl).

> Do you have a simple DTD for what you are trying to
> achieve with the data grouping? That would certainly
> help me to conceive it.

No DTDs, but maybe this helps:

We start from daily log files like this:

<?xml version="1.0" encoding="UTF-8"?>
<data group="Downloads" year="2001" week="45" month="11" day="14" dow="3">
  <datum dir="axis" value="19"/>
  <datum dir="batik" value="12"/>
  <datum dir="cocoon" value="4"/>
  <datum dir="cocoon2" value="51"/>
  <datum dir="crimson" value="75"/>
  <datum dir="fop" value="176"/>
  <datum dir="soap" value="338"/>
  <datum dir="xalan-c" value="10"/>
  <datum dir="xalan-j" value="74"/>
  <datum dir="xang" value="5"/>
  <datum dir="xerces-c" value="251"/>
  <datum dir="xerces-j" value="979"/>
  <datum dir="xerces-p" value="33"/>
</data>

with variations in the group/year/week/month/day/day-of-week/dir/value
attributes.

For easier grouping afterwards, these files are reshuffled using
add-data-attrs.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<data group="Downloads" year="2001" week="45" month="11" day="14" dow="3">
  <datum dir="axis" value="19" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="batik" value="12" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="cocoon" value="4" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="cocoon2" value="51" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="crimson" value="75" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="fop" value="176" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="soap" value="338" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="xalan-c" value="10" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="xalan-j" value="74" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="xang" value="5" group="Downloads" year="2001" week="45" month="11"
day="14" dow="3"/>
  <datum dir="xerces-c" value="251" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="xerces-j" value="979" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
  <datum dir="xerces-p" value="33" group="Downloads" year="2001" week="45"
month="11" day="14" dow="3"/>
</data>

This happens only once (i.e. the result of this transformation is cached).
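
For anyone who finds code easier to follow than before/after samples, here is a
minimal Python sketch of the reshuffling shown above. It is not the
add-data-attrs.xsl stylesheet itself (the real step is XSLT running inside
Cocoon); the function name add_data_attrs is invented for illustration, and the
sample is a shortened copy of the daily file above.

```python
import xml.etree.ElementTree as ET


def add_data_attrs(xml_text):
    """Copy every attribute of <data> onto each child <datum>.

    Hypothetical helper: it mimics the before/after samples in this
    message, not the actual stylesheet source.
    """
    data = ET.fromstring(xml_text)
    for datum in data.findall("datum"):
        for name, value in data.attrib.items():
            # setdefault keeps attributes the datum already has (dir, value).
            datum.attrib.setdefault(name, value)
    return data


sample = """<data group="Downloads" year="2001" week="45" month="11" day="14" dow="3">
  <datum dir="axis" value="19"/>
  <datum dir="batik" value="12"/>
</data>"""

root = add_data_attrs(sample)
first = root.find("datum")
print(ET.tostring(first, encoding="unicode"))
```

After this step every datum carries its own group/year/week/month/day/dow, so
later steps can group on a datum without looking at its parent.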

Then, a DirectoryGenerator is used to produce this XML file list:

<?xml version="1.0" encoding="UTF-8"?>
<dir:directory name="forrest" lastModified="1014124674820" date="dinsdag,
19/02/2002 2:17:54 PM" requested="true"
xmlns:dir="http://apache.org/cocoon/directory/2.0">
	<dir:file name="2001-11-02.xml" lastModified="1014106426202" date="dinsdag,
19/02/2002 9:13:46 AM"/>
	<dir:file name="2001-11-03.xml" lastModified="1014106510834" date="dinsdag,
19/02/2002 9:15:10 AM"/>
	<dir:file name="2001-11-04.xml" lastModified="1014106514599" date="dinsdag,
19/02/2002 9:15:14 AM"/>
	<dir:file name="2001-11-05.xml" lastModified="1014106518074" date="dinsdag,
19/02/2002 9:15:18 AM"/>
	<dir:file name="2001-11-06.xml" lastModified="1014106521569" date="dinsdag,
19/02/2002 9:15:21 AM"/>
	<dir:file name="2001-11-07.xml"
	[...]

This filelist is transformed to an XML doc which X/Cincludes all the daily logs
into one big document:

<?xml version="1.0" encoding="UTF-8"?>
<list xmlns:cinclude="http://apache.org/cocoon/include/1.0"
xmlns:dir="http://apache.org/cocoon/directory/2.0">
	<cinclude:include src="cocoon:/forrest/data/2001-11-02.xml"/>
	<cinclude:include src="cocoon:/forrest/data/2001-11-03.xml"/>
	<cinclude:include src="cocoon:/forrest/data/2001-11-04.xml"/>
	<cinclude:include src="cocoon:/forrest/data/2001-11-05.xml"/>
	<cinclude:include src="cocoon:/forrest/data/2001-11-06.xml"/>
	<cinclude:include src="cocoon:/forrest/data/2001-11-07.xml"/>
	[...]
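
The file-list-to-include-list step above can be sketched in a few lines of
Python. This is only an illustration of the mapping, assuming the two namespace
URIs and the cocoon:/forrest/data/ prefix seen in the samples; the function name
build_include_list is made up, and the real step is a stylesheet in the Cocoon
pipeline.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the samples in this message.
DIR_NS = "http://apache.org/cocoon/directory/2.0"
CINCLUDE_NS = "http://apache.org/cocoon/include/1.0"


def build_include_list(listing_xml, base="cocoon:/forrest/data/"):
    """Turn a dir:directory listing into a <list> of cinclude:include elements.

    Hypothetical helper; `base` is taken from the src attributes shown above.
    """
    directory = ET.fromstring(listing_xml)
    out = ET.Element("list")
    for f in directory.iter("{%s}file" % DIR_NS):
        ET.SubElement(out, "{%s}include" % CINCLUDE_NS,
                      {"src": base + f.get("name")})
    return out


listing = """<dir:directory name="forrest"
    xmlns:dir="http://apache.org/cocoon/directory/2.0">
  <dir:file name="2001-11-02.xml"/>
  <dir:file name="2001-11-03.xml"/>
</dir:directory>"""

includes = build_include_list(listing)
srcs = [el.get("src") for el in includes]
print(srcs)
```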

Next, this document is run through (in my local case) the CIncludeTransformer,
which produces:

<?xml version="1.0" encoding="UTF-8"?>
<list xmlns:cinclude="http://apache.org/cocoon/include/1.0"
xmlns:dir="http://apache.org/cocoon/directory/2.0">
  <data group="Downloads" year="2001" week="43" month="11" day="02" dow="5">
    <datum dir="axis" value="74" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
    <datum dir="batik" value="87" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
    <datum dir="cocoon" value="36" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
    <datum dir="cocoon2" value="39" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
    <datum dir="crimson" value="116" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
    <datum dir="fop" value="241" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
  [...]

This is piped through an almost-identity transformation that gets rid of the data
elements (unwrapping):

<?xml version="1.0" encoding="UTF-8"?>
<list xmlns:cinclude="http://apache.org/cocoon/include/1.0"
xmlns:dir="http://apache.org/cocoon/directory/2.0">
  <datum dir="axis" value="74" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
  <datum dir="batik" value="87" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
  <datum dir="cocoon" value="36" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
  <datum dir="cocoon2" value="39" group="Downloads" year="2001" week="43"
month="11" day="02" dow="5"/>
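
The unwrapping is simple enough to show as a short Python sketch: keep the root
element, drop the data wrappers, and keep every datum. The name unwrap_data is
invented here, and the sample input is a cut-down version of the included
document above.

```python
import xml.etree.ElementTree as ET


def unwrap_data(list_xml):
    """Return a copy of the root whose children are all the <datum> elements.

    Hypothetical helper illustrating the 'almost identity' step: the <data>
    wrappers disappear, the datum elements and their attributes stay.
    """
    root = ET.fromstring(list_xml)
    flat = ET.Element(root.tag)
    for datum in list(root.iter("datum")):
        flat.append(datum)
    return flat


wrapped = """<list>
  <data group="Downloads" year="2001" week="43" month="11" day="02" dow="5">
    <datum dir="axis" value="74" group="Downloads" year="2001" week="43"/>
    <datum dir="batik" value="87" group="Downloads" year="2001" week="43"/>
  </data>
</list>"""

flat = unwrap_data(wrapped)
print([d.get("dir") for d in flat])
```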

This is then run through the infamous grouping stylesheet, which groups per
grouptype/week/project:

<?xml version="1.0" encoding="UTF-8"?>
<graph>
  <data group="Downloads">
    <datum total="3441" week="2001-43">
      <datum value="90" dir="axis"/>
      <datum value="91" dir="batik"/>
      <datum value="39" dir="cocoon"/>
      [...]
    <datum total="11432" week="2001-44">
      <datum value="148" dir="axis"/>
    [...]
  <data group="Another group?">
[...]

So yes, there are plenty of steps just to regroup the daily logs into weekly
ones... :-|
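
To make the target of the grouping step concrete, here is a rough Python sketch
that sums the flat datum list per group, week and dir and builds the same
graph/data/datum nesting as the output above. It does not reproduce the grouping
stylesheet (which is XSLT and not shown in this message); the function name
group_by_week and the numbers in the sample are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_week(list_xml):
    """Sum datum values per group / year-week / dir (hypothetical helper)."""
    root = ET.fromstring(list_xml)
    # sums[group][week][dir] -> summed value
    sums = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for d in root.iter("datum"):
        week = "%s-%s" % (d.get("year"), d.get("week"))
        sums[d.get("group")][week][d.get("dir")] += int(d.get("value"))

    graph = ET.Element("graph")
    for group, weeks in sums.items():
        data = ET.SubElement(graph, "data", {"group": group})
        for week, dirs in sorted(weeks.items()):
            wk = ET.SubElement(
                data, "datum",
                {"total": str(sum(dirs.values())), "week": week})
            for name, value in sorted(dirs.items()):
                ET.SubElement(wk, "datum", {"value": str(value), "dir": name})
    return graph


# Invented toy values, not real download counts.
flat = """<list>
  <datum dir="axis" value="10" group="Downloads" year="2001" week="43"/>
  <datum dir="axis" value="5" group="Downloads" year="2001" week="43"/>
  <datum dir="batik" value="7" group="Downloads" year="2001" week="43"/>
  <datum dir="axis" value="3" group="Downloads" year="2001" week="44"/>
</list>"""

graph = group_by_week(flat)
weeks = graph.find("data").findall("datum")
print(ET.tostring(graph, encoding="unicode"))
```

The nested dictionaries play the role the grouping keys play in the stylesheet:
one bucket per group, then per week, then per project directory.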

1) The XIncludeTransformer does XPointer, which eliminates the unwrapping step,
but is not Cacheable :-( Having a Cacheable one would mean one step less.
2) Sam said he could do the reshuffling of data attributes to the datum elements
in his Perl script: one stylesheet fewer.

Keep in mind that the raw logs *are* in daily format, so somebody has to do the
processing anyway (and weekly logs seem like a reasonable granularity). From
what I overheard on the xsl-list, the grouping approach we are using is the
correct one for this case.

> Also, where are you getting your data from? I see
> some at
> www.apache.org/~rubys/stats/xml.apache.org/
>
> Am i missing something?

Nope, that is where I got the sample data.

</Steven>

