commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Rall <...@finemaltcoding.com>
Subject Re: [commons-dev] mail no longer archived
Date Thu, 08 Aug 2002 20:18:15 GMT
"Craig R. McClanahan" <craigmcc@apache.org> writes:

> On 6 Aug 2002, Daniel Rall wrote:
> 
> > Date: 06 Aug 2002 15:08:50 -0700
> > From: Daniel Rall <dlr@finemaltcoding.com>
> > Reply-To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>
> > To: Jakarta Commons Developers List <commons-dev@jakarta.apache.org>,
> >      Michael A. Smith <mas@apache.org>
> > Cc: pier@betaversion.org
> > Subject: Re: [commons-dev] mail no longer archived
> >
> > Trolling commons-dev for relevant messages is like trying to find a
> > needle in a haystack.
> >
> > "Michael A. Smith" <mas@apache.org> writes:
> >
> > > robert burrell donkin wrote:
> > > > could eyebrowse on nagoya cope with the traffic if we included it's
> > > > urls instead?
> > >
> > > I'm probably not the right one to ask about that (I believe Daniel
> > > Rall is),  but I would think that using our own archive would be
> > > better than relying on an external archive.
> >
> > I agree.  Eyebrowse still being enhanced (currently mostly by Andreas
> > Guenther).  Cross list threading improvements and misc. buglet fixes
> > are in CVS, and ready to be pushed live.
> >
> > > Daniel, since I believe you're the eyebrowse maintainer, can you
> > > answer this?
> >
> > Using tigris.org and various other deployments as a metric, I say it
> > will to scale.  However, I've noticed that the deployment on
> > archives.apache.org (nagoya) runs slowly in comparison.  I've wondered
> > if the fact that it is behind mod_proxy has anything to do with this
> > (but don't have any hard data on this subject).  When whatever is
> > bottlenecking archives.apache.org is be found (and subsequently
> > fixed), promoting it as the main list archive browser makes a lot of
> > sense.  Pier, do you have any clues here?
> 
> The only slowness I've ever really noticed on nagoya.apache.org/eyebrowse/
> is building the initial page -- it looks like it's doing a complex
> database query for the message counts or something.  After that, it seems
> pretty responsive.

The initial page you speak of, ViewLists, is in fact doing a join
which can involve a massive number of rows.  The SourceCast
instantiations don't use this screen, as lists are associated with
individual projects, so only a small set of lists need their
information retrieved at a time (rather than dumping the info for
every list at once).

Here's the Java/SQL for the query from EyebroweDB:

  private final Query List_getAll = new Query(
    "SELECT eb_List.*, COUNT(eb_Message.listId) as count "
    + " FROM eb_List LEFT JOIN eb_Message on eb_Message.listId=eb_List.listId "
    + " AND eb_Message.isDeleted = 0 "
    + " WHERE "
    + IGNORE_REMOVED_MESSAGES
    + " GROUP BY eb_List.listId "
    + " ORDER BY eb_List.listName ");

The eb_Message table on nagoya is fairly large:

mysql> select count(*) from eb_Message;
+----------+
| count(*) |
+----------+
|   429832 |
+----------+
1 row in set (0.00 sec)

The join of this data set with another table is expensive:

mysql> SELECT eb_List.*, COUNT(eb_Message.listId) as count
    -> FROM eb_List LEFT JOIN eb_Message ON
    -> eb_Message.listId=eb_List.listId
    -> AND eb_Message.isDeleted = 0
    -> WHERE
    -> (eb_Message.msgNo is NULL OR eb_Message.msgNo != -1
    ->  AND eb_Message.isDeleted = 0)
    -> GROUP BY eb_List.listId
    -> ORDER BY eb_List.listName;
...
104 rows in set (26.91 sec)

This query alone is taking 27 seconds!  Nice catch, Craig.  MySQL's
EXPLAIN command <http://www.mysql.com/doc/en/EXPLAIN.html> shows us
what's going on:

mysql> EXPLAIN ...;
+------------+------+---------------+---------+---------+----------------+------+---------------------------------+
| table      | type | possible_keys | key     | key_len | ref            | rows | Extra  
                        |
+------------+------+---------------+---------+---------+----------------+------+---------------------------------+
| eb_List    | ALL  | NULL          | NULL    |    NULL | NULL           |  104 | Using temporary;
Using filesort |
| eb_Message | ref  | msgDate,msgNo | msgDate |       4 | eb_List.listId | 4298 | where used
                     |
+------------+------+---------------+---------+---------+----------------+------+---------------------------------+
2 rows in set (0.21 sec)

The ALL type for eb_List means a full table scan is occurring.
Generally, that's very, very bad, but in this case it's exactly what
we asked for (and the data set is small, at only 104 rows).  Other
than trying multiple queries to avoid the temporary table (the
filesort can't really be avoided unless we want to do the list name
alphetization in Java, and doesn't take an appreciable amount of time
anyhow), a way to optimize the query isn't immediately apparent to me.

Finding a better way to display the mailing list data -- like
tigris.org does, for instance -- seems much more likely to yield a
large performance payoff.
-- 

Daniel Rall <dlr@finemaltcoding.com>

--
To unsubscribe, e-mail:   <mailto:commons-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:commons-dev-help@jakarta.apache.org>


Mime
View raw message