hbase-dev mailing list archives

From Misty Stanley-Jones <mstanleyjo...@cloudera.com>
Subject Building the site - interesting problem
Date Thu, 11 Sep 2014 03:58:30 GMT
Hi all,

The way the site has been built for a while poses a problem I'm not sure
how to solve. I'd like your input.

Currently, the site is stored in an SVN repo. We generate the site from the
git repo sources, copy the output over the top of the svn repo, svn add the
new files, and svn update.

This causes some problems. The biggest is that when files become
irrelevant (we remove a class, a web page, or something like that),
nothing ever deletes them from svn, because we don't start over with a
fresh copy of the site each time.

At first glance, it seems like an easy thing to fix. You could use an rsync
job and just delete the files that are not present in the generated output.
But there are some things in there that are not generated anymore (such as
0.94 API docs) or at least not generated by running the site goal on

So I need a way to figure out which files are truly stale and need to be
deleted from svn, and which need to be left there. One strategy I thought
of is to crawl the website starting from the front page and collect all of
the files that are reachable from there. The ones that are not reachable
probably should be deleted.
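
A minimal sketch of that reachability idea, assuming the crawl runs against
a local copy of the site on disk rather than over HTTP. The function names,
the "index.html" start page, and the choice to follow only href and src
attributes are illustrative assumptions, not part of the current build. It
would miss anything linked only from JavaScript or CSS.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def reachable_files(site_root, start="index.html"):
    """Return relative paths of local files reachable from `start`."""
    site_root = os.path.abspath(site_root)
    seen = set()
    queue = [start]
    while queue:
        rel = queue.pop()
        if rel in seen or not os.path.isfile(os.path.join(site_root, rel)):
            continue
        seen.add(rel)
        if not rel.endswith((".html", ".htm")):
            continue  # only HTML pages are parsed for further links
        parser = LinkCollector()
        with open(os.path.join(site_root, rel), encoding="utf-8",
                  errors="replace") as fh:
            parser.feed(fh.read())
        for link in parser.links:
            link, _fragment = urldefrag(link)
            parsed = urlparse(link)
            if not link or parsed.scheme or parsed.netloc:
                continue  # external link, mailto:, or a bare #fragment
            path = unquote(parsed.path)
            if path.startswith("/"):
                target = os.path.normpath(path.lstrip("/"))  # site-root relative
            else:
                target = os.path.normpath(os.path.join(os.path.dirname(rel), path))
            if target == ".." or target.startswith(".." + os.sep):
                continue  # points outside the site root
            if os.path.isdir(os.path.join(site_root, target)):
                target = os.path.normpath(os.path.join(target, "index.html"))
            queue.append(target)
    return seen


def unreachable_files(site_root, start="index.html"):
    """Return sorted relative paths that the crawl from `start` never visits."""
    everything = set()
    for dirpath, dirnames, filenames in os.walk(site_root):
        dirnames[:] = [d for d in dirnames if d != ".svn"]  # skip svn metadata
        for name in filenames:
            full = os.path.join(dirpath, name)
            everything.add(os.path.relpath(full, site_root))
    return sorted(everything - reachable_files(site_root, start))
```

The list returned by unreachable_files() would only be a set of candidates
for review, not something to feed straight into svn delete.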

To that end, I am currently pulling down the site using wget, and I'll
compare that to the contents in trunk and see what's different. But I'd
like advice for what we can do about this in the future, since pulling down
the site with wget takes ages.
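
The comparison step could be sketched roughly like this, assuming the wget
download and the trunk checkout are both plain directories on disk. The
function names and the keep_prefixes parameter are hypothetical; the
prefixes are meant for content that is deliberately kept even though it is
no longer generated, and the "0.94" directory name used below is only a
placeholder.

```python
import os


def list_files(root):
    """Return the set of file paths under `root`, relative, with '/' separators."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".svn"]  # skip svn metadata
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def stale_candidates(trunk_root, mirror_root, keep_prefixes=()):
    """Files present in trunk but absent from the mirror, minus protected paths."""
    missing = list_files(trunk_root) - list_files(mirror_root)

    def is_protected(path):
        # A path is protected if it equals a prefix or lives underneath it.
        for prefix in keep_prefixes:
            prefix = prefix.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                return True
        return False

    return sorted(p for p in missing if not is_protected(p))
```

For example, stale_candidates("trunk", "mirror", keep_prefixes=["0.94"])
would list everything in trunk that the mirror lacks, except paths under a
directory named 0.94.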

I'll update when I figure out more about it.

