hbase-dev mailing list archives

From Misty Stanley-Jones <mstanleyjo...@cloudera.com>
Subject Re: Building the site - interesting problem
Date Fri, 12 Sep 2014 00:53:25 GMT
Yes, I have done some exploring. I think the first step is to get rid of
the files that are currently stale.

Images that are not referenced by any file and not present in the crawled
site:
images/apache-maven-project-2.png
images/architecture.gif
images/bc_l2_buckets.png
images/bg.jpg
images/big_h_logo.png
images/big_h_logo.svg
images/hadoop-logo.jpg
images/hbase_logo.svg
images/jumping-orca_rotated.png
images/jumping-orca_rotated.xcf
images/jumping-orca_rotated_12percent.png
images/logo_apache.jpg
images/logo_maven.jpg
images/maven-logo-2.gif
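A list like the one above can be produced mechanically. Here is a minimal sketch, assuming the generated site lives in a single directory (the path is a placeholder): it flags any file under images/ whose name appears in no HTML page.

```shell
# Hypothetical sketch (the path is an assumption): list images under
# images/ that no HTML page in the generated site references.
cd /path/to/generated-site
for img in images/*; do
  name=$(basename "$img")
  # grep -rq exits non-zero when no *.html file mentions the image name
  if ! grep -rq --include='*.html' -- "$name" .; then
    echo "unreferenced: $img"
  fi
done
```

This only checks literal name matches, so images referenced from CSS or built URLs would need a second pass.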

Directories that are present in the SVN site but not in the generated site
or the crawled site, and that don't seem to be archival. They seem to be
from an earlier version of the book, which put each chapter into its own
directory.

community
configuration
cp
developer
external_apis
getting_started
hbase-client
hbase-common
hbase-examples
hbase-hadoop-compat
hbase-hadoop1-compat
hbase-it
hbase-prefix-tree
hbase-protocol
hbase-server
ops_mgt
performance
preface
rpc
schema_design
security
shell
tracing
troubleshooting
upgrading
zookeeper
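Scheduling those directories for deletion could look roughly like the sketch below. This is a hedged example, not the actual cleanup script: it assumes you run it from the root of the SVN working copy of the site, and it only schedules the removal; a separate svn commit makes it permanent.

```shell
# Hypothetical sketch: schedule the stale per-chapter directories (from the
# list above) for removal. Run inside the svn checkout of the site; nothing
# is final until the next svn commit.
STALE_DIRS="community configuration cp developer external_apis \
getting_started hbase-client hbase-common hbase-examples \
hbase-hadoop-compat hbase-hadoop1-compat hbase-it hbase-prefix-tree \
hbase-protocol hbase-server ops_mgt performance preface rpc \
schema_design security shell tracing troubleshooting upgrading zookeeper"
for d in $STALE_DIRS; do
  # only touch directories that actually exist in this checkout
  if [ -d "$d" ]; then
    svn delete "$d"
  fi
done
```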


The next issue is that xref-test/, which is generated as part of the
build, does not seem to be reachable from the main site (it was not
crawled). Is that a problem?
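Whether xref-test/ is linked from anywhere can be checked with a one-liner like this (the site path is a placeholder); if it prints nothing but the fallback message, no page links to it.

```shell
# Hypothetical check (the path is an assumption): list every generated HTML
# page that links to xref-test/; print a note if none does.
grep -rl 'xref-test' --include='*.html' /path/to/generated-site \
  || echo "no links to xref-test/ found"
```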

I'll commit these changes today and work on a script to cleanly refresh the
site.
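The refresh script could be built around rsync along the lines Nick suggested. A minimal sketch, with placeholder paths, and assuming the 0.94 docs are the main thing to preserve: excluded patterns are protected from rsync's --delete, so .svn metadata and the 0.94 API docs survive even though they are not in the generated output.

```shell
# Hypothetical sketch of a "clean refresh": mirror the freshly generated
# site into the svn working copy, deleting files that are no longer
# generated, while protecting svn metadata and content we no longer
# regenerate (e.g. the 0.94 API docs). Paths and excludes are assumptions.
rsync -av --delete \
  --exclude='.svn' \
  --exclude='0.94/' \
  /path/to/generated-site/ /path/to/svn-site-checkout/
# Afterwards, new files still need "svn add" and missing ones "svn delete"
# before committing.
```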


On Thu, Sep 11, 2014 at 1:25 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> I think rsync is a good approach. It may take a bit, but we can work out
> the correct --excludes list, so that 0.94, svn dot-files, and whatever else
> are preserved. Did you explore this?
>
> On Wed, Sep 10, 2014 at 8:58 PM, Misty Stanley-Jones <
> mstanleyjones@cloudera.com> wrote:
>
> > Hi all,
> >
> > The way the site has been built for a while poses a problem I'm not sure
> > how to solve. I'd like your input.
> >
> > Currently, the site is stored in a SVN repo. What happens is that we
> > generate the site from the git repo sources and then copy the output over
> > the top of the svn repo, svn add new files, and svn update.
> >
> > This causes some problems. The biggest problem is that if files become
> > irrelevant (we remove a class or something, or remove a webpage, or
> > something like that), there is actually no way to delete it from svn,
> > because we don't start over with a fresh copy of the site each time.
> >
> > At first glance, it seems like an easy thing to fix. You could use an
> > rsync job and just delete the ones that are not present in the generated
> > source. But there are some things in there that are not generated anymore
> > (such as 0.94 API docs) or at least not generated by running the site
> > goal on master.
> >
> > So I need a way to figure out which files are truly stale and need to be
> > deleted from svn, and which need to be left there. One strategy is to
> > crawl the website starting from the front page and see which files are
> > reachable from there. The ones that are not should probably be deleted.
> >
> > To that end, I am currently pulling down the site using wget, and I'll
> > compare that to the contents in trunk and see what's different. But I'd
> > like advice on what we can do about this in the future, since pulling
> > down the site with wget takes ages.
> >
> > I'll update when I figure out more about it.
> >
> > Thanks,
> > Misty
> >
>
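The crawl described in the quoted message can be sketched with standard wget mirroring flags; the URL below is a placeholder for wherever the live site is served from.

```shell
# Hypothetical sketch of the crawl: mirror every page reachable from the
# front page (plus the images/CSS those pages need) into crawled-site/,
# so the result can be diffed against the svn checkout.
wget --mirror --no-parent --page-requisites \
     --directory-prefix=crawled-site \
     https://hbase.apache.org/
```

Anything in svn that never shows up under crawled-site/ is a candidate for deletion.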
