You're about to create a document on the web -- maybe an article like this one. Do you just start editing
foo.html? Consider making a directory called
There are several compelling reasons why web documents -- what one normally calls web ``pages'' -- should be implemented as directories:
.../foo/ or even just plain
.html) or Windows (
.htm), or which of the dozens of dynamic content-management systems you're using (
.xml, and so on).
foo/ secure in the knowledge that if you change your file format or your content management system, they don't have to change their links. For that matter, neither do you -- you can make a huge global change and not have to worry that hundreds of links to
.html files have to get changed because you're now using
lynx users (many of whom are blind), ordinary users, Microsoft IE victims, RSS feed readers, PDF for printers, and document authors, using a site-wide
index.cgi script to sort them out dynamically.
.fr, to your filenames and delivers the right one according to the browser's language preference.
I have to confess that I'm not particularly consistent about following this principle, in spite of the fact that I've known for years that it's the right thing to do. I also don't get enough exercise or eat enough fruit. But I'm getting better.
It's useful to have a script or template that makes it easy to create a document directory the way you like it. Heck, it's useful even if you don't use a separate directory for each document.
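As a rough illustration of such a script, here is a minimal shell sketch. The function name new_doc, its arguments, and the HTML boilerplate are all assumptions made for the example, not the template actually used for this site; adapt them to whatever your own documents need.

```shell
#!/bin/sh
# new_doc NAME [TITLE]
# Create a document directory NAME containing a starter index.html.
# The function name and the boilerplate below are illustrative only.
new_doc() {
    name=$1
    title=${2:-$1}          # default the title to the directory name
    if [ -z "$name" ]; then
        echo "usage: new_doc NAME [TITLE]" >&2
        return 2
    fi
    # Never overwrite a document that already exists.
    if [ -e "$name/index.html" ]; then
        echo "new_doc: $name/index.html already exists" >&2
        return 1
    fi
    mkdir -p "$name" || return 1
    cat > "$name/index.html" <<EOF
<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
</body>
</html>
EOF
}
```

After sourcing the file, something like `new_doc foo "My Article"` would create foo/index.html, and running it a second time would refuse to clobber the existing file.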
Using a script makes it easy to customize the boilerplate elements like
Makefile and the document's navigation
`breadcrumbs' and overall structure. The next document in this series,
Managing Websites, will have more
The only important decision you need to make is whether to use the
index.html file for your document, or call it
something else (like, for example,
and either using your
.htaccess file or a symbolic link to
make it the document that the server returns.
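Assuming a Unix-like server, the two approaches might look like the sketch below. The function names are hypothetical, and the .htaccess variant only works if the hosting site allows DirectoryIndex to be overridden there.

```shell
#!/bin/sh
# Two hypothetical helpers for serving a uniquely named file as a
# directory's default document. Use one or the other, not both.

# Option 1: make index.html a symbolic link to the real document.
link_index() {
    dir=$1
    doc=$2
    if [ ! -f "$dir/$doc" ]; then
        echo "link_index: no such file: $dir/$doc" >&2
        return 1
    fi
    # Refuse to replace a real index.html; only an old link is replaced.
    if [ -e "$dir/index.html" ] && [ ! -L "$dir/index.html" ]; then
        echo "link_index: $dir/index.html is not a symbolic link" >&2
        return 1
    fi
    ( cd "$dir" && ln -sf "$doc" index.html )
}

# Option 2: tell Apache which file to return, using .htaccess.
# Note: this overwrites any existing .htaccess in the directory.
htaccess_index() {
    dir=$1
    doc=$2
    if [ ! -f "$dir/$doc" ]; then
        echo "htaccess_index: no such file: $dir/$doc" >&2
        return 1
    fi
    printf 'DirectoryIndex %s\n' "$doc" > "$dir/.htaccess"
}
```

The link is created from inside the directory so that it stays relative and survives the directory being moved or uploaded elsewhere.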
There are pros and cons on both sides. Using unique names makes life
easier if you're using an editor like
emacs that lets you
keep multiple documents open at once: you're not looking at a list of
index.html files and trying to figure out
which one is which. And there's no real problem naming the document after
On the other hand, using
index.html for everything makes life
a lot simpler for scripts, and makes it particularly easy to distinguish
the main document from any auxiliary notes, comments, and examples. It
also makes it easy to navigate if you use a graphical browser like
Nautilus or the Macintosh
As for me, I've used both methods. I'm currently leaning toward the
index.html side because it's easy to script, and doesn't rely
on having a way to upload symbolic links or support for
.htaccess files on the hosting site. The really nice thing
about using a separate directory for each document, though, is that you
can change your mind later, give all your
different names, and nobody but you will ever know.
If you want to make it easy to distinguish between directories that represent documents and directories that represent collections of documents, there are two easy ways to do it:
Linux) and lowercase names for documents (e.g.,
docs-are-directories). This lets you look at a directory listing and tell at a glance which subdirectories are documents. It also means that collections usually sort ahead of documents in an alphabetical listing.
index.html for collections and
document.html for documents. You can do this with the following Apache configuration directive:
DirectoryIndex index.html document.html
I almost invariably use the first method.
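With the capitalization convention, a few lines of shell are enough to label each subdirectory. This is only a sketch: the function name list_kinds is made up for illustration, and it assumes every subdirectory in the current directory follows the convention.

```shell
#!/bin/sh
# list_kinds: label each subdirectory of the current directory as a
# collection (capitalized name) or a document (anything else).
# The function name is hypothetical; the rule is the naming convention.
list_kinds() {
    for d in */; do
        [ -d "$d" ] || continue      # skip the unexpanded pattern if no subdirectories exist
        case $d in
            [[:upper:]]*) echo "collection: ${d%/}" ;;
            *)            echo "document: ${d%/}" ;;
        esac
    done
}
```

Plain files are ignored because the `*/` pattern matches only directories.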
If your site is big enough to have a collection of documents you've written, it's probably already organized into directories. There are four main ways to get information up to your site:
rsync -e ssh --archive --cvs-exclude . user@site:/path
The --cvs-exclude parameter keeps stuff like editor backup files, log files, and so on from getting copied; add
--update if you need to keep files created on the server (for example, by users) from getting clobbered, although
cvs might be better in that case. I usually have a
sync that does this.
rsync has the advantage that it transfers only the files, or parts of files, that have changed.
make for many tasks around a website, including offline formatting and building multiple versions. Used for uploading, you can be more selective than you can with
rsync, and you can use any program you like for the actual upload (for years I used
ftp, until it fell out of favor at my ISP for security reasons).
cvs (or some other version control system, such as Subversion)
cvs wants to run on the server and pull files from your repository. This relies on support from your ISP or hosting service, which you don't always have. But it's particularly good if you have shell access on the server, since you can make emergency changes on the spot and check them in. It's also great if you allow other people to make comments (blog style) or edits (wiki style) directly on the site.
Despite the obvious advantages, there are some cases where you probably shouldn't use document directories:
ls *.html and get a listing of all the HTML documents in your directory.
man command and the GNU
info system. Sometimes it's easiest to just go along.