Automated Website Maintenance

It's all about separating style from content. Crucial, that is. If you do that, then you can change the look of the whole site by changing one file, and you can edit your content with a minimum of presentation-related crud obscuring the view. You can also do arguably useful stuff like skinnable websites.

Of course, it's obvious from that blue border that style's not my thing, and there's not much content here either. I could say meta at this point, but honestly I feel less like a computer scientist every day and I can't be bothered.

This page will describe how I've tried to set up this site to be maintainable, and maybe some of the other stuff it uses for message boards and whatever. Basically, the idea is to set things up so that the computer does all the donkey work (what computers are for), and the author can focus on spouting, ranting, and ego expansion (what websites are for). The computer can do donkey-work at any of three stages: static processing, dynamic processing, and client-side processing. Client side stuff (javascript) is for nifty effects like pointer trails, and less nifty effects such as scrolling status bar messages. It's got not a lot to do with maintainable websites, so I won't talk about that.

Other people have similar pages about similar things, among them Bob Hepple, Matt J. Gumbley, and Ralf S. Engelschall.

If you want to know more about how I've done this, and you have a matrix account, you can always poke around in ~hdenman/website to see what's going on.

Static processing

Static processing (for which I use m4) happens once, when you build your website. For example, all my hrefs have mouseover handlers to put some explanatory text in the status bar (at the bottom of the window). It would be tiresome to have to put the javascript in every time I'm typing a link; the code for a link like Matrix might look like:

<a href="http://matrix.netsoc.tcd.ie" 
onMouseOver="window.status='Matrix, Netsoc\'s members\' server'; return true;"
onMouseOut="window.status=''; return true;">
Matrix</a>

So instead, I use an m4 macro. When I edit my webpages, I insert a link with hd_link([[http://matrix.netsoc.tcd.ie/]],[[Matrix]], [[Matrix, Netsoc's members' server]]). When I run make, m4 replaces that macro with the necessary HTML, escaping quotes where needed.
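To make the expansion concrete, here's roughly what the macro does, sketched in JavaScript rather than m4. The names makeLink and escapeQuotes are made up for illustration, and the sketch produces the simple hand-written form of the link shown above, which is not necessarily character-for-character what the real macro emits.

```javascript
// Illustrative analogue of the hd_link macro; makeLink and escapeQuotes are
// hypothetical names, not part of the site.

// Backslash-escape single quotes so the text can sit inside a '...' string
// in the onMouseOver handler.
function escapeQuotes(text) {
  return text.replace(/'/g, "\\'");
}

// Build the anchor tag with status-bar handlers, as in the hand-written example.
function makeLink(url, label, statusText) {
  return '<a href="' + url + '"\n' +
    "onMouseOver=\"window.status='" + escapeQuotes(statusText) + "'; return true;\"\n" +
    "onMouseOut=\"window.status=''; return true;\">\n" +
    label + "</a>";
}

console.log(makeLink("http://matrix.netsoc.tcd.ie", "Matrix",
                     "Matrix, Netsoc's members' server"));
```

The point is the same as with the macro: the escaping is written once, and every link gets it for free.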

Here's another example: sometimes, a footnote seems like a worthwhile thing to include, perhaps explaining what a bananafish is¹. These are generated by another macro; that one just there was generated by

hd_note([[]],[[What's a bananafish?]],
[[Why, I've known some bananafish to swim into a banana hole and eat as many as seventy-eight bananas.
hd_link([[http://www.geocities.com/Vienna/3597/A_Perfect_Day_For_Bananafish.html]],
[[More about bananafish]],[[Is this abridged?]])]])

Such definition bits would make a great tool for educational websites; you could develop a stylesheet that had an unobtrusive link class, and automatically add pop-up definitions for every word in a glossary, so when someone is reading something and they've forgotten what MCMC (Markov Chain Monte Carlo) is, they can click on any occurrence and have a definition pop up to remind them. Or it could be a mouseover. Bit bandwidth-heavy, especially if naively implemented, but bandwidth is getting cheaper, and latency's what you want to avoid (latency would be a problem if you hrefed every occurrence to a glossary page).
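The glossary idea might be sketched like this, using the mouseover variant. This is only an assumption about how it could be done, not something this site does: the annotate function, the gloss class and the one-entry glossary are all invented for the example, and it assumes terms are plain words and definitions contain no double quotes. A real version would also need to avoid matching terms inside markup it has already inserted.

```javascript
// Hypothetical glossary; the one entry is the example from the text.
const glossary = { MCMC: "Markov Chain Monte Carlo" };

// Wrap each whole-word occurrence of a glossary term in a span whose title
// attribute carries the definition (browsers show it on mouseover).
// Assumes terms are plain words and definitions contain no double quotes.
function annotate(text, terms) {
  let result = text;
  for (const [term, definition] of Object.entries(terms)) {
    const pattern = new RegExp("\\b" + term + "\\b", "g");
    result = result.replace(pattern,
      () => '<span class="gloss" title="' + definition + '">' + term + "</span>");
  }
  return result;
}

console.log(annotate("MCMC is slow, but MCMC works.", glossary));
```

Done at build time, this costs the reader nothing but a few extra bytes per occurrence, which is the bandwidth-for-latency trade mentioned above.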

When doing a didactic bit like this, you don't want to have to go escaping your html every time you want to quote a macro; that's donkey work. So I have a macro to escape HTML as well. Other useful ones define a variable containing the contents of a file, and one to include an m4 file without too much mangling (in fairness, m4 is not really meant for processing m4 files and confusion can easily result if you try to use it for same).
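For anyone who'd rather not read m4, the escaping itself is just three substitutions. Here's a rough JavaScript equivalent; escapeHTML is a made-up name and this is a sketch of the idea, not the site's code.

```javascript
// Illustrative analogue of the HTML-escaping macro; the function name is
// hypothetical.
// The ampersand goes first: otherwise the & in a freshly made &lt; would
// itself be turned into &amp;.
function escapeHTML(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(escapeHTML('<a href="x">Tom & Jerry</a>'));
// prints: &lt;a href="x"&gt;Tom &amp; Jerry&lt;/a&gt;
```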

Here's an up-to-date view of the m4 file, globals.inc, that defines all those macros (included automatically using the hd_includeM4 macro). It may be worth bearing in mind that m4 is harder to read than it is to write.


 
divert(-1)
changequote([[,]])
define([[_NoteNumber]], 0)

define([[hd_link]], 
[[define([[_escaped]],patsubst([[[[$3]]]],[[']],[[\\']]))define([[_escaped_url]],patsubst([[[[$1]]]],[[']],[[\\']]))
<a href="$1" onMouseOver="window.status='_escaped (_escaped_url)'; return true;"
onMouseOut="window.status=''; return true;">$2</a>]])

define([[hd_note]],
[[define([[_escaped]],patsubst([[[[$2]]]],[[']],[[\\']]))<a href="javascript:show('note[[]]_NoteNumber')" 
onMouseOver="window.status='_escaped'; return true;"
onMouseOut="window.status=''; return true;">ifelse($1,[[]],[[&#185;]],$1)</a> <span 
id="note[[]]_NoteNumber" class="myStyle"><table width=400 bgcolor="#49DAEF"><tr><td>$3</td></tr><tr><td align="center">
<a href="javascript:hide('note[[]]_NoteNumber')">close</a>
</td></tr></table></span>[[]]define([[_NoteNumber]],incr(_NoteNumber))]])

define([[hd_escapeHTML]],
[[ patsubst(patsubst(patsubst([[[[[[[[$1]]]]]]]],[[&]],[[&amp;]]), [[<]],[[&lt;]]), [[>]],[[&gt;]])
]])

define([[hd_defineFile]], [[divert(-1)
dnl **** Not thread safe! don't use parallel make! ***
changequote(`,') 
syscmd(`echo "define($1, [[[[" > tmpHDm4_12765') 
syscmd(`cat $2 >> tmpHDm4_12765') 
syscmd(`echo "]]]])" >> tmpHDm4_12765') 
syscmd(`perl -pi -e "s#\\\$'`'`\@#\]\]\\$\[\[\]\]\[\[\@#g" tmpHDm4_12765')
changequote([[,]]) 
include([[tmpHDm4_12765]]) 
syscmd([[rm tmpHDm4_12765]]) 
divert]])

define([[hd_includeM4]],
[[hd_defineFile([[_tmptmptmp]], $1)
hd_escapeHTML(_tmptmptmp($[[]]1,$[[]]2,$[[]]3,$[[]]4,$[[]]5, $[[]]6, $[[]]7))
undefine([[_tmptmptmp]])]])

divert



The bottom of every page on this site has a comments area, powered by php. The php code that generates the message board is the same for every page. Similarly, the html that generates that blue border is the same for every page. So all the pages are generated from the same template. The template when the website was last made looked like this:

 
 
include(`globals.inc')
<html>
<head>
<title>hd_title</title>
<link rel="stylesheet" href="style.css" type="text/css">
<SCRIPT TYPE="text/javascript" LANGUAGE="JavaScript"><!--
function show(object) {
if (document.layers && document.layers[object] != null)
{
	if (document.layers[object].left + document.layers[object].width > window.innerWidth)
		document.layers[object].left -= (document.layers[object].left + document.layers[object].width) - window.innerWidth
	document.layers[object].visibility = 'visible';
}
else if (document.all)
{
	myObj=document.all[object]
	if (myObj.offsetLeft + myObj.offsetWidth > document.body.clientWidth-10)
		myObj.style.posLeft = myObj.offsetLeft - (-document.body.clientWidth +10 + (myObj.offsetLeft + myObj.offsetWidth))
	document.all[object].style.visibility = 'visible';

}
}
function hide(object) {
 if (document.layers && document.layers[object] != null)
    document.layers[object].visibility = 'hidden';
else if (document.all)
document.all[object].style.visibility = 'hidden';
	}
	//--></SCRIPT>

	<STYLE TYPE="text/css"><!--
	.myStyle {
	    position: absolute;
	visibility: hidden;
	}
	//--></STYLE>
</head>

<body marginwidth=0 marginheight=0 leftmargin=0 topmargin=0 bgcolor="#ffffff">
include([[header.inc]])
include(hd_content)
include([[footer.inc]])
</body>
</html>



The template's pretty straightforward. First the file globals.inc is included. This defines my macros, such as the hd_link one mentioned above. Then comes some html common to all pages. Then the header.inc is included; this sets up the top part of the table (the table arranges that blue border). Next comes the actual content, a file contained in the hd_content variable. Usually this file includes messages.inc, which sets up the message board. Then the footer file (the end of the layout table), and then the end of the html. As I said, the idea is to get the computer to do the donkey work; replication is donkey-work so rather than cutting and pasting layouts from page to page and what-not, let the computer replicate it for you.
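Stripped of the m4, the template amounts to a function from a title and some content to a whole page. Here's a loose JavaScript sketch of that; buildPage is an invented name and the header and footer strings are placeholders standing in for header.inc and footer.inc, not their real contents.

```javascript
// Illustrative only: buildPage and its arguments are hypothetical stand-ins
// for the template, header.inc, the page content and footer.inc.
function buildPage(title, header, content, footer) {
  return [
    "<html>",
    "<head>",
    "<title>" + title + "</title>",
    "</head>",
    "<body>",
    header,
    content,
    footer,
    "</body>",
    "</html>",
  ].join("\n");
}

const header = "<!-- header.inc goes here -->"; // placeholder
const footer = "<!-- footer.inc goes here -->"; // placeholder

// Every page gets the same surroundings; only the title and content differ.
for (const [title, content] of [["Home", "<p>Hello</p>"], ["Books", "<p>Reading list</p>"]]) {
  console.log(buildPage(title, header, content, footer));
}
```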

The time is perhaps ripe for a big picture overview. Here's how the whole thing works: I write a webpage with a .webm4 extension. This contains content and macros but no layout (well, maybe some internal layout, and it does contain markup: bold, italic, headers etc). Then I use the make program (actually gmake on matrix, which is GNU make, with its better documentation, as distinct from OpenBSD make) to turn all the .webm4 files into .php files, and copy them into my www directory. The makefile that generates the site currently looks like this:


 
pages := $(patsubst %.webm4,%.php,$(wildcard *.webm4))
incs  := template $(wildcard *.inc)

website : $(pages)
	cp *.php *.css /home/hdenman/public_html/

%.php : %.webm4 $(incs)
	m4 -Dhd_content=$< -Dhd_title="$(shell head -1 $< | perl -ne 'm/\s*\<\s*[hH].+?\>(.+?)\</; print "$$1\n";')" < template > $@




Here's how it works. First, the line pages := $(patsubst %.webm4,%.php,$(wildcard *.webm4)) defines a variable containing what we want to make. We want to make a .php file for every .webm4 file in the website directory (.webm4 is the extension I made up for my pages before they've been processed). The $(wildcard *.webm4) bit generates a list of all webm4 files. Then the patsubst replaces the .webm4 at the end of each name with .php. So if I have files websites.webm4, index.webm4, books.webm4, the variable pages will have websites.php, index.php, books.php, which are the files we want to make ('cause that's the website).
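If make's function syntax is unfamiliar, the same computation in JavaScript may be easier on the eye. This is just an analogy: pagesFor is a made-up name and the directory listing is a hypothetical example.

```javascript
// Illustrative analogue of $(patsubst %.webm4,%.php,$(wildcard *.webm4)).
// The directory listing is a hypothetical example.
function pagesFor(filenames) {
  return filenames
    .filter((name) => name.endsWith(".webm4"))        // like $(wildcard *.webm4)
    .map((name) => name.replace(/\.webm4$/, ".php")); // like patsubst
}

const listing = ["websites.webm4", "index.webm4", "books.webm4", "globals.inc", "style.css"];
console.log(pagesFor(listing).join(" "));
// prints: websites.php index.php books.php
```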

Files that end in .inc are designed to be included in other files (another extension chosen arbitrarily). The line incs := template $(wildcard *.inc) generates a variable containing the template plus the names of all files ending in .inc. We need to know this 'cause if the template or a .inc file is changed the whole site has to be rebuilt.

The next bit is a makefile rule, which tells the make program how to make the thing called 'website'. The syntax website : $(pages) indicates that before we can make website, we have to make sure all the files listed in the pages variable are up-to-date; in Makefile jargon, they are prerequisites. The next line says that if all the prerequisites are up-to-date, 'website' can be made by copying all .php and .css files into my www directory. The two lines together are an example of a simple Makefile rule. website is the target, $(pages) contains the prerequisites, and the next line is a command.

The final part of the makefile is another rule. It says that any file ending in .php depends on a file with the same name with a .webm4 extension, and on all the include files. For example, websites.php (this page) depends on websites.webm4 and on all the include files. The command tells make how to generate a .php file: you need to know that the variable $< stands for the first prerequisite (in the example just given, that would be websites.webm4), and $@ stands for the target (websites.php). Leaving aside the -Dhd_title part, which sets the page title from the heading on the first line of the .webm4 file, the command line for websites.php would end up looking like m4 -Dhd_content=websites.webm4 < template > websites.php, which runs the m4 macro processor with the variable hd_content defined to be websites.webm4, using the file template as input and storing the output in websites.php.
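The -Dhd_title part of the command in the makefile is the least readable bit, so here's a JavaScript sketch of what it appears to do: take the first line of the page and pull out the text of the heading. titleOf and commandFor are invented names, and the first line used in the example is an assumption about what websites.webm4 starts with.

```javascript
// Illustrative analogue of the perl one-liner in the makefile: pull the text
// of the heading on the first line of a page. titleOf and commandFor are
// hypothetical names.
function titleOf(firstLine) {
  const match = /\s*<\s*[hH].+?>(.+?)</.exec(firstLine);
  return match ? match[1] : "";
}

// Assemble the command make would run for one target.
function commandFor(target, firstLine) {
  const source = target.replace(/\.php$/, ".webm4");
  return "m4 -Dhd_content=" + source +
    ' -Dhd_title="' + titleOf(firstLine) + '"' +
    " < template > " + target;
}

// The first line below is assumed, not copied from the real file.
console.log(commandFor("websites.php", "<h1>Automated Website Maintenance</h1>"));
```

If the first line has no heading, the title simply comes out empty, which matches what the perl would print in that case.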

The whole point of make programs is to automatically detect when a prerequisite has been changed and to generate any dependent targets. So when I run gmake, it checks that all my .webm4 files have a corresponding .php file and that the .php file is newer than all the .inc files and the .webm4 file. I edit the .webm4 file and run gmake and everything happens by magic.
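The magic is less magical than it sounds. For each target, make's decision boils down to something like the following sketch, where needsRebuild is a made-up name and modification times are simplified to plain numbers.

```javascript
// Illustrative sketch of make's decision for one target. Times are plain
// numbers (bigger means newer); null means the file doesn't exist.
function needsRebuild(targetTime, prerequisiteTimes) {
  if (targetTime === null) return true;                        // no .php yet
  return prerequisiteTimes.some((time) => time > targetTime);  // something changed since
}

console.log(needsRebuild(null, [100, 90])); // true: target missing
console.log(needsRebuild(200, [100, 90]));  // false: target newer than everything
console.log(needsRebuild(200, [250, 90]));  // true: one prerequisite is newer
```

Because every page lists all the include files as prerequisites, touching one of them makes this true for every page, and the whole site gets rebuilt.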

Dynamic processing

Dynamic processing is what languages like php and asp do. The difference between dynamic and static processing is that while in static processing, the macro expansions etc. are done once, in dynamic processing the server runs through expansions etc. every time a page is requested. So while you can do all the stuff described above using php or asp, and many people, especially non-unix people, do, I think it's a waste of processing power to do it for every request.

What php and the like are really good for is pulling stuff out of a database and putting it on a webpage. They can also take stuff from a form on a webpage and put it into a database. By these powers combined, you can do bulletin boards, blogs, content-management systems, webwide collaboration, and all sorts of good interactive stuff.

Anyway, there's so much stuff out there on dynamic processing. A good place to start is Jas' php page. Webmonkey heaven is at IRT.

Security considerations

Described nicely here and here.

If you like security, you might find this more interesting than poring over bugtraq archives.


Comments welcome...