Diligent Editing of HTML

by paddyjackson

I am a fan of standards, i.e. XHTML Transitional, Strict and so on. To this end I try to keep my own sites reasonably compliant. Sites I build commercially are always 100% compliant, but that’s because I insist on it and the clients have placed their trust in me.
Just recently I had to convert a really bad site to XHTML Transitional, and if you had seen the markup you would have realized how big this task was. Going through it by hand would have been an enormous job, and quite frankly I could not have done it at the price I quoted without the following tools:
1. Vim (Bram Moolenaar)
2. Template Toolkit, TT2 (Andy Wardley)
3. HTML Tidy (Dave Raggett)
4. W3C Validator (The W3C Validator Team)
The first tool (Vim) could really be any good text editor, e.g. Emacs, ed, or any of the vi children. I just happen to use Vim; once you have learned the basics it is a joy to use and makes editing text almost an art.
TT2, the second tool, is slightly more specialized and less well known but just as easy to use, and it deserves a big mention. TT2 is a templating system. Most people won’t understand, or even need to know, what its advantages are until they have to edit a 10+ page website and someone wants to change a font on some item on every page. This could of course be done using server-side includes or some other method, but TT makes it easy and also exposes a programmatic API, which makes its functionality and versatility as wide as the programmer’s skills. This only scratches the surface of what TT can actually do for you.
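To make the "change it once, every page follows" idea concrete, here is a minimal sketch in Python. It uses the standard library's string.Template rather than TT2 itself, and the layout, the font setting and the page names are all made up for illustration; TT2 has its own syntax and can do far more than this.

```python
from string import Template

# Hypothetical shared layout: every page is wrapped in the same markup.
LAYOUT = Template("""<html>
<head><title>$title</title></head>
<body style="font-family: $font">
$content
</body>
</html>""")

# Hypothetical site-wide settings; change the font here once.
SITE = {"font": "Verdana, sans-serif"}


def render_page(title, content):
    """Fill the shared layout with one page's title and body."""
    return LAYOUT.substitute(SITE, title=title, content=content)


# Hypothetical pages: file name -> (title, body markup).
pages = {
    "index.html": ("Home", "<p>Welcome.</p>"),
    "about.html": ("About", "<p>About us.</p>"),
}

rendered = {
    name: render_page(title, body) for name, (title, body) in pages.items()
}
```

Editing the single "font" entry in SITE changes every rendered page, which is the whole point of keeping the layout in one template.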
The third tool is Dave Raggett’s HTML Tidy. This one tool is what saved me from going stark raving mad this weekend. Visually selecting an area in Vim and then running
:'<,'>!tidy -asxhtml -icbq -wrap 100
was what kept me sane. This single command will take ANY HTML fragment and sanitize it for you. It adds a lot of guff that you may not want, but you can remove that and you are left with a sanitized version complete with CSS.
I just wanted the formatting, indenting and validation. I weeded out the CSS and was left with a nice plain HTML document that I could actually understand, rather than some debauched mess the devil himself would not have started with.
Using Tidy this way gives you a clean place to start when converting a messy HTML page.
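If you would rather run Tidy over whole files than over a visual selection, something along these lines should work. It is only a sketch: it assumes the tidy binary is on your PATH, the helper names are my own, and it reuses the same flags as the Vim filter above.

```python
import shutil
import subprocess
import sys

# Same flags as the Vim filter above.
TIDY_FLAGS = ["-asxhtml", "-icbq", "-wrap", "100"]


def tidy_command(path):
    """Build the tidy command line for one file (output goes to stdout)."""
    return ["tidy", *TIDY_FLAGS, path]


def tidy_file(path):
    """Return the tidied markup for path, or None if tidy is not installed."""
    if shutil.which("tidy") is None:
        return None
    # tidy exits with 1 on warnings and 2 on errors, so check=True is not
    # used here; a non-zero exit status is expected for messy input.
    result = subprocess.run(tidy_command(path), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    for path in sys.argv[1:]:
        cleaned = tidy_file(path)
        if cleaned is None:
            sys.exit("tidy not found on PATH")
        print(cleaned)
```

The script prints the cleaned markup instead of overwriting the file, so you can look it over and weed out anything you do not want before saving.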
Last but not least are the W3C’s validator pages for both CSS and XHTML. After all the grunt work is over it’s time to check the pages, and using the methods above I managed to come in with:
Out of 29 pages:
20 HTML errors
2 CSS errors
These took me about 30 minutes to fix!