Troubleshooters.Com® and Web Workmanship Present:
Validating and Debugging HTML and CSS
Copyright © 2022 by Steve Litt
See the Troubleshooters.Com Bookstore.
The current major version of HTML, as of 5/20/2022, is HTML5. Some facts about HTML5 as well as the previous versions:
About Browser Rendering:
The ::firstchar pseudo-element is only about a decade old, so it's not supported by every browser. Therefore, to some extent, compatibility with ::firstchar could be used as a proxy for being a well-behaved browser. The following is the result of tests I did on multiple Linux-based web browsers to test the ::firstchar CSS pseudo-element, which perhaps sheds some light on these browsers' commitment to compliance:
This document includes nothing about SEO (Search Engine Optimization) tags such as description, keywords, the meta title, etc. However, if you do put those tags in, everything said in this document applies to them too.
The rest of this document elaborates on validating and debugging HTML, CSS and JavaScript...
I use HTML5. It's the latest major version, and it's been out eight years now. No reason to use anything earlier.
I know, I know
There are those who say HTML 4.01 was a specification whereas HTML5 is a bunch of suggestions. OK fine, for sure for sure, but HTML5 gives you a native way to show videos instead of relying on Adobe Flash. Remember what a pain Flash was, and how your visitor had to have it installed? HTML5 also introduced many semantic tags that, while not necessary to the sighted visitor, make viewing much easier for the blind person using a screen reader. HTML5 is vastly superior to any of its predecessors, so I repeat, no reason to use anything earlier.
The HTML5 specification allows, but does not mandate, HTML5 to be well-formed XML. Making HTML5 well-formed XML is primarily a matter of ending every element, either with a closing tag or with a trailing forward slash. Which of these you do in what circumstances is discussed later in this section.
I make my HTML5 well-formed XML because it's much easier to check. If it's well-formed XML, the HTML will probably be valid or close to valid. And well-formed XML is easier for humans to grasp at a glance than equivalently complex non-XML but valid HTML5.
I've created a very fast and simple Python program to check an XML file for being well formed. This program, called xmlchecker.py, is detailed in the next section.
It could be asked why Tim Berners-Lee didn't use an XML dialect for the World Wide Web he invented, instead of making the HTML standard. The answer is simple: when Berners-Lee invented HTML, XML didn't exist. XML 1.0 didn't become a W3C Recommendation until 1998, several years later.
XML requires every element to have a start tag and an end tag, or a trailing slash to end itself. The following are examples of each:
Elements actually containing content require technique #1, but for elements not containing content, XML doesn't care which way you end your elements. But HTML does. The HTML rule is this: End any container elements with a distinct end tag, but end any non-container elements with a trailing slash. In other words, XML gives you a choice for elements not containing anything, whether they're container elements or not. But HTML5 requires every container to end with a separate end tag, regardless of whether it actually contains anything. A partial list of container type elements, which require distinct end tags, follows:
html, head, body, title, script, div, ul, ol, li, and span.
A partial list of non-container type elements, which require end slashes and not end tags, follows:
meta, link, br, hr, and img.
Note that p and a are container elements: they hold content, so they need distinct end tags.
Care must be taken to distinguish between the contents of a container element and the attributes of an element. In the non-container tag <img src="bell.png" alt="A bell, note the self-ending tag"/>, understand that src and alt are attributes, not contents. Contents always appear between distinct start and end tags, not as metadata for the start tag. Another reminder: An element's contents can be almost anything, but its attributes are strictly defined by the HTML standard.
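The two ways of ending an element can be tried out with the same XML parser that xmlchecker.py uses. The following is only a minimal sketch, not part of xmlchecker.py: the helper name is_well_formed and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(snippet):
    """Return True if the snippet parses as well-formed XML."""
    try:
        ET.fromstring(snippet)
    except ET.ParseError:
        return False
    return True

# Technique 1: distinct start and end tags, used for container elements.
print(is_well_formed('<div><span>Hello</span></div>'))        # True

# Technique 2: trailing slash, used for non-container elements.
print(is_well_formed('<div><br/><img src="bell.png" alt="A bell"/></div>'))  # True

# Neither technique: the <br> is never ended, so this is not well formed.
print(is_well_formed('<div><br></div>'))                      # False
```

Keep in mind that this tests XML well-formedness only. XML would also accept <div/>, which the HTML rule above forbids.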
Once you've properly closed every element, you're ready to run xmlchecker.py on your file...
If you write your HTML5 to be well-formed XML, and in fact it is well formed XML, it's probably going to render similarly everywhere, even if it's not valid HTML5. This section explains how to determine if it's well formed XML, and what to do if it isn't.
First, let's introduce the xmlchecker.py code, which is pretty simple. You don't need to understand it: It's included here so you can copy and paste it from your browser. Unlike this web page, it's Free Software, Expat license:
#!/usr/bin/python3
# Copyright (C) 2017, 2022 by Steve Litt
# Expat license: https://directory.fsf.org/wiki/License:Expat

import sys
import re
import xml.etree.ElementTree as ET

ERRCOUNT=0

# Build a DOCTYPE line that defines the common HTML entities,
# so the XML parser doesn't reject them as undefined.
def string_to_enable_special_chars():
    st=''
    st+='<!DOCTYPE html PUBLIC '
    st+='"-//W3C//DTD XHTML 1.0 Transitional//EN"'
    st+=' "http://www.w3.org/TR/xhtml1/'
    st+='DTD/xhtml1-transitional.dtd"'
    st+=' ['
    st+='<!ENTITY nbsp \'&#160;\'>'
    st+='<!ENTITY reg \'&#174;\'>'
    st+='<!ENTITY copy \'&#169;\'>'
    st+='<!ENTITY trade \'&#8482;\'>'
    st+=']>'
    return st

def printit(thestring):
    f=sys.stdout
    print(thestring, file=f)

def disclaimer(fname):
    printit('\n=======================================')
    st='Disclaimer: This program replaced file {}\'s'
    printit(st.format(fname))
    printit('<!DOCTYPE > line with a special html5')
    printit('DOCTYPE line while evaluating. The original')
    printit('file has not been changed. It\'s possible')
    printit('this program might be inaccurate if the')
    printit('original file had a non-html5 DOCTYPE line.')
    printit('=======================================\n')

def abort_on_crazy_doctype(thestring):
    printit('')
    st='This file\'s doctype line is: {}'
    printit(st.format(thestring))
    st='The XML Checker program works '
    st+='only on doctypes that:'
    printit(st)
    printit('   Begin with "<!doctype", case irrelevant')
    printit('   Have exactly one ">" character')
    printit('   That ">" character is at the end')
    printit('This error might not indicate mal-formed XML,')
    printit('It might just mean that the XML Checker program')
    printit('can\'t be run on this file with its current')
    printit('<!DOCTYPE line.')
    printit('')
    sys.exit(1)

# Return True for every line that should be kept, and False
# for a simple one-line doctype, which gets dropped.
def eligible(thestring):
    st = thestring.strip()
    if not re.match('<!doctype html', st, re.I):
        return True
    elif st.count('>') != 1:
        abort_on_crazy_doctype(thestring)
    elif st[-1] != '>':
        abort_on_crazy_doctype(thestring)
    else:
        return False

# Read the file into one string, with the special DOCTYPE
# line standing in for the file's own doctype line.
def file2string(prepender, fname):
    outstring = prepender + "\n"
    with open(fname, "r") as inf:
        inlines = inf.readlines()
    for line in inlines:
        if eligible(line):
            outstring += line
    return outstring

fname = sys.argv[1]
print('\nTesting for well formedness {} ...\n'.format(fname))
htmlstring=file2string(string_to_enable_special_chars(), fname)

try:
    tree = ET.fromstring(htmlstring)
except ET.ParseError as err:
    (line, col) = err.position
    code = str(err.code)
    printit('ERROR: {} !!!!!!'.format(str(err)))
    ERRCOUNT += 1
    disclaimer(fname)
    sys.exit(1)
else:
    st='CONGRATULATIONS: {} is well formed XML'.format(fname)
    printit(st + '!!!!!!')
    disclaimer(fname)
Once you've copied it to a file on your computer, you can run it on a web page, let's say mypage.html, with the following simple command:
xmlchecker.py mypage.html
I've seen the preceding command come back with one of four messages every time:
#1 means your page is well formed XML, so far so good. #2 gives a line number and names the entity it considers invalid, so it's trivial to fix. #3 gives the line number, so it's usually fairly easy to spot. If your lines are super long, there's no law against breaking lines where spaces are now, so as to narrow your search, and then rerunning xmlchecker.py. #4 is the tough one. How to diagnose the #4 type of problem is discussed later in this section.
The following is what it looks like when your XML is completely well formed:
[slitt@mydesk web]$ xmlchecker.py mypage.html
Testing for well formedness mypage.html ...
CONGRATULATIONS: mypage.html is well formed XML!!!!!!
=======================================
Disclaimer: This program replaced file mypage.html's
<!DOCTYPE > line with a special html5
DOCTYPE line while evaluating. The original
file has not been changed. It's possible
this program might be inaccurate if the
original file had a non-html5 DOCTYPE line.
=======================================
[slitt@mydesk web]$
The preceding result occurs when your HTML5 is well formed XML. If there's an error, you'll see the "CONGRATULATIONS" line replaced by one of the error messages from the list of four possible messages several paragraphs ago.
Contemplate the following xmlchecker.py error message:
ERROR: mismatched tag: line 347, column 6 !!!!!!
The preceding is a #4 type problem, a mismatched tag. The only help xmlchecker.py has given you is it's told you something's wrong. It's an idiot light.
Mismatched tags in an XML file come from an extra closing tag not matching an opening tag, an extra opening tag not matching a closing tag (which can be perfectly OK with an HTML5 validator), or interwoven tags, which are really a special case of an extra opening tag. The following is an example of interwoven tags:
<div><span></div></span>
Interwoven tags are less difficult because the W3C validator will specifically point them out. If not for that, interwoven tags would be awful to diagnose. If for some reason you don't have access to the W3C validator, you'll need to troubleshoot interwoven tags as an extra opening tag.
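To see what the XML parser makes of interwoven tags, here's a small sketch that feeds the example straight to the parser xmlchecker.py is built on. The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

interwoven = '<div><span></div></span>'

line = None
try:
    ET.fromstring(interwoven)
    message = 'well formed'
except ET.ParseError as err:
    message = str(err)          # the text xmlchecker.py prints after "ERROR:"
    line, column = err.position

# Prints a "mismatched tag" message naming line 1.
print(message)
```

The parser complains at the </div>, because at that point the innermost open element is the span. It says nothing about the tags being interwoven, which is why the W3C validator's more specific message is so welcome.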
But first, let's discuss the case of an extra closing tag. These are easy, because xmlchecker.py gives the line number of that extra closing tag, so you can remove it, move it, or add an opening tag somewhere before it: Whatever is appropriate.
The following subsection discusses the diagnosis of extra opening tags, which is a much more challenging activity...
Extra opening tags are difficult because the line number from the xmlchecker.py error message doesn't point to the extra opening tag. Instead, it points to the closing tag of whatever container contains the extra opening tag. So, for instance, if the extra opening tag is a direct child of the <body> element, the error message line points to the </body> tag. This could be tens or hundreds of lines down from the extra opening tag.
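The following sketch shows this behavior on a made-up five-line page. The extra opening tag is on line 3, but the parser's complaint names line 5, the parent's closing tag.

```python
import xml.etree.ElementTree as ET

page = "\n".join([
    '<body>',        # line 1
    '<p>one</p>',    # line 2
    '<div>',         # line 3: the extra opening tag, never closed
    '<p>two</p>',    # line 4
    '</body>',       # line 5: closing tag of the extra tag's parent
])

try:
    ET.fromstring(page)
    error_line = None
except ET.ParseError as err:
    error_line = err.position[0]   # position is a (line, column) pair

print(error_line)   # 5, the </body> line, not line 3
```

In a five-line file that's no hardship. In a 1000-line file, the distance between the reported line and the real culprit is the whole problem.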
The best way to troubleshoot this is to use the fact that the error message points to the closing tag of the parent of the extra open tag. As a starting point, read the line number on the error message and see what closing tag it points to. Whatever closing tag that is, it's the closing tag for the extra opening tag's parent. So add a div element spanning the opening tag's parent, all the way to the line before the closing tag pointed out by the error message.
In other words, if xmlchecker.py points to a </div> tag on line 886, put </div> <!-- Ends div.diag --> on line 885, one line above that parent's closing tag. Now find the parent's opening tag and put <div class="diag"> one line below that opening tag. Now xmlchecker.py should name the line number of the closing tag of the new div element you just put in.
NOTE:
Very often the extra opening tag is a direct child of the body element. In such a case, the opening tag of your new diagnostic element should be the line below <body> and the new diagnostic element's closing tag should be the line above </body>.
TIP:
It can be very handy to give your new div.diag element a different background color, for easy identification when viewing with a browser. For web pages with white or very light gray backgrounds, light yellow is an excellent choice. The CSS follows:
If your background isn't white or very light gray, try to use something of similar brightness but very different coloration. For instance, if your background is black, you might set the background color for div.diag at #660000.
After inserting your div.diag element to span the inside borders of the direct parent of the extra opening tag, whose closing tag is identified by the error message from xmlchecker.py, run xmlchecker.py again and make sure the error message line now identifies the closing tag of the div.diag element you just added. If so, you're ready to start narrowing the scope of the extra opening tag.
Now move the new div's closing tag roughly halfway up toward the new div's opening tag, and see if the error message's line number still points to the new div's closing tag. If it does, the extra opening tag is still a direct child of the new div, so move the closing tag up another half of the way toward the new div's opening tag. However, if the error message line number reverts to pointing to the original closing tag, the extra opening tag is no longer between the new div's opening and closing tags, so move the closing tag back down, halfway toward the last place you had it. Keep doing this until you find a line or small group of lines that toggle the error message's line number between the new div's closing tag and the original closing tag, and look carefully through there. You'll find it.
DANGER!
When placing the closing tag for the new element, make sure you NEVER put the end tag inside a container element that's a descendant of the new div. If you inadvertently violate this rule, you create an interwoven tag situation that wasn't there before, and massively complicate and delay your diagnostic effort.
NOTE:
XML files often have multiple issues. xmlchecker.py finds only the first one, so you might need to repeat the diagnostic div insertion procedure multiple times.
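If the manual halving gets tedious, a rough complement is a stack-based scan that guesses which opening tag was left unclosed. To be clear, this is a different technique from the diagnostic div procedure, not a replacement for it, and it's only a sketch: the class name OpenTagTracker and the sample page are made up, and it uses Python's html.parser module rather than anything in xmlchecker.py.

```python
from html.parser import HTMLParser

class OpenTagTracker(HTMLParser):
    """Remember where each element was opened; report ones never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for every element still open
        self.suspects = []   # (tag, line) for opening tags left unclosed

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        pass                 # ended with a trailing slash, nothing stays open

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.stack]:
            return           # extra closing tag; the XML error names its line
        # Everything opened after the matching start tag was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.suspects.append((open_tag, line))

page = "\n".join(['<body>', '<p>one</p>', '<div>', '<p>two</p>', '</body>'])
tracker = OpenTagTracker()
tracker.feed(page)
tracker.close()
print(tracker.suspects)   # [('div', 3)]
```

Treat the output as a list of suspects, not a verdict. With interwoven tags or several errors at once the guess can be off, and then the diagnostic div procedure is still the way to pin things down.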
As mentioned, in the case of extra start tags, xmlchecker.py error messages name the line number of the end tag of the extra start tag's parent, which often means the </body> tag. Or, if you're the kind of person who inserts one or more divs to cover the entire body, the error message names the line number of the end tag of the innermost of those whole-body divs. So very often, you'll be starting at the very bottom and moving on up.
Ill formed XML in lists is extremely hard to diagnose, and multilevel lists are particularly nasty. Do your best, and if you really have trouble, copy the list to a dummy HTML5/XML file and troubleshoot it there. You know you have a list problem if the error message points to the bottom of the list, or if the line number points to the new diagnostic div end tag below the list but points somewhere else when you put that end tag above the list.
It's especially important to make all your lists well formed XML, because if they're not, the browser is likely to render them with an inaccurate hierarchy, and it's very possible that different browsers will render the list with different hierarchies.
It's possible to repeatedly comment out ranges of the file and then run xmlchecker.py to troubleshoot ill-formed XML. I've done this, it's not pleasant, and I don't recommend it. But if all else fails, it's a tool you can use.
Errors are much harder to find with multiple root causes, and multiple root causes are exactly what tend to happen when the author goes too long between runs of xmlchecker.py. Also, errors are much easier to find if their root cause is in code you wrote during the last few minutes, which is another reason to check early and often.
When I first start a web page from a template or by copying another web page, the first thing I do is to run xmlchecker.py, so I know I'm starting off with well-formed XML. Just as one example, meta tags lacking the closing slash are fairly easy to find in a tiny template file, but can take 20 minutes of troubleshooting in a 1000 line file.
Whew, reading this section was tough, and performing it can be tough. But it solves some very tough problems. When it comes to structural problems in your HTML, making your HTML also be well formed XML and checking for that well formedness shuts down a lot of problems that are hard to catch otherwise, including perfectly valid HTML that portrays your intended web material inaccurately. If xmlchecker.py says your HTML is also well formed XML, your web page will probably render just the way you want it to, identically, on all competent web browsers.
You can use the W3C HTML and CSS validator two ways: In the "cloud", or on your hard disk. By doing it "in the cloud" I mean going to https://validator.w3.org/, and either using its file manager to find and upload your file, or copying your file's contents and pasting them onto the validator page. Either of these methods has two disadvantages:
Therefore, I recommend running the W3C Validator on your hard disk. The rest of this section exclusively addresses running on your hard disk.
Before you can use the W3C validator, you must download and install it. Downloading and installing it presents a heck of a challenge.
If I wanted to build software that's extremely difficult to download and install, and then write documentation so bad it would confuse people even further, I couldn't do a better job of this than W3C has with their validator. Their docs are so bad that when I tried to re-install a week after first installing, I couldn't find the right documentation on downloading or installing. Great software, disgustingly vague, breezy, and unhelpful docs. Hopefully I can make it a little simpler for you.
Judging from the docs, there are several ways to do this stuff: All of them umm, shall I say, exotic. Maybe it's just me. Maybe everyone else is familiar with Jigsaw, Apache Ant, Tomcat, Jetty, servlets, and the ServletWrapper class.
Stop the madness! Ignore the docs, do it the easy way. Here are my instructions:
The validator is now installed. The following list shows some examples of how you test html and CSS files, on your hard disk and on the web:
This method is as easy as Download and Install the Easy Way, except good luck finding the right URL from which to download, and which link to click, while wading through all the documentation on w3.org.
A week ago I discovered (from the meandering and voluminous documentation) a way to do it with git clone https://github.com/w3c/css-validator, followed by a bunch of commands, to create vnu.jar. I tried to recreate that process today and couldn't find the documentation again. Finding the right documentation involves a lot of time and trial and error. Do yourself a favor and follow the instructions in Download and Install the Easy Way.
The xmlchecker.py section described how to use xmlchecker.py to eliminate structuring problems and maximize consistency across browsers. What xmlchecker.py doesn't do is check your document for adherence to HTML standards. To check such adherence, you need the W3C validator.
Assuming you installed the validator as described in the preceding section, you can check a file by running validate_html_css.sh on the local file or the web URL of the HTML or CSS document to check. If the file is an HTML file, it checks the HTML and its contained CSS. If the file is a CSS file, it checks the CSS.
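If you'd rather drive the validator from Python than from a shellscript, the following sketch shows one way it might look. It is not a translation of validate_html_css.sh. It assumes the validator is the vnu.jar file mentioned earlier, that Java is installed, and that the jar lives at the made-up path in VNU_JAR. The function names are mine.

```python
import subprocess

# Hypothetical location of the downloaded validator jar; adjust to your setup.
VNU_JAR = '/usr/local/lib/vnu.jar'

def build_validator_command(target, jar=VNU_JAR):
    """Build the command line for checking one local file or web URL."""
    # --also-check-css asks vnu.jar to check CSS documents as well as HTML.
    return ['java', '-jar', jar, '--also-check-css', target]

def validate(target, jar=VNU_JAR):
    """Run the validator and return (passed, messages)."""
    result = subprocess.run(build_validator_command(target, jar),
                            capture_output=True, text=True)
    # vnu.jar exits nonzero when it reports errors, and by default
    # writes its messages to stderr.
    return result.returncode == 0, result.stderr

print(' '.join(build_validator_command('mypage.html')))
```

Calling validate('mypage.html') then gives you a pass/fail value plus the validator's messages, which is handy if you want to check a whole directory of pages in a loop.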
Note:
When you use the validator on various websites, you'll notice that almost all websites give lots and lots of errors. Most web authors just don't care: They get paid either way. Unfortunately, this applies to all my previous Troubleshooters.Com web pages. On all new construction I'll validate my HTML.
If you want the best possible browser portability, I'd advise you to W3C validate your web page, on your hard disk, after it passes xmlchecker.py. W3C validating is easy to do once the validator is installed, and validating can help you undo a lot of bad habits that we all have learned over the years.
Like it or not, most web pages today are viewed on mobile devices. So you need feedback on how mobile friendly your web page is.
WARNING:
The Google Mobile-Friendly Test requires you to upload or point to your web page, or dump its contents into the Google Mobile-Friendly Test. Don't do this if your page content or code is a secret, or if you're worried about having your page a permanent part of Google's data, and how Google's Terms of Service allow them to make use of your page.
Like Google's, most mobile-friendly testers are "in the cloud" and therefore pose the same problems. There appear to be commercial, proprietary tester programs you can download and use (for a cost), but I couldn't research them. One possibility, although remote, is to use the "retired" W3C Mobile Checker, a program you compile and run on your computer. Its latest updates are 5 years old as of 5/24/2022, and some of its comments indicate failure. But if you want to try, the git is at https://github.com/w3c/Mobile-Checker.
The simplest and most tolerant/permissive source of mobile-friendly evaluation is the Google Mobile-Friendly Test. It's an important test, because when page ranking, Google gives quite a bit of extra consideration to what it thinks are mobile-friendly pages.
Other documents in the Web Workmanship subsite detail how to make your web pages mobile-friendly.
It's happened to us all. We're authoring along, making great progress, and not paying attention to the output. Maybe not taking a detailed look at the output for a few days. Then, while reading the web page, we find something the wrong color, or wrong size, or we find whole paragraphs with the wrong style, or our multilevel lists are subtly different from our intentions, or some of our <pre> material has lost its intended indentation. Now comes the fun, as we figure out why. It can take a long, long time.
Or perhaps a document that's always looked OK starts having problems after an update to the browser. We start debugging the problem, and the more we debug, the worse the problems get and the more confused we get. Personally, when I get to this state of affairs, I take a day out of my busy life and rewrite it, this time with full XML well-formedness checking and HTML5 validation.
And don't forget situations in which every browser renders your page differently. Nobody in 2022 wants to put a "best viewed in" notice on their work, nor include a bunch of hacks to accommodate out-of-tune browsers. With fully checked and validated HTML5, your page renders pretty much the same in any browser.
WARNING:
The preceding sentence is false if you've done things like defining the exact typeface of the text. Why anybody would assume the visitor has a particular typeface is beyond me.
Life is so much simpler when we check XML well-formedness every 15 minutes (the test takes less than a second) and HTML5 validity every hour (the test takes less than 10 seconds). If something's wrong, it can easily be fixed simply by looking at the latest code written. Such checked and validated code never gets bitrot, and is always welcoming even after months or years of not working with it. Every minute spent checking for well-formed XML and HTML5 validity saves an hour or more of misery when things go wrong.
This subject was covered in the xmlchecker.py section.
First of all, make sure in every CSS statement's declaration block (the part between the curly braces), every key is followed by a colon, and every value is followed by a semicolon. Fail to do this, and some or a lot of your CSS silently fails. So you need to check all CSS for the proper colons and semicolons. If you're an attentive person you can do this by eyeballing it, otherwise, use the validate_html_css script, described in the Install W3C Validator section. The validator, as invoked in this shellscript, tests for CSS right along with the HTML, and catches missing colons and semicolons in CSS declaration blocks.
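For a quick first pass before running the validator, a few lines of Python can flag the most obvious punctuation slips. This is a deliberately naive sketch, not a CSS parser: the function name and the sample CSS are made up, it doesn't understand comments, and any value that legitimately contains a colon (a URL, for instance) will produce a false alarm.

```python
import re

def suspicious_declarations(css):
    """Return (declaration, reason) pairs for declarations that look broken."""
    problems = []
    # Look only inside declaration blocks, the parts between curly braces.
    for block in re.findall(r'\{([^{}]*)\}', css):
        # A well-punctuated block splits into one declaration per semicolon.
        for piece in block.split(';'):
            declaration = ' '.join(piece.split())
            if not declaration:
                continue
            colons = declaration.count(':')
            if colons == 0:
                problems.append((declaration, 'missing colon'))
            elif colons > 1:
                # Two declarations fused together usually mean a lost semicolon.
                problems.append((declaration, 'possible missing semicolon'))
    return problems

sample = """
p {color: red; margin 0;}
div.diag {background-color: #ffffcc
          border: 1px solid black;}
"""
for declaration, reason in suspicious_declarations(sample):
    print(reason + ': ' + declaration)
```

On the sample, it reports the margin declaration as missing its colon and the div.diag block as probably missing a semicolon. The validator remains the real authority.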
Next, another simple typo or error that can cause CSS havoc and frustration. Make sure the class name in the element is the same class name in the CSS. If it isn't, you can work on the CSS all day long and wonder why nothing you do has an effect. Check and double check that the names are the same.
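Checking and double checking by eye gets old fast, so here's a rough sketch that compares the class names used in a page against the class names a stylesheet defines. It's an approximation under stated assumptions: the names ClassCollector, css_classes and class_mismatches are made up, the CSS handling is a simple regular expression rather than a real parser, and it only sees the HTML and CSS strings you hand it.

```python
import re
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect every class name used in class attributes."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == 'class' and value:
                self.classes.update(value.split())

def css_classes(css):
    """Return the class names that appear in the selectors of the CSS."""
    # Blank out declaration blocks so only the selectors are searched.
    selectors = re.sub(r'\{[^{}]*\}', ' ', css)
    return set(re.findall(r'\.(-?[A-Za-z_][\w-]*)', selectors))

def class_mismatches(html, css):
    """Return (used but not styled, styled but not used)."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    defined = css_classes(css)
    return collector.classes - defined, defined - collector.classes

html = '<div class="diag"><p class="note">Hello</p></div>'
css = 'div.diag {background-color: #ffffcc;} p.notes {color: red;}'
print(class_mismatches(html, css))   # ({'note'}, {'notes'})
```

A name that shows up in only one of the two sets, like note versus notes here, is exactly the kind of typo that makes you work on the CSS all day with no effect.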
Moving to a less obvious cause of CSS failure, order matters in CSS. If your CSS isn't doing the right things, check the order of your CSS statements. Generally speaking, imported, general purpose CSS files should be imported first, and local CSS should be declared later. Typically you can debug these types of errors by experimentally reordering CSS and seeing the effects.
Some of the toughest to solve CSS problems stem from specificity. These are beyond the scope of this document, but you can learn more about them by web searching "css specificity".
Now let's discuss the problems created by the !important rule, which can cause havoc. This rule prevents later CSS from modifying earlier CSS marked !important. So when your CSS fails to change the appearance of elements it's styled for, after you've eliminated all other possible causes, check for earlier !important rules.
Because I suggest you never use !important, I won't detail what it is or how to use it. If you don't personally use !important, the one other way it can bite you is if you use a tool, such as Bootstrap, whose CSS includes !important rules. To overrule Bootstrap's !important rules, you have to include your own, and the clutter just grows and grows. This is one reason I don't use Bootstrap.
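When you suspect an !important rule is the culprit, the first job is finding them all, including the ones buried in somebody else's CSS. The following is a small sketch for that; the function name and the sample stylesheet are made up for illustration.

```python
import re

# CSS allows whitespace between the "!" and the word "important".
IMPORTANT = re.compile(r'!\s*important', re.IGNORECASE)

def find_important(css):
    """Return (line_number, text) for every line containing !important."""
    return [(number, line.strip())
            for number, line in enumerate(css.splitlines(), start=1)
            if IMPORTANT.search(line)]

sample = """\
p {color: black;}
.btn {color: white !important;}
p.note {color: red;}
"""
for number, text in find_important(sample):
    print(number, text)   # 2 .btn {color: white !important;}
```

Run it on each stylesheet your page loads, in load order, and you have a short list of places to look when your own CSS mysteriously has no effect.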
JavaScript is a real programming language, so debugging it is for the most part beyond the scope of this document. Unfortunately, in a browser, most JavaScript errors fail silently; they run OK but don't produce the desired effect. So this section simply gives a few hints.
First this hint: For some people, most JavaScript errors are caused by forgetting to put a semicolon at the end of a statement. So check your work for statements not ending in semicolons.
Another tip is to view your web page in the developer tools of a browser such as Chromium. Errors and warnings are clearly visible, on the side, the moment they occur, making it much, much, MUCH easier to debug your JavaScript.
If all else fails, you can use alert() statements to see if your JavaScript executed that far, and to tell the value of a variable or two.
Beyond these hints, debugging JavaScript is better learned in other documents.
This document has covered a lot of ground. Thanks for reading it.
Validation tests benefit the HTML/CSS web author immensely, because if done early and often they keep your page most-browser functional, and they keep your page rendering as you intended. Validated pages render almost identically on competent browsers, whereas pages with lots of validation errors are rendered very differently on different browsers.
HTML has a very quirky syntax that's hard to get right. XML has a syntax that almost guarantees that your page renders as you intended. In fact, a page consisting of well-formed XML is likely to render correctly on a variety of browsers, even if its HTML is not completely valid. A second benefit is that XML well-formedness is quick and easy to test via a Python program included earlier in this web page. Third benefit: Well formed XML is fairly easy to debug if wrong, using a <div class="diag"> </div> pair, as explained earlier in this web page. Fortunately the HTML5 specification allows you to write HTML5 in well formed XML.
You can either run the W3C validator online, or you can install it on your computer, as outlined earlier on this page. Installing it gives you quicker results, the ability to use it in shellscripts or PowerShell scripts, and the ability to keep your code off the Internet.
Speaking of keeping your code off the Internet, if you don't care if Google sees your code, you can use the Google Mobile-Friendly Test. This test gives you a pretty good idea of whether your page is mobile-friendly enough that Google will give it a higher ranking because of its mobile-friendliness, as well as giving you an idea how easy it will be for your users to work with.