markup

from HTYP, the free directory anyone can edit if they can prove to me that they're not a spambot

plaintext and formatting

A bit of history:

In the beginning, there was the word, and the word was 8 bits.

(A "bit" being kind of like the subatomic particle of the computing world. All data is made of bits. A bit is a thing which can be either on or off -- one or zero -- and not anything else. It can also be used as a digit, but in base 2... okay, this is probably getting too far into the weeds; the TLDR is that all information computers use is made up of bits.)

...and all letters and punctuation had to be represented using those 8 bits.

So the Gods of Computing created ASCII, and saw that it was good.

ASCII is a convention by which specific patterns of 8 bits -- which we usually represent as numbers, just because bit-patterns are not very human-compatible -- are understood to refer to specific characters on the keyboard (including a few "special" characters like carriage-return and tab, which are arguably formatting operations).
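To make that a bit more concrete, here's a small Python sketch that shows the number and the bit-pattern behind a few characters. The function name `ascii_table` is just made up for this example; `ord` and `format` are standard Python built-ins.

```python
# Show the number and the bit-pattern behind each character in a string.
# "ascii_table" is a made-up name for this illustration.
def ascii_table(text):
    rows = []
    for ch in text:
        code = ord(ch)               # character -> number
        bits = format(code, "08b")   # number -> 8-digit bit-pattern
        rows.append((repr(ch), code, bits))
    return rows

for shown, code, bits in ascii_table("Hi!\t"):
    print(shown, code, bits)
# 'H' 72 01001000
# 'i' 105 01101001
# '!' 33 00100001
# '\t' 9 00001001
```

Note that the tab at the end is a "special" character with a number of its own, just like the letters.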

(Aside: historically, the top bit of the 8 was sometimes needed for parity-checking in serial communications, so traditional ASCII actually only uses 7 bits -- 128 possible characters. Pretty much everything before the IBM PC displayed its text on a serial terminal, but the PC's display was basically internal electronics that the computer could talk to directly, so there was no need to worry about parity-checking for comms. MS-DOS on the PC therefore used all 8 bits, defining another 128 characters which allowed text-mode DOS to do things like use text to draw lines, and which also included accents on some Roman characters, a few Greek characters, and other miscellany. You don't actually need to know this in order to understand the rest, but I often find it helpful to explain how things got the way they are.)

So anyway... this convention, ASCII, became so widely used that pretty much everything understands it now. (There was at least one other standard, EBCDIC, but fortunately we didn't end up with a Betamax/VHS Ultimate Showdown of Ultimate Destiny there.)

...and when we say "plain text", we're pretty much talking about ASCII.

...or, that is, the convention of a particular character set that Everything Understands.

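(In reality, these days text is usually encoded as Unicode, which covers vastly more than 128 characters. The most common way of storing it, UTF-8, uses anywhere from one to four 8-bit bytes per character, and stores the original ASCII characters exactly as before... mumble mumble no I don't entirely understand it either.)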
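For the curious, here's a rough Python sketch of what UTF-8 does with a few characters. The helper name `utf8_lengths` and the sample characters are my own choices for illustration; `str.encode` is standard Python.

```python
# Count how many bytes UTF-8 needs for each character.
# "utf8_lengths" is a made-up helper name for this illustration.
def utf8_lengths(chars):
    return {ch: len(ch.encode("utf-8")) for ch in chars}

print(utf8_lengths(["A", "é", "€", "😀"]))
# {'A': 1, 'é': 2, '€': 3, '😀': 4}

# Plain ASCII text comes out byte-for-byte the same either way.
print("plain text".encode("utf-8") == "plain text".encode("ascii"))
# True
```

That last line is a big part of why "plain text" still more or less means ASCII in practice: old ASCII files are already valid UTF-8.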

markup

Then it came to pass on the Second Day of the Computer Age, the Gods of Computing created the word-processor.

And it was... okay.

So, like, there were text-only word processors at first, but I'm thinking more of things like WordStar, which let you do boldfacing and underlining (and, if you had a dot-matrix printer and the necessary drivers, italics).

But the screen couldn't show italics.

Depending on what kind of terminal you were using, it might have been able to kinda fake boldface by making the text brighter, but I don't think I ever ran into one that could do italics. (Maybe the VT240; it did double-height text... I don't quite remember; it's been a while since 1987, apparently...)

...and anyway, all that information had to be saved to disk somehow. In 8 bits, or better yet 7 bits just in case.

(...because if you tried to use a dial-up modem to transmit your document to someone else for proofreading or typesetting or whatever, and you used 8 bits but the modem connection was only 7 bits, your document could easily end up getting mangled.)
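Here's a small Python sketch of that kind of mangling. It assumes the simplest possible failure, where the link just throws away the top bit of every byte (real connections could misbehave in other ways), and it uses Python's `cp437` codec as a stand-in for an 8-bit DOS character set. The function name `through_7bit_link` is invented for this example.

```python
# Simulate sending bytes over a connection that only carries 7 bits.
# Assumption: the link simply drops the top bit of every byte.
def through_7bit_link(data: bytes) -> bytes:
    return bytes(b & 0x7F for b in data)

plain = "cafe".encode("cp437")
fancy = "café".encode("cp437")   # the accented e needs the 8th bit

print(through_7bit_link(plain))  # b'cafe'      (survives intact)
print(fancy)                     # b'caf\x82'
print(through_7bit_link(fancy))  # b'caf\x02'   (the accented e is now junk)
```

Anything that stays within the 7-bit characters gets through untouched, which is the whole appeal of keeping formatting information inside that range.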

So they needed a way to represent things like boldface and italics and other stuff, but using only those standard 128 characters.

...and that's what text-markup is!

A student asks:

Ok ok... So plain-text/ASCII is this almost universal way of making 127-8 characters look fancy or sit a particular way on a page, without needing to add another 127-8 buttons to your keyboard for each different desired instruction?

Yes, something along those lines.

I mean, if you were going to have a key for every way that every character might appear, you'd need at least 4 for each character, just to cover all possible combinations of italics and bold... and there's also underlining, and super/subscript... and we haven't even gotten into things like links, images, tables, bullet-lists... etc.
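The arithmetic there is just "every style is either on or off", so the count doubles with each style you add. A quick Python sketch (the function name `style_combinations` is made up, and it treats every style as independent of the others, which isn't quite true for super/subscript):

```python
from itertools import product

# List every on/off combination of a set of text styles.
# Assumption: each style can be switched independently of the others.
def style_combinations(styles):
    combos = []
    for flags in product([False, True], repeat=len(styles)):
        combos.append(tuple(s for s, on in zip(styles, flags) if on))
    return combos

print(style_combinations(["bold", "italic"]))
# [(), ('italic',), ('bold',), ('bold', 'italic')]

print(len(style_combinations(["bold", "italic", "underline"])))
# 8
```

So two styles means 4 keys per character, three means 8, and it only gets worse from there.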

"Markup" is any way of indicating all of those things just using the standard characters.

So somewhere in the mix, there's software which takes the markup and turns it into nicely-formatted text on your screen (or printed on a page), and that process is called "rendering".

For HTML, as an example, the rendering is done by a web browser. It takes HTML (among other things) and turns it into, like, a web page! You can even think of a web browser as a "file viewer" for HTML, though it gets a bit more complicated than that because HTML can insert content from other files, so you're not really looking at just one file... but that's getting off the point.
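A real web browser is way too big to show here, but the basic idea of rendering can be sketched in a few lines of Python. This toy renderer understands only two HTML tags, `<b>` and `<i>`, and "draws" them by emitting ANSI escape codes, which many (not all) terminals display as bold and italic. The class name `TerminalRenderer` and the function `render` are invented for this example; `html.parser.HTMLParser` is part of Python's standard library.

```python
from html.parser import HTMLParser

# ANSI escape codes that many terminals understand (an assumption
# about your terminal; some will ignore the italic ones).
ON = {"b": "\x1b[1m", "i": "\x1b[3m"}
OFF = {"b": "\x1b[22m", "i": "\x1b[23m"}

class TerminalRenderer(HTMLParser):
    """Toy renderer: handles only <b> and <i>, ignores every other tag."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(ON.get(tag, ""))

    def handle_endtag(self, tag):
        self.out.append(OFF.get(tag, ""))

    def handle_data(self, data):
        self.out.append(data)   # ordinary text passes straight through

def render(html):
    renderer = TerminalRenderer()
    renderer.feed(html)
    renderer.close()            # flush any text still being buffered
    return "".join(renderer.out)

print(render("some <b>bold</b> and <i>italic</i> text"))
```

The markup goes in as plain characters, and what comes out is instructions for the display. A browser does the same job with a vastly bigger vocabulary.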

Markdown

Now, Markdown... is a particular type of markup, i.e. a particular markup language.

A student asks whether the "change view" function in WordPress uses Markdown.

I'm not super-familiar with WordPress, but I suspect it's allowing you to switch between a rendered view and the raw HTML.

A student asks:

So plain-text/ASCII are a set of characters/buttons... And markup is when you use a particular way of using those characters to format text. Markup is what word processing software does mostly, HTML is when you use it in a web browser? And now Markdown is when you use markup for doing something different again, when you need to have text that multiple people can edit and track the changes they make?

With word processors, sometimes you never really see the markup at all. Early versions of MS Word (and probably WordPerfect) used binary data formats that only a computer could love -- not human-editable, in any practical sense. RTF was just a save-as option; it wasn't used by default.

HTML is a particular kind of markup... ok, let me give some examples.

Here are the words "bold" and "italic", marked up to actually be bold and italic, in different markup languages:

  • Markdown: **bold**, *italic*
  • HTML: <b>bold</b>, <i>italic</i>
  • Textile: *bold*, _italic_
    • "Textile" is indeed the right name for that one.
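Since the same information is sitting in each of those lines, software can translate one markup into another. Here's a deliberately tiny Python sketch that turns the Markdown example into the HTML one. The function name `markdown_to_html` is made up, it handles only bold and italic, and it uses `<b>` and `<i>` to match the list above (real Markdown tools do far more, and usually emit `<strong>` and `<em>` instead).

```python
import re

# Toy converter: handles only **bold** and *italic*, nothing else.
def markdown_to_html(text):
    # Do bold first, so that "**" isn't mistaken for two italic markers.
    text = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", text)
    text = re.sub(r"\*(.+?)\*", r"<i>\1</i>", text)
    return text

print(markdown_to_html("**bold**, *italic*"))
# <b>bold</b>, <i>italic</i>
```

The order of the two substitutions matters: if italics went first, the double asterisks around "bold" would get chewed up as italic markers.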