welcome!
this is a document intended to demonstrate the range of features common to the e-texts in the project gutenberg library, and indeed to the majority of printed books.
project gutenberg is a volunteer effort for digitizing the text of public-domain books, for viewing and distribution in cyberspace.
it was begun by michael hart back in 1971, with the goal of creating 10,000 e-texts, a milestone achieved in 2003, thanks to a big boost from distributed proofreaders, which allows people to proofread online -- thousands of them doing a page at a time, volunteering bits and pieces of their time.
if you want to support the p.g. library with software, a markup system, or so on, you should be able to handle its features, and you can use this file as a "test-suite" to verify that your system is fully capable.
this document should be self-explanatory. tabs have been replaced with "~tab~", so that they are visible to you and can be changed back for your testing. other than that, no changes should be needed.
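as a rough sketch of that change-back step, the python below swaps each visible "~tab~" marker for a real tab character. the names restore_tabs and TAB_MARKER are made up for this illustration; they are not part of any project gutenberg tool.

```python
TAB_MARKER = "~tab~"  # the visible placeholder used in this document


def restore_tabs(text: str) -> str:
    """Swap every visible ~tab~ marker back to a real tab character."""
    return text.replace(TAB_MARKER, "\t")


sample = "~tab~dear leslie,"
restored = restore_tabs(sample)
```

run this over the whole file before feeding it to your system, so the tests see genuine tabs.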
this is the test of a link in the middle of a normal paragraph, to http://www.pgdp.net, to see if it works correctly...
if you find inconsistencies in this test-suite, do please let me know immediately. thank you.
first, you should be able to handle headings of different levels, such as the book, chapters, and subsections.
you may label the levels as you like.
html can support 6 different levels, so that's a good number to shoot for.
one of the things that users find handy is a table of contents for the e-book, so you must be capable of generating one, in cases where an e-text doesn't have one.
because of their experience with the web, people often expect this table of contents to be hotlinked to the appropriate sections, so your markup system should facilitate that. a nice touch is to have the chapter headings then link back to the table of contents...
project gutenberg was born a very long time ago, before word-processors and personal computers... a rumor is that michael used a keypunch machine (it's ok if you're too young to know what that is) to enter a good number of the original e-texts...
computers didn't even have lowercase characters in the early days, so the whole book was capitalized! luckily, before long we got lowercase characters.
but still, "luxuries" like italicized and bold text were not possible, so michael developed a convention where a word that was bold or italic in the original was entered in all-uppercase, to show that emphasis.
because the e-texts are stored as raw ascii text, that convention lives on, to this day, in some files. by this time, however, we need to be able to handle styled text, so your systems must be able to do so.
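one way a system might turn that all-uppercase convention back into styled text is sketched below in python. it rests on assumptions that are mine, not the library's: an "emphasized word" is taken to be any whole word of two or more capital letters, the word is lowercased when it is wrapped, and the html-style em tags are just a default you can swap out. acronyms and capitalized headings would be caught by mistake, so treat this as a starting point only.

```python
import re

# assumption: "all-uppercase" means a whole word of two or more capital letters
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def mark_emphasis(text: str, open_tag: str = "<em>", close_tag: str = "</em>") -> str:
    """Wrap each all-uppercase word in emphasis markers and lowercase it."""
    def wrap(match: re.Match) -> str:
        return open_tag + match.group(0).lower() + close_tag

    return CAPS_WORD.sub(wrap, text)


styled = mark_emphasis("it was a VERY long time ago")
```

a single capital letter such as "I" or "A" is left alone, since it is far more likely to be an ordinary word than emphasis.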
most english e-texts in the library can be represented in the lower-ascii characters, but future e-texts are likely to require some unicode characters, so you should without question be able to handle unicode.
many of the e-texts contain poetry, or verse of some type, so your system must be able to handle silliness like that.
some poems want to be left-justified, so you should be able to handle that:
a haiku for you
(by bowerbird intelligentleman)
haiku have three lines
and seventeen syllables
five, seven, and five
other poems want to be centered instead:
t.v. will eat you
(by bowerbird intelligentleman)
t.v. will eat you
out of a satellite dish
with a tuning fork
and some poems want to alternate...
      six spaces at the start of this line
            12 spaces at the start of this line
      six spaces at the start of this line
            12 spaces at the start of this line
      six spaces at the start of this line
            12 spaces at the start of this line
      six spaces at the start of this line
            12 spaces at the start of this line
and some poems want to get fweaky!
      six spaces at the start of this line
          ten spaces at the start of this line
              14 spaces at the start of this line
                  18 spaces at the start of this line
                      22 spaces at the start of this line
                          26 spaces at the start of this line
                      22 spaces at the start of this line
                  18 spaces at the start of this line
              14 spaces at the start of this line
          ten spaces at the start of this line
      six spaces at the start of this line
in general, lines of a poem prefer to stay together, that is, to be kept all on a page whenever possible, so your system should attempt to accomplish that...
if it's not possible to keep the whole poem on a page, try to make the page-break occur between the verses...
there aren't a whole lot of tables in the e-texts -- we're talking literature, not spreadsheets -- but your system should handle tables anyway; not really big and hairy ones, just simple ones.
table 1    | column 1 | column 2 |
plain-text | yes      | yes      |
x.m.l.     | no       | yes      |
html       | yes      | no       |
.rtf       | no       | yes      |
           | no       | no       |
~tab~~tab~ center me please! ~tab~~tab~
sometimes, for one reason or another, some of an e-text's lines are centered. so your system should be able to do that.
~tab~~tab~ center me as well, please! ~tab~~tab~
most of the p.g. e-texts are text-only. but some of them do have pictures, so your system must be able to show 'em.
put a picture here, or maybe a button that someone could click in order to view that picture...
"what is the use of a book," thought alice, "without pictures or conversation?"
some of the e-texts have footnotes.[1]
your system must be able to handle them. how it might do that is up to you, captain.
remember how, in chapter 2, we said that the table of contents should be hot-linked to the appropriate spots?
that is one type of link you'll need. there are several other types as well.
your system should also be able to make the jump to an internet site. most of the e-texts are quite old, so of course it's not like they have a bunch of internet url's in them; but every e-text will indeed contain a link to project gutenberg's website, so you must be able to execute links...
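before a link can be executed, it has to be found in the running text. here is a minimal python sketch of that first step. it assumes a url starts with "http://" or "https://" and runs to the next whitespace, and that any sentence punctuation stuck to its end is not part of the address. the name find_urls is hypothetical.

```python
import re

# assumption: a url starts with http:// or https:// and runs to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")

# punctuation that usually belongs to the sentence, not to the address
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def find_urls(text: str) -> list:
    """Return the urls found in running text, minus trailing punctuation."""
    return [m.group(0).rstrip(TRAILING_PUNCTUATION) for m in URL_PATTERN.finditer(text)]


urls = find_urls("a link in a normal paragraph, to http://www.pgdp.net, to see if it works...")
```

the stripping step matters for sentences where a comma or period directly follows the address.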
quite often there are places in an e-text that reference other parts of the e-text. in these cases, it's nice to have a hotlink close to (or on) that reference point that transports the reader directly to the place that is being referenced; it is convenient. your system should facilitate such linking, preferably making it happen automatically.
for instance, the beginning of this chapter has a reference to chapter 2. if a reader clicked on those words -- "chapter 2" -- they should automatically go to chapter 2.
(and likewise with each of the references to "chapter 2" here in this paragraph too.)
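a sketch of how such automatic linking might begin: the python below only locates each "chapter n" mention and reports where it sits and which chapter it points at. what you do with those spans (anchors, buttons, and so on) is up to your system. the function name and the pattern are assumptions for illustration; real e-texts also use roman numerals and spelled-out numbers, which this does not handle.

```python
import re

# assumption: references look like the word "chapter" followed by digits
CHAPTER_REF = re.compile(r"\bchapter (\d+)\b", re.IGNORECASE)


def find_chapter_refs(text: str) -> list:
    """Return (start, end, chapter_number) for every 'chapter N' mention."""
    return [(m.start(), m.end(), int(m.group(1))) for m in CHAPTER_REF.finditer(text)]


line = "remember how, in chapter 2, we said that the table of contents should be hot-linked?"
refs = find_chapter_refs(line)
```

each tuple gives the exact slice of text that should become clickable, plus the chapter number to jump to.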
i use a hyphen between "e" and "text" in "e-text". not everyone does, but i think that it looks nicer.
a hyphen -- as you know -- differs from a dash. and you probably know that there are even two (and some people say more!) types of dashes...
the first - called an "en-dash" - is a narrow one. you will see these in a fair number of the e-texts. it's called an "en" dash because it was traditionally defined as being exactly as wide as the letter "n". (or, some say, as wide as a letter "n" is high, so you can take your pick between those choices.)
the second -- called an "em-dash" -- is wider, and yes, it's called that because it's as wide as an "m", or so the story goes, according to some people...
generally, try to use an em-dash, not an en-dash... the en-dash looks too much like a hyphen, especially when it is run into the words that are surrounding it.
now, the convention says that you should not have spaces on the sides of a dash. the convention is wrong. it looks much nicer if you put spaces around a dash.
perhaps even more importantly, the search capability of many programs is thrown off if you don't use spaces.
so are the re-margination routines in many programs, so -- to avoid these problems -- put spaces around dashes.
a problem arises, though, because there is no em-dash in the lower-ascii codes. so you have to use a double-dash -- like these here -- for an em-dash. ok, problem solved. your system should be able to convert the double-dash into a proper em-dash, if the user chooses that option.
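the optional conversion could look something like this python sketch. the function name and the "enabled" switch are invented here to stand in for the user's choice. only a run of exactly two hyphens is converted, so longer runs (such as a row of dashes used as a divider) are left untouched.

```python
import re

EM_DASH = "\u2014"  # the unicode em-dash character

# exactly two hyphens, not part of a longer run of hyphens
DOUBLE_DASH = re.compile(r"(?<!-)--(?!-)")


def convert_double_dashes(text: str, enabled: bool = True) -> str:
    """Replace each double-dash with an em-dash when the user opts in."""
    if not enabled:
        return text
    return DOUBLE_DASH.sub(EM_DASH, text)


converted = convert_double_dashes("a hyphen -- as you know -- differs from a dash.")
```

the spaces around the dash are kept exactly as typed, and single hyphens in words like "e-text" are never touched.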
hyphenation is another thing that messes up e-book search capabilities. e-books don't need hyphenation. so turn hyphenation off when you make an e-book.
back in the old typewriter days, students were instructed to put two spaces after a sentence.
ever since wordprocessing, though, some people have said two spaces are no longer required, that it is an unnecessary leftover from earlier times.
those people were wrong. if you have one space after a period, sentences run together too much.
but...
the thing is, it's actually a lot easier to edit text if you only have one space after a period... that way, you can do a search for two spaces, and that search should come up totally empty.
thus, to make life easier on the writers out there, your software should create the smidgen of space necessary to separate two sentences sufficiently.
so, if you're making an e-book, use just one space.
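here is a small python sketch of that clean-up, under the assumption that a sentence ends with a period, exclamation mark, or question mark. it collapses extra spaces only after those marks, so the leading indentation of poetry lines is left alone. the names are made up for this example, and sentences that end with a closing quote mark are not covered.

```python
import re

# two or more spaces that directly follow sentence-ending punctuation
EXTRA_SENTENCE_SPACES = re.compile(r"(?<=[.!?]) {2,}")


def single_space_sentences(text: str) -> str:
    """Collapse multiple spaces after a sentence into a single space."""
    return EXTRA_SENTENCE_SPACES.sub(" ", text)


def has_double_spaced_sentences(text: str) -> bool:
    """Report whether any sentence is still followed by two or more spaces."""
    return EXTRA_SENTENCE_SPACES.search(text) is not None


cleaned = single_space_sentences("those people were wrong.  if you have one space...")
```

after cleaning, the checker should come up empty, which is the same test as the manual search for two spaces.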
sometimes you want to quote a whole block of stuff from someone. this is often called a "block-quote". clever, the guy who came up with that name...
many of the project gutenberg e-texts contain block-quotes of one type or another.
here's an example of a block-quote, a letter.
~tab~dear leslie,
~tab~
~tab~how are you? i am fine.
~tab~the weather is nice here.
~tab~but i wish it was half
~tab~as beautiful as you are.
~tab~
~tab~and i wish you were here.
~tab~
~tab~love,
~tab~bowerbird
typically, block-quotes are indented on both the left and right sides.
here's another block-quote, from a speech.
> four score and seven years ago, our
> forefathers set forth upon this continent
> a new nation, conceived in liberty and
> dedicated to the proposition that
> all men[2] are created equal.
there are a number of different situations throughout the e-texts that might call for this type of indentation. for now, we will just subsume them all under "block-quote"; perhaps later we will see fit to break out a more finely-grained analysis, if we find any special cases merit their own class.
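as a first pass at spotting such passages, the python sketch below groups consecutive lines by whether they look like block-quote lines. the rule it uses is an assumption drawn only from the two examples in this file: a line counts as quoted if it starts with a real tab or with a ">" character. the function names are hypothetical, and the tab test assumes the "~tab~" markers have already been changed back to tabs.

```python
from itertools import groupby


def is_block_quote_line(line: str) -> bool:
    """Assumed rule: a quoted line starts with a tab or with '>'."""
    return line.startswith(("\t", ">"))


def group_lines(lines: list) -> list:
    """Split lines into consecutive runs of (is_quote, [lines])."""
    return [(flag, list(run)) for flag, run in groupby(lines, key=is_block_quote_line)]


groups = group_lines([
    "here's another block-quote, from a speech.",
    "> four score and seven years ago, our",
    "> forefathers set forth upon this continent",
    "there are a number of different situations",
])
```

each run flagged true can then be indented on both sides as one unit, and finer-grained classes can be split out of that group later if they prove useful.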