Black and white engraved illustration. A man in early 1900s clothing stands with his back to us, his hand grasping a large lever. In front of him stands a large mechanical printer. Its sides a collection of sprockets, gears and wheels.

Way back on the 12th of January 2021 Matt Webb over at Interconnected published a post about using printed QR codes for links in books. Give it a read: it's a fantastically nerdy and good-natured post, and The Gods know we are short on those online these days. It also links to some fantastic experiments in using QR codes in print for footnoted sources.

When reading printed articles and essays it is often a good idea to glance at the source material, if only to make sure it is actually real. Except it is a pain in the thumbs to type out long, complicated URLs on your phone, and even if you do have a keyboard handy there is the issue of homoglyphs. One solution would be to incorporate printed bar codes in your text that the reader could scan with their phone, no typing needed. For some reason this idea stuck with me. Intermeshing the digital and physical.

In the case of Matt he looks at using QR codes. Now, QR codes have always been a bit of a joke—for example the Pictures of People Scanning QR-codes tumblr that has remained empty since 2012 (because no one scans them—ha ha). Though thanks to the MeCard and vCard standards they have seen some success on business cards or posters with contact details (as well as coupons).

With the pandemic the QR code has found wider acceptance, as every place of business now seems to have a “scan to check in” poster to help track people in case of an outbreak. Though China has gone whole hog on QR codes, the rest of the world has followed suit to some degree or another.

So for the past few months I’ve been spending some of my scarce spare time playing around with ideas around this use of a bar coded printable linkology.

The goal is to have:

  1. scannable bar code(s) in printable form that can be printed on a regular home printer,
  2. that contains the footnoted links of the article or book (scan and read),
  3. that does not require a rare/expensive specialized app or hardware past a smartphone capable of scanning the bar code.
  4. that does not rely on a central server or third party online service needed to access the linkology.

After a few days playing around with it I added a fifth guideline, as having a boatload of bar codes in the document looks terrible and requires additional work to scan each code if you want to go through the whole linkology:

  5. the reader should scan as few bar codes as possible.

That means keeping the size of the data to a minimum in order to cram more into each bar code. This is in part to save on scanning, but also an attempt to keep the intrusive nature of bar codes to a minimum in the written text.

It was a tall order, but I think the following proof of concept document is close (if a bit… hacky).

The bar code alternatives

The Matt Webb post that started all of this focused on the use of QR codes, which are a good choice for this. They are widely adopted, pretty much any smart phone can scan them, and they have proven robust even when damaged.

They are also big, blocky and don’t mesh that well with the text they are supplementing. I think there is an alternative bar code that might just be better for a printed linkology. So if you will pardon me, how about we take a brief side trip down Bar Code alley.

A QR code requires a fair bit of space, including a “border” around it in order to make it scan. Aztec code does not require a border, meaning it can be placed in smaller spaces than a QR code—but both Aztec and QR are square formats and, this may be a surprise to some, most printed materials are rectangular not square. So a rectangular bar code, capable of using the page width without claiming the same amount of page height, would be better.

Data Matrix is an amazing bar code format. It can be printed on the head of a screw (it’s been used to keep track of car parts in factories for quite some time), though we are limited by inkjet technology as far as size and readability goes. It does, however, support a rectangular format, but the largest rectangular standard (26x64) only holds 175 alphanumeric characters, which is not enough for our use.

Enter PDF417, the bar code you never knew you knew. It is used on ID documents, airplane and train tickets, and as an alternative to postage stamps in many countries around the world.

In addition to being rectangular and having good error correction, it has one ability which no other bar code has (to my knowledge). As part of the Macro PDF417 standard it can link together up to 99,999 bar codes that, in theory, you could simply swipe your phone/scanner across and have it auto assemble the complete data (though no one would want to scan that many). There are some articles talking about the need for a higher quality printer for printing these at home, but most seem to be talking about the home printers of the 90s and early 00s. There is also a reasonable selection of Android and iOS PDF417 scanner apps available.

During my own tests at home, using my own printer, the free Barcode Factory PDF417 generator and the Cognex bar code scanner app, I was able to consistently get good reads of a 1,200 character string (using all the characters we need for the proof of concept example that comes later) squeezed into a single bar code of 10x0.9cm (that is 4x0.36” for the imperial users out there). Granted, I had equally good results playing around with the length and width of the bar code (as long as I maintained a 1:3 ratio), so the PDF417 could be sized to fit a set page design. Still, in the interest of keeping it within the limits of older inkjets, I’m thinking a limit of ~1,000 characters per bar code.

Theory and code – or how to squeeze into a 1200 character corset

Now that we have chosen a bar code, the question becomes: is there a way to embed a linkology in a slug of text that the phone automatically opens in the browser? The answer is yes, kinda.

The solution comes from a previous (incomplete) project of mine: designing an eZine with images that are embedded in a single HTML file (solution: use base64 encoded images). This kind of data encoding in an HTML document relies on MIME types. In essence it lets you inform the browser, or other piece of software rendering the HTML, that the following is a chunk of data conforming to a given standard and therefore must be handled in a given way.

To create a local web page linkology in the browser we can use the text/html MIME type in a data: URL, creating a string of HTML that can be fed into the browser address bar and rendered. No server needed, nor base64 encoding, which is good as that would have meant a ~33% size increase of the data. This does have to be manually scanned, copied and pasted into the browser address bar, however.
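As a rough sketch of the size argument, the snippet below wraps a small HTML string both as a plain data: URL and as a base64 one, then compares the payload lengths. The helper names (toDataUrl, toBase64DataUrl) and the sample markup are invented for illustration; btoa is the standard base64 function found in browsers and recent Node.js versions.

```javascript
// Illustrative helpers; the names are assumptions, not part of any standard API.
const toDataUrl = (html) => "data:text/html," + html;
const toBase64DataUrl = (html) => "data:text/html;base64," + btoa(html);

// 300 ASCII characters of made-up sample markup.
const sampleHtml = "<p>" + "x".repeat(293) + "</p>";

const plain = toDataUrl(sampleHtml);
const encoded = toBase64DataUrl(sampleHtml);

// Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third.
const payloadGrowth = btoa(sampleHtml).length / sampleHtml.length;
console.log(plain.length, encoded.length, payloadGrowth.toFixed(2));
```

One caveat with the plain form: characters such as # have a special meaning inside a URL, so a payload containing them would probably need percent-encoding before it renders as intended.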

I know it is tempting to just shove all the links into a link shortener, or create a separate page on your personal web page, or perhaps the publisher's, to house all the links. But all of these options assume that the link shorteners will be there 10 years from now, or that your publisher will not re-organize their site and break the link, or that you will still be paying for your homepage in 20 years. With the current state of link rot this is not something I think would be wise. Besides, it breaks goal No.4 (does not rely on a central server or third party online service).

It is also worth noting that I’m using MIME types in a not-intended-way, but as you have no doubt guessed by now, we are coloring outside the lines of good design and programming practices here.

So we have a delivery method, but what exactly are we delivering? A basic boilerplate HTML template for a web page looks something like this:

<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">

  <title>A fantastic linkology</title>

  <link rel="stylesheet" href="css/styles.css?v=1.0">

</head>

<body>

  <script>
    <!--JavaScript here-->
  </script>
  
  <!--text/html content here-->

</body>
</html>

Except this is a lot of characters for what essentially amounts to an empty web page. Which is where the fantastically pragmatic way web browsers handle HTML comes in. Whereas something written in a regular programming language (say C++ or Rust) refuses to compile if you forget a comma, semicolon or include statement, HTML will happily roll on if you forget to include a <head> or <html> tag. In simpler terms: your web browser will try to render a web page even if the underlying code is a garbled broken mess. Granted, it is bad form and absolutely not The Done Thing. But we’re exploring and playing around over here, so let’s ignore good design and standards for now. Which means we will be using the absolute minimum amount required to render a functional page.

Then there is the question of how to encode the links themselves. The straight HTML way of doing it would be:

<a href="URL here">reference number here</a>

Which isn’t very optimized. For each link we include in the linkology we have to repeatedly add the link element <a href=""></a> as well as the protocol https:// and reference number 1,2,3,etc. That means we are using 24+ characters extra per link. By applying some basic compression tricks, such as a dictionary/substitution, and using JavaScript we can reduce the number of repeat characters to leave as much space as possible for the unique characters of each link.
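The "24+ characters" figure can be checked with a quick count. This is only a sanity check of the arithmetic; the function name overheadPerLink is made up for the example.

```javascript
// Count the characters that repeat for every plain HTML link.
const linkMarkup = '<a href=""></a>'; // the empty link element
const protocol = "https://";

function overheadPerLink(referenceNumber) {
  // Markup + protocol + the digits of the visible reference number.
  return linkMarkup.length + protocol.length + String(referenceNumber).length;
}

console.log(overheadPerLink(1));  // 24 for a single-digit reference
console.log(overheadPerLink(73)); // 25 for a two-digit reference
```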

This gives us the following blob of JavaScript:

let l=[]; //array containing all URLs
l.forEach(g);
function g(t, i) {
  i+=1;
  document.write('<a style="font-size:3em;" href="https://'+t+'" target="_blank">['+i+']</a>'); // generates HTML links for each URL in array
}

Which minifies down to a MIME string like so:

data:text/html,<script>let l=[];l.forEach(g);function g(t, i) {i+=1;document.write('<a style="font-size:3em;"href="https://'+t+'"target="_blank">['+i+']</a>');}</script>

The above JavaScript saves characters by making some assumptions:

  • All links will be secure (https://).
  • The links are all added to the array (l=[]) in the order they are numbered in the footnotes.
  • The standard www. is not added as it allows for sub-domains out of the box.

It also has some features added at the cost of additional characters:

  • It includes the CSS slug style="font-size:3em;" which makes the link text large enough that you can comfortably click it on a smart phone screen.
  • The link reference number is bracketed [] making it easier to differentiate between numbers and have a larger surface to click on when selecting a link to follow.
  • By adding target="_blank" any link the reader clicks on will open in a new tab, so they won’t have to press “go back” each time they want to check another link.
  • Using i+=1 rather than i++ we can adjust the starting number of the links so you can divide a large number of links into separate bar codes without having issues with the numbering.
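For anyone who would rather generate these strings than type them by hand, here is a minimal sketch under a few assumptions: every link starts with https://, none of them contains characters that upset a data: URL (such as #), and the ~1,000 character limit per bar code mentioned earlier is the budget. The names buildLinkology and splitIntoBarcodes are invented for this sketch.

```javascript
// Hypothetical helper: builds the data: URL string for one bar code.
// `urls` are full https:// links; `start` is the number shown for the first link.
function buildLinkology(urls, start = 1) {
  // Drop the protocol, since the embedded script re-adds "https://" for every entry.
  const stripped = urls.map((u) => u.replace(/^https:\/\//, ""));
  const list = JSON.stringify(stripped);
  return `data:text/html,<script>let l=${list};l.forEach(g);function g(t,i){i+=${start};document.write('<a style="font-size:3em;" href="https://'+t+'" target="_blank">['+i+']</a>')}</script>`;
}

// Hypothetical helper: spreads the links over several bar codes, keeping each
// string at or under `limit` characters and continuing the numbering across codes.
function splitIntoBarcodes(urls, limit = 1000) {
  const codes = [];
  let chunk = [];
  let start = 1;
  for (const url of urls) {
    const candidate = buildLinkology([...chunk, url], start);
    if (chunk.length > 0 && candidate.length > limit) {
      codes.push(buildLinkology(chunk, start));
      start += chunk.length;
      chunk = [url];
    } else {
      // A single link that is too long on its own still gets its own (oversized) code.
      chunk.push(url);
    }
  }
  if (chunk.length > 0) codes.push(buildLinkology(chunk, start));
  return codes;
}

const searchEngines = ["https://duckduckgo.com", "https://google.com", "https://bing.com"];
console.log(buildLinkology(searchEngines));
console.log(splitIntoBarcodes(searchEngines, 1000).length);
```

The start value is what the i+=1 trick is for: the second bar code simply adds a larger number to the array index, so the footnote numbers carry on where the first one stopped.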

If you were making a linkology referencing different search engines the string would look something like this:

data:text/html,<script>let l=["duckduckgo.com","google.com","bing.com","yahoo.com"];l.forEach(g);function g(t,i){i+=1;document.write('<a style="font-size:3em;" href="https://'+t+'">['+i+']</a>')}</script>

(Go on, copy and paste it into a new tab in your browser and see what you get.)

Proof of concept

So we have the bar code and JavaScript code. Now to actually try making something of it. As a marine engineer once told me, in theory there is no difference between theory and practice. In practice there is.

Firstly we need a text with plenty of links on which to experiment, for example the very readable link blog Pluralistic.net by Cory Doctorow, which is released under a Creative Commons Attribution 4.0 license. Thank you kindly Cory!

Using the 2021.01.15 entry, minus the images as they may be under a different license and I want to avoid the bot copyright hordes roaming the Net, I’ve created a proof of concept PDF.

For the proof of concept I decided to try two different methods: first, one PDF417 code per section, and second, a single page Macro PDF417 mock-up.

The reason I say mock-up is because it is in fact a collection of standard PDF417 codes with a long text string you have to copy/paste and stitch together yourself. Turns out it is easier to get a hold of The Dead Sea Scrolls than finding a free functional Macro PDF417 generator.
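To give an idea of what the stitching could look like if it were done by a script instead of by hand, here is a small sketch. It assumes each chunk carries a made-up "index/total|" prefix so the pieces can be scanned in any order. This is not the real Macro PDF417 control block, only a stand-in for the idea, and splitPayload and stitch are hypothetical names.

```javascript
// Hypothetical chunk format: "<index>/<total>|<payload>", e.g. "2/3|...".
// This is NOT the Macro PDF417 format, just an illustration of the concept.
function splitPayload(payload, size) {
  const total = Math.ceil(payload.length / size);
  const parts = [];
  for (let k = 0; k < total; k++) {
    parts.push(`${k + 1}/${total}|${payload.slice(k * size, (k + 1) * size)}`);
  }
  return parts;
}

// Reassemble scanned chunks regardless of the order they were scanned in.
function stitch(scanned) {
  const parsed = scanned.map((s) => {
    const bar = s.indexOf("|"); // only the first "|" separates header from body
    const [index, total] = s.slice(0, bar).split("/").map(Number);
    return { index, total, body: s.slice(bar + 1) };
  });
  if (parsed.length === 0 || parsed.length !== parsed[0].total) {
    throw new Error("some chunks are missing");
  }
  return parsed
    .sort((a, b) => a.index - b.index)
    .map((p) => p.body)
    .join("");
}

const message = "data:text/html,<p>a long linkology string would go here</p>";
const chunks = splitPayload(message, 16);
console.log(stitch(chunks.slice().reverse()) === message);
```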

Search and you will find a free print ready font (which also has a great explanation). But making a functional generator requires coding skills I do not have, while the cheapest generator costs 175+ USD (for one license). This also means I don’t know how many, if any, of the plethora of bar code scanning apps on Android and iOS support Macro PDF417 (though a lot support PDF417). There was one online service claiming Macro PDF417 generation, but for the life of me I cannot get it to work.

Though there are Java, .NET and a few other Macro PDF417 implementations that could be of use. The Gods only know how many shareware, freeware and abandonware PDF417 bar code generators I’ve installed in Wine over the last week trying to find one that will work.

On that note, a big thank you to Ed (@hawtgluh) who followed me down the Macro PDF417 hole and tried his best to find a solution. Thanks man, perhaps next time. Even without the Macro implementation, I think this proof of concept at least demonstrates the idea.

Here is a quick summary showing the number of links and characters, along with the total savings using the two different methods.

Total number of:
links...........................................:    73
characters in those links.......................: 4,751
characters in the "per section" solution........: 5,785 (~21.76% size increase)
characters in the "one page MacroPDF417 mock-up": 3,741 (~21.26% size reduction)

As you can see, even without a functional Macro PDF417 implementation it is possible to create an interactive linkology this way. Though having the ability to scan multiple bar codes together would allow for more data and a nicer layout and design of the resulting linkology page (for example: adding a page title and section titles with an index for easier navigation between the sections' footnotes).
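The percentages in the summary can be rechecked with a few lines of arithmetic. The figures are taken straight from the table; the last line uses the ~1,000 characters per bar code limit from earlier to estimate the minimum number of codes, which is only a lower bound.

```javascript
const rawLinkChars = 4751; // characters in the 73 plain links
const perSection = 5785;   // "per section" solution
const onePage = 3741;      // one page Macro PDF417 mock-up

// Percentage change relative to the raw link characters.
const change = (n) => ((n - rawLinkChars) / rawLinkChars) * 100;

console.log(change(perSection).toFixed(2)); // "21.76" (size increase)
console.log(change(onePage).toFixed(2));    // "-21.26" (size reduction)

// Lower bound on bar codes for the one page version at ~1,000 characters each.
console.log(Math.ceil(onePage / 1000)); // 4
```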

Post-post (heh) autopsy

Sadly I was unable to meet all of the stated goals. The bar code linkology is printable, contains the footnoted links and does not require a central server or similar service (goals No.1, 2 and 4), and there are plenty of PDF417 reader apps (goal No.3). The sticking point became that 5th goal. The bar codes are quite intrusive, and without a functional Macro PDF417 generator the reader has to continuously scan bar codes throughout the text. The scanned text also has to be manually copied and pasted into the browser address bar.

It might even be possible to compress the data into a single PDF417 bar code for all I know. Sadly Huffman trees in JavaScript are far beyond my skill set. Alternatively the PDF417 bar codes used for the one page linkology could be made a different size, making it possible to add more characters to them and thereby reducing the number of bar codes the reader has to scan.
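Short of Huffman trees, a plain dictionary substitution is one simpler thing that could be tried. The sketch below is not what the proof of concept does. It assumes a hand-picked dictionary of frequent substrings and tokens (^ and {) that never occur in the links themselves, and the example.com address is a placeholder.

```javascript
// Hypothetical dictionary: single-character tokens mapped to frequent substrings.
// The tokens are assumed never to occur in the links themselves.
const dictionary = [
  ["^", "example.com/"],
  ["{", "/index.html"],
];

// Swap every dictionary substring for its token.
function compress(link) {
  return dictionary.reduce((s, [token, text]) => s.split(text).join(token), link);
}

// Swap every token back for its substring.
function expand(packed) {
  return dictionary.reduce((s, [token, text]) => s.split(token).join(text), packed);
}

const link = "example.com/blog/index.html";
const packed = compress(link);
console.log(packed, packed.length, link.length);
console.log(expand(packed) === link);
```

The catch is that the expand step and the dictionary have to travel inside the bar code as well, so this only pays off when the repeated substrings save more characters than the extra script costs.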

All of these things seem to come down to my lack of programming and design skills, so perhaps someone else out there will be able to overcome these limitations. No doubt there is some JavaScript/MIME trickery that would at least solve the automatic opening of URLs issue.

Final thoughts

It was an interesting experiment demonstrating one of the possible ways to mesh print and digital content together. One possible future development of this could be an app that would print online articles or posts, automatically generating Macro PDF417 linkologies on the fly.

With Macro PDF417 technology spread more widely it might even be possible to re-create the gaming magazines of yore where you would get pages of BASIC you could type into your computer to get a new game, except this time it is a compressed source file shoved into bar codes. Might make for a fun demo scene or coding challenge.

If you know of a free Macro PDF417 encoder, and/or scanner app, please reach out. I would love to be able to update this post with that info.

For those wondering, the bit of JavaScript code jammed into a MIME construct and stuck to the side of the proof of concept with gum is released under a GPL 3.0 License (Copyright 2021 Cornelius K. of The Infrequency). Enjoy.

Until next time, keep safe.

// Cornelius K.


Title image from Appletons’ cyclopaedia of applied mechanics, vol. 2 (ca.1880) via Old Book Illustrations.

Not that anyone in their right mind would want to use it, but the JavaScript used for this project is released under a GPL 3.0 License.