Day 2: Santa and the Rakupod Wranglers

Santa's world was increasingly going high-tech, and his IT department was polishing off its new process that could take the millions of letters received from boys and girls around the world, scan them into digital form with state-of-the-art optical character recognition hardware, and produce outputs that could greatly streamline the Santa Corporation's production for Christmas delivery.

One problem had initially stymied them, but consultants from the Raku community came to their aid. (As you may recall, IT had become primarily a Raku shop because the language could handle all of its programming needs, from shop management to long-range planning.) The problem was converting the digital output of the OCR hardware into final PDF documents for the factories and toy makers. Because of the growing influence of GitHub and its GitHub Flavored Markdown format, IT's post-OCR software converted the text into that format.

That was fine for initial use in production planning, but for archival purposes Markdown lacked the ability to provide the textual hints needed to create beautiful digital documents for permanent storage. The Raku consultants suggested converting the Markdown to Rakupod, which has as much potential expressive and typesetting power as Donald Knuth's TeX and its descendants (e.g., Leslie Lamport's LaTeX, ConTeXt, and XeTeX). Unlike those formats, Rakupod source is much easier to scan visually. And although the current Raku tools are in the early stages of development, the PDF output can be improved retroactively: as future tools improve, the archived Rakupod can be adjusted and converted again.

Side note: Conversion between various digital document formats has been a fruitful arena for academics as well as developers. Raku already has excellent converters from Rakupod to:

  • Text - Pod::To::Text

  • Markdown - Pod::To::Markdown

  • HTML - Pod::To::HTML
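As a quick taste of the first converter in that list, here is a minimal sketch that renders a file's own Rakupod with Pod::To::Text, which ships with Rakudo; it is roughly what `raku --doc` does from the command line. The pod content is only an illustration.

```raku
use Pod::To::Text;

=begin pod
=head1 Santa's List

Naughty or nice, every letter gets read.
=end pod

# $=pod holds this file's top-level pod blocks; render them as plain text
my $text = Pod::To::Text.render($=pod);
say $text;
```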

Other, non-Raku converters include Pandoc and Sphinx, which strive to be universal converters, with varying degrees of fidelity depending on the input and output formats chosen. (However, this author has not been able to find an output format in that set that can center text without major effort. That includes Markdown, which can center text only by embedding HTML.)

But back to the immediate situation: getting Markdown transformed to PDF.

The first step, Markdown to Rakupod, is made possible by Anton Antonov's module Markdown::Grammar:ver<0.4.0>. The code for that is shown here:

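use Markdown::Grammar:ver<0.4.0>;
my $markdown-doc = "poem.md";
my $pod-doc      = "poem.rakudoc";
# from-markdown returns the converted text, so save it to the .rakudoc file
my $pod-text = from-markdown $markdown-doc.IO.slurp, :to("pod");
spurt $pod-doc, $pod-text;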

The second step, Rakupod to PDF, can follow one of two major paths:

  • Transform Rakupod to PDF directly

  • Transform Rakupod to PostScript

    • Transform PostScript to PDF (ps2pdf)

Santa's IT group decided that, given the current state of Raku modules, one of the easiest ways was the direct transformation, using David Warring's module PDF::Lite and his very new module Pod::To::PDF::Lite. The new module wraps the huge, low-level collection of PDF utility routines in an easier-to-use interface that produces typeset-quality output. (Note that David is actively improving the module, so keep an eye out for updates.)

But that route has a bump in the road: Pod::To::PDF::Lite requires the user to provide the $=pod object (technically, the root of a tree-like structure of Pod blocks). That is easy if you're calling it inside the Raku program that contains the pod, but not if you're trying to get the pod of another file or module. A new Raku module comes to the rescue: the clever algorithm that makes this possible is due to the Raku expert Vadim Belman (AKA @vrurg), and it has been extracted for easy use into a new module, RakupodObject.
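To make the "tree" part concrete, here is a small sketch that walks a file's own $=pod and prints its headings, indented by level. The helper names `pod-text` and `show-headings` are illustrative only; presumably the object returned by RakupodObject could be walked the same way, but that is an assumption, not something the module documents here.

```raku
=begin pod
=head1 Letters to Santa

=head2 From Alice

A bicycle, please.
=end pod

# Recursively collect the plain text of a pod node
sub pod-text($node --> Str) {
    given $node {
        when Str        { $_ }
        when Pod::Block { .contents.map(&pod-text).join }
        default         { ~$_ }
    }
}

# Walk the tree and print each heading, indented by its level
sub show-headings($node) {
    given $node {
        when Pod::Heading {
            say '  ' x (.level - 1), pod-text($_);
        }
        when Pod::Block {
            show-headings($_) for .contents;
        }
    }
}

# $=pod is an array of the top-level pod blocks in this file
show-headings($_) for $=pod;
```

The heading nodes sit inside the `pod` block, and their text sits inside paragraph nodes, which is why both helpers recurse through `.contents`.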

So, using those three modules, we get the following code:

use PDF::Lite;
use Pod::To::PDF::Lite;
use RakupodObject;
# get the pod tree (the equivalent of $=pod) from the Rakupod file
my $pod-object = rakupod2object $pod-doc;
# pod2pdf $pod-object # args ...

IT fed the output PDF documents to its PDF::Lite wrapper program, combine-pdfs.raku, which adds some convenience input options. Because Raku is used worldwide, the program allows for various paper sizes, with settings for US Letter and A4. Finally, IT added other capabilities by customizing the PDF::Lite object after the base document is created, so the program can:

  • Combine multiple documents into a single one

  • Provide a cover and a title for the unified document

  • Provide a cover and a title for each of the child articles

See the program combine-pdfs.raku for details, but the flow inside looks something like this:

use PDF::Lite;
use PDF::Font::Loader;

# the PDF files to combine, given on the command line
my @pdfs = @*ARGS;
# create a new pdf to hold the entire collection
my $pdf = PDF::Lite.new;
$pdf.media-box = 'Letter';
my $centerx = 4.25*72; # horizontal center of a Letter page, in points
# the cover
my PDF::Lite::Page $page = $pdf.add-page;
# add the cover title info...
for @pdfs -> $pdfdoc {
    # the part cover
    $page = $pdf.add-page;
    # add the part cover title info...

    # append every page of this part to the collection
    my $pdf2 = PDF::Lite.open: $pdfdoc;
    my $npages = $pdf2.page-count;
    for 1..$npages -> $page-num {
        $pdf.add-page: $pdf2.page($page-num);
    }
}
# ...
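The hard-coded `$centerx = 4.25*72` in that flow is just half the width of a US Letter page (8.5 in) expressed in PostScript points (72 per inch). A hypothetical way to support both US Letter and A4 is a small size table like the one sketched here; `%paper` and `page-center-x` are illustrative names, not taken from combine-pdfs.raku.

```raku
# Hypothetical paper-size table, in PostScript points (1 inch = 72 pt)
my %paper =
    Letter => (8.5 * 72, 11 * 72),                 # 612 x 792 pt
    A4     => (210 / 25.4 * 72, 297 / 25.4 * 72), # about 595.3 x 841.9 pt
;

# Horizontal center of a page of the given size
sub page-center-x(Str $size --> Numeric) {
    die "Unknown paper size '$size'" unless %paper{$size}:exists;
    %paper{$size}[0] / 2;
}

say page-center-x('Letter');          # 306 (the same as 4.25*72)
say page-center-x('A4').round(0.1);   # 297.6
```

Keeping the sizes as exact Rat arithmetic means the Letter value comes out as exactly 306 points, matching the original constant.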

The end product is usable, but it would take a lot of tweaking to get it into better form for more formal PDF projects. However, it is very useful for a quick solution (see Footnote 3).

Santa's pet project needed something else: the source Markdown pieces had to be combined manually into one document. The single document is named An-Apache-Cro-Web-Server-Recipe.md, and minor structural changes were made so that references between the two parts are internal rather than external. When processed with md2pod.raku and pod2pdf.raku, it produces An-Apache-Cro-Web-Server-Recipe.pdf.

Summary

Thanks to Raku developers, we finally have a direct and robust way to convert complex documents written in Rakupod into the universal PDF format. Using Rakupod's own semantics to steer the conversion would let authors tailor Pod::To::PDF::Lite output to their needs. Such configuration details would have to be carefully designed and implemented as some kind of standard, perhaps as an RFC. As a simple example of part of such a standard, here is a Rakupod block that could define defaults for PDF output:

=begin pdf-config
=config head1 :font<Times-Bold> :size<16> :align<center>
=config head2 :font<Times-Bold> :size<14> :align<center>
=end pdf-config
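A converter could honor such a block by scanning the pod tree for it. The sketch below is hypothetical: `pdf-styles` is an illustrative name, not part of Pod::To::PDF::Lite, and it assumes Rakudo exposes each `=config` directive inside the block as a `Pod::Config` object with `.type` and `.config` accessors.

```raku
=begin pdf-config
=config head1 :font<Times-Bold> :size<16> :align<center>
=config head2 :font<Times-Bold> :size<14> :align<center>
=end pdf-config

# Collect the =config directives inside any 'pdf-config' block,
# keyed by the block type they apply to (head1, head2, ...)
sub pdf-styles(@pod --> Hash) {
    my %styles;
    for @pod -> $block {
        next unless $block ~~ Pod::Block::Named && $block.name eq 'pdf-config';
        for $block.contents.grep(Pod::Config) -> $cfg {
            %styles{$cfg.type} = $cfg.config;
        }
    }
    %styles;
}

my %styles = pdf-styles($=pod);
say %styles<head1><font>;
```

A real implementation would also need defaults for block types the author does not configure, which is exactly the kind of detail such a standard would have to settle.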

But that project is for another day—Santa's archivist Elves are happy for now!

The final product of a real-world test of the Markdown-to-PDF workflow is a present from Santa to all the Raku-using folks around the world: a PDF version of the combined two-part article by Tony O'Dell (AKA @tony-o) on creating an Apache website with the Raku Cro libraries of Jonathan Worthington (AKA @jnthn)!

Santa's 2022 PDF present to you: 🎀 [An-Apache-Cro-Web-Server-Recipe.pdf] 🎀

(See Tony's original posts at Part1 and Part2.) See the final PDF document at An Apache/Cro Web Server.

Santa's Epilogue

Don't forget the "reason for the season:" ✝

As I always end these jottings, in the words of Charles Dickens' Tiny Tim, "may God bless Us, Every one!" [1]

Footnotes

  1. A Christmas Carol, a short story by Charles Dickens (1812-1870), a well-known and popular Victorian author whose many works include The Pickwick Papers, Oliver Twist, David Copperfield, Bleak House, Great Expectations, and A Tale of Two Cities.

  2. Code used in this article is available at raku-advent-extras

  3. See the author's module-in-work PDF::Combiner. File an issue if you would like to use it, and say what features you would like to see added.