Traherne Digital Collator: visual collation and the printed page
Spotting small differences between seemingly identical images is hard. It's even harder when you don't know if they are there.
This blogpost follows Richard Bellis’ post on how to use Juxta Commons to compare multiple texts. As Richard's blogpost revealed, tools such as Juxta are designed to perform textual collation using transcriptions of books or manuscripts. By contrast, this post focuses on visual collation using a newly developed, free to download digital tool, Traherne. This software can assist in the comparison of digital images of different copies of the original printed page.
The nature of the book
Elizabeth Eisenstein's monumental history The Printing Press as an Agent of Change enlarged upon William Ivins' famous characterisation of print as the means of creating an “exactly repeatable pictorial statement”. Scholars could, regardless of distance, gather round the same passages of a text and share references or concepts in pictorial form, trusting that – if all had access to the same edition – an imagined community could be, literally, on the same page.
Like most generalisations about the impact of print, this is only true up to a point. Bibliographers, art historians, textual editors, and printers themselves have always begged to differ. Differences within apparently repeated statements can readily be spotted between copies of one book that acted as a major stimulus to the practice of visual collation in the twentieth century: Shakespeare's First Folio Works (1623). A blogpost by Sarah Werner for the Folger Shakespeare Library gives a tiny example, in a stage direction within the play Titus Andronicus:
Variations such as these (an inverted ‘i’ in ‘Lucius’) are common in early letterpress printing, and can take the form of substantially altered texts. Even small errors are inevitable, as no printer would have owned enough type to print such a thing as a 'proof copy' of such a long text as the Shakespeare Folio. In a busy printing house, sections of books were proofed and corrected during printing, often in parallel with the composition of the next section. Once the printing of a section was complete, its type would be made available for other settings. Errors might only be spotted once multiple copies of a sheet had been printed and, due to the cost of paper, an error-strewn sheet might not be discarded even if the errors were glaring. Once the heap of printed sheets was cut, folded and bound in the right order, individual copies of the resulting book might include a variable proportion of sheets in various states of correction.
Steven Escar Smith (whose two articles have been invaluable sources for this blogpost) concludes that 'the printed matter in the last book sold could, and usually did, differ substantially from that of the first, as it also could and quite often did from nearly every other copy in the printing.' Editors of early-modern works who are mindful of such differences and who wish to present the best possible version of a printed text are therefore obliged to see as many surviving copies of the book as possible. They might then choose a version (or a composite of several versions) which best fits a hypothesised manuscript copy text; an author's "final" intentions; or any one of a number of editorial rationales. Bibliographers, by contrast, may study variation not so much in order to improve a text as to investigate how it came to be.
But whether carried out for editorial or bibliographical interest, collation is a laborious process. Twentieth-century scholars toured the world to view the same book over and over again. Shakespeare editors were uniquely privileged: thanks to the astonishing collecting efforts of Henry and Emily Folger, they were able to compare up to 82 complete copies of the First Folio directly, under one roof. It was while working at the Folger Library that Charlton Hinman, based at the University of Virginia and seeking to bring order to the text of Shakespeare, mechanised the art of optical collation.
Mechanical optical collation
Optical collation addresses the difficulty of spotting minute differences between pages. Visual comparison is not easy, even when two copies of a book are immediately adjacent.
What Randall McLeod has called the 'Wimbledon Method' – in which the collator's focus switches back and forth between two books while keeping in mind the exact impression of the other – is exhausting and unreliable. It is much easier if the collator instead keeps focus while the books move, or appear to. Hinman, anecdotally drawing on his knowledge of the analysis of aerial intelligence photographs during World War Two, conceived and built a large piece of machinery equipped with powerful strobe lights and an angled viewfinder in which the viewer could alternately see two copies of a book held open on adjustable cradles.
According to Escar Smith, over fifty examples of the imposing Hinman Collator were ordered between 1947 and 1979, mainly by large research libraries in the USA and UK, although one was purchased by the CIA and several were acquired by pharmaceutical companies for the purposes of proofing medicine labels. As the long post-war expansion of the research university ebbed, cheaper and more portable systems emerged. These typically work by means of stereoscopy, depending on the human capacity to fuse two streams of visual data – one from each eye – into a single image in the brain. Devices such as the McLeod Portable Collator implement a system of mirrors and blinds through which each eye is trained on a single book. The user's vision does the rest: as the brain is unable to fuse the two variant states of the book into a single image, the variants in the page are glimpsed alternately.
An interesting characteristic of the history of optical collators is their increasing simplicity. Perhaps simplest of all was the 'Barber Method', named after Giles Barber, librarian at the Taylor Institution in Oxford, which uses a standard photocopier and transparent film: the user simply copies a page from each book onto a transparency and lays one on top of the other. Such methods, which were preceded by William Neidig's work with composite photographs of Shakespeare editions in quarto, are complicated by the need to work with derivative photographs, whose imagery can vary significantly according to resolution, focus, camera angle and many other factors. The curvature of the original page may also differ if one book has been bound more tightly than the other.
It is these problems that inspired the Traherne Digital Collator, which takes its name from Thomas Traherne (c.1637-1674), English poet and writer, whose complete works are being published by Oxford University Press under the editorship of a team of Traherne scholars led by Dr. Julia Smith. Although few of Traherne's works were printed in or near his lifetime, the edition resolved to collate as many copies as possible of the early editions. After reviewing the available optical collation machines, the editors turned to Oxford's Visual Geometry Group, who developed the Traherne tool as free and open-source software, drawing in part on their earlier work on the Bodleian Ballads website's ImageMatch visual search software.
How to use Traherne
Following download and installation on a user's computer (PC, Mac or Linux), Traherne's user interface runs entirely in a web browser. To get started, the user first imports at least one pair of images - a Base and a Comp. Users can import several hundred pairs if they wish, corresponding to two copies of an entire book.
The user must then select a region of the Base image and press Compare.
At this point, users can select 'Curved book pages' under the 'Compare Type' menu, and Traherne will attempt to compensate for differences in curvature.
Traherne uses a computer vision algorithm to process the two images, extracting geometrically corresponding features (so-called 'visual words') that are invariant to differences of scale, rotation or skew.
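To give a feel for what registering two images from matched features involves, here is a minimal sketch in Python. It is not Traherne's own code, and the post does not describe the algorithm in this detail. The sketch assumes that pairs of corresponding points have already been found, and it fits only a similarity transform (scale, rotation and shift) by least squares, so it does not model skew or page curvature. The function names fit_similarity and apply_similarity are hypothetical.

```python
import cmath
import math


def fit_similarity(base_pts, comp_pts):
    """Least-squares similarity transform mapping base points onto comp points.

    Points are (x, y) tuples. Treating each point as a complex number z,
    the transform is w = a * z + b, where |a| is the scale, the phase of a
    is the rotation, and b is the shift.
    Returns (scale, rotation_in_radians, (shift_x, shift_y)).
    """
    if len(base_pts) != len(comp_pts) or len(base_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    z = [complex(x, y) for x, y in base_pts]
    w = [complex(x, y) for x, y in comp_pts]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    spread = sum(abs(zi - z_mean) ** 2 for zi in z)
    if spread == 0:
        raise ValueError("base points must not all coincide")
    # Closed-form least-squares solution for a complex linear fit.
    a = sum((wi - w_mean) * (zi - z_mean).conjugate()
            for zi, wi in zip(z, w)) / spread
    b = w_mean - a * z_mean
    return abs(a), cmath.phase(a), (b.real, b.imag)


def apply_similarity(scale, rotation, shift, point):
    """Map one (x, y) point from the Base image into the Comp image."""
    p = cmath.rect(scale, rotation) * complex(*point) + complex(*shift)
    return (p.real, p.imag)


# Hypothetical matched features: the four corners of a square in the Base
# image, and where the same features were found in the Comp image.
base = [(0, 0), (10, 0), (10, 10), (0, 10)]
comp = [(5, 3), (5, 23), (-15, 23), (-15, 3)]
scale, rotation, shift = fit_similarity(base, comp)
print(scale, math.degrees(rotation), shift)
```

In practice many of the matched features will be wrong, so a real tool would need a robust fitting step to discard mismatches, and a richer transform to cope with skew and curved pages.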
After Traherne has registered the two selected regions of the page, users can visualise the two together, using several modes.
Base (full image) and Base (cropped) simply display the first of each pair of images.
The Overlap mode displays both images blended together: it is particularly useful in combination with the Zoom tool.
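The idea behind a blended view can be sketched in a few lines of Python. This is an illustration only, not Traherne's implementation: it assumes the two images are already registered, the same size, and held as rows of greyscale values from 0 (black ink) to 255 (white paper). The function name blend and its alpha parameter are hypothetical.

```python
def blend(base, comp, alpha=0.5):
    """Mix two registered greyscale images pixel by pixel.

    alpha=0 shows only the Base image, alpha=1 only the Comp image,
    and alpha=0.5 gives each image equal weight.
    """
    if len(base) != len(comp) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(base, comp)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [round((1 - alpha) * b + alpha * c) for b, c in zip(b_row, c_row)]
        for b_row, c_row in zip(base, comp)
    ]


# A one-row example: ink that appears in only one copy comes out mid-grey,
# while ink or paper common to both copies is unchanged.
base = [[0, 200, 255]]
comp = [[0, 0, 255]]
print(blend(base, comp))
```

Where the two copies agree, the blend looks like either page; where they differ, the variant shows up as a fainter ghost, which is why zooming in helps.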
Traherne can flip between each image using the Toggle function.
In the Diff view, the Base image is in blue and the Comp in red (again, using Zoom).
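One simple way to produce a two-colour difference view of this kind is sketched below in Python. The channel assignment is an assumption chosen to match the colours described here, not a description of how Traherne computes its Diff view. It assumes two registered greyscale images of the same size, with 0 for ink and 255 for paper; the function name diff_view is hypothetical.

```python
def diff_view(base, comp):
    """Combine two registered greyscale images into one RGB image.

    Each output pixel is (red, green, blue). Ink found only in the Base
    image comes out blue, ink found only in the Comp image comes out red,
    ink present in both stays black, and blank paper stays white.
    """
    if len(base) != len(comp) or any(
        len(b_row) != len(c_row) for b_row, c_row in zip(base, comp)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [(b, min(b, c), c) for b, c in zip(b_row, c_row)]
        for b_row, c_row in zip(base, comp)
    ]


# Four pixels: ink in Base only, ink in Comp only, ink in both, ink in neither.
base = [[0, 255, 0, 255]]
comp = [[255, 0, 0, 255]]
print(diff_view(base, comp))
# → [[(0, 0, 255), (255, 0, 0), (0, 0, 0), (255, 255, 255)]]
```

Because shared ink stays black, the eye is drawn only to the coloured fringes where the two copies disagree.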
While software can help the editor to find variants in texts, the task of conjecturing how those variants came to be, and of deciding whether they might matter to a potential reader of a new edition, remains the editor's.
Although Traherne was designed to assist editors of letterpress texts, it can also do useful work with printed images.
In the case of the above engraved portrait of Shakespeare, also taken from the Bodleian and Boston Public Library First Folios, the engraver Martin Droeshout has added shading and detail to the portrait after the commencement of printing. Traherne can sometimes also register two printed images that are very close copies of each other, such as woodblocks, which are often copied. A web-based branch of Traherne, ImageComparator, has recently been designed to address the needs of those comparing a broader range of images than print alone.
Traherne is under continuous development, and welcomes bug reports and feature requests.
Eisenstein, Elizabeth L. The Printing Press as an Agent of Change. Cambridge University Press, 1980.
Ivins, William. Prints and Visual Communication. MIT Press, 1969.
Smith, Steven Escar. "'The Eternal Verities Verified': Charlton Hinman and the Roots of Mechanical Collation." Studies in Bibliography 53 (2000): 129-161.
Smith, Steven Escar. "'Armadillos of Invention': A Census of Mechanical Collators." Studies in Bibliography 55.1 (2002): 133-170.
Giles Bergel is a book historian and a digital humanist, based in the Department of Engineering Science at the University of Oxford. His interests include bibliography, typography and book design; the histories of copyright, the Stationers’ Company and the British book trades; text encoding and critical editing; and digital librarianship.