Monday, December 6, 2010

Introduction

I'm not much one for books. In fact, I will be the first to admit that I find a visit to the dentist's office more enjoyable than sitting around reading a book. However, as a guy with a degree in information systems, I can't help but think about how to preserve the information contained in books that are on the verge of falling to pieces. After all, dead tree format is one of the oldest information systems known to man, after stone etching and cave drawings.

Books are worth preserving not only as an information system but also as a piece of our history. Each one is a snapshot of the lives and language of the people who originally wrote and read it. When old books fall into disrepair and decay, we lose a bit of ourselves and of where we have come from.

A few years ago, I took a look under my front porch. There I found a treasure trove of artifacts from a bygone era. Most of the stuff was tossed in there like garbage. In fact, to the people who threw it there, it probably was. In the mess of stuff, I found a cache of school books belonging to a third grader named Robert Naugle, who lived in my house back in the 1940s according to the inscription in the front of a math textbook. Robert would be in his late 60s or early 70s as of this writing.
The pile of books

Some of these books were in terrible shape and the information in them needed to be preserved for future generations to look back on. Now that you know the why, here is the how.

The Method

The methods I will be using throughout this piece are destructive to the book and are only meant to extract text and images. The book I will be using is already beyond repair. Only consider this method when a book has been eaten by bookworms or destroyed by mold, when its pages are extremely brittle, or when it is damaged to the point where you don't want to pay to have it re-bound but still want to save the content.

There are less destructive methods of preservation out there. For example, wander on over to Instructables and see how to build your own book scanner. I have one of these for times when I want to clean out my closet and donate the books to charity or, in the case of college textbooks, sell them to other students.

Things to consider while doing this

If you are allergic to mold of any kind, consider having someone else do this for you. Mold can be a very dangerous hazard to your health whether you are allergic to it or not. Take every precaution that you feel is necessary to safeguard your health including, but not limited to: rubber gloves and/or some type of breathing filter.

Another thing to consider is hygiene. Be sure to wash your hands frequently. You never know what's living on old books. These books sat under my front porch for over half a century, and I'm not going to risk getting ill over some nuggets of information.

Beyond staying clean and healthy, being organized also helps. If you think you need a break, take one. You need to be focused and patient in order to have a desirable outcome.

Breaking the Binding

Be 100% sure that the book you want to do this to is beyond repair before you start, because there is no turning back after this point. Breaking the binding itself is very easy.

Grab a razor knife, open the front cover and guide said razor carefully between the front cover and the first page. You should see the cover come off rather easily. (See image) If it doesn't, don't force it. Find what is obstructing the separation and sever it surgically with the razor blade.

Repeat the same steps on the back cover.

Now you should notice that the book looks like a bunch of pamphlets sewn together with thread (See image). At this point you will want to separate the pamphlets from each other by cutting the thread that is between them.

Once the section is separated, remove the thread. It should pull right out. If not, don't force it. Open the section to the middle and you'll see the thread. It should come out more easily from there.

Be sure to stack the sections of the book in order. Otherwise you could end up with a mess.

The next part is very risky in that you will be separating the individual pages. If you have access to a precision paper cutter, use it. If not, use either really long scissors or a razor blade. Open the book sections to the halfway point and cut very carefully along the midpoint. You will now have some loose leaf pages.

Tearing may be quicker in some cases, but it is very risky. If you have to, do it slowly and deliberately. Use a straight edge or some type of fence to keep things straight.

Once again, be sure to stack the pages in order unless you want a mess. Marvel at how thick your book looks now that the compression from the binding is released! (See image)


Digitizing the Pages

In my case, I have access to a very nice document scanner with an automatic feeder. For most pages of my book, it will be just fine to feed them through and capture the text and images in duplex and in batch.

The badly damaged pages require gentler methods, like a flatbed scanner or, in the case of extremely delicate pages, a high resolution camera.

Your individual setup and needs will vary. The important thing is to scan an image of each page at as high a resolution as possible.

Processing the Pages

Congratulations if you have made it this far. The easy and quick parts are done. Now on to the time-consuming and tedious parts.

Your scanned images will be a little skewed, just crooked enough to notice. There is also a bunch of noise in the background, which could make for a PDF the size of your mom (couldn't resist).

To solve those issues, I use Scan Tailor. This application is available for both Windows and Linux and works just the same in both.

I normally load my images into this application and skip directly to the content selection step. Why? The program does the first three steps in the background anyway, and this will take long enough as it is.

Don't trust this application to do a good job 100% of the time when identifying content. You will have to lay eyes on every single page, even if you just skim over them in the preview pane. When you find a page where it screwed up in identifying the content, manually drag the selection box borders to where the edge of the content actually is. If you have an especially destroyed book, you will also have many, many pages where content isn't correctly identified. Mold counts as content, apparently.

After that has finished, skip ahead to the Output option. This will need to be run twice if there are images. For the first round, I usually just let it run set to black and white with 600x600 dpi. For pages with images, I go back and manually tweak the settings depending on whether or not there is color. For the book I'm using for this demonstration (greyscale images), the "mixed" setting is appropriate. Scan Tailor will work the magic from there. Neat little app. Done with Scan Tailor.
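For a rough feel of why the black and white setting matters, here is some back-of-the-envelope arithmetic as a shell sketch. The 6 by 9 inch page size is an assumption for illustration, not a measurement of my book, and the function names are made up. It only counts uncompressed pixels, so real TIFF and PDF files will come out smaller after compression.

```shell
#!/bin/sh
# Pixel dimensions of a scanned page: inches times dots per inch.
# Sizes are given in tenths of an inch so plain integer math works.
page_pixels() {
    width_tenths=$1    # e.g. 60 for a 6.0 inch wide page (assumed size)
    height_tenths=$2   # e.g. 90 for a 9.0 inch tall page (assumed size)
    dpi=$3
    echo "$(( width_tenths * dpi / 10 ))x$(( height_tenths * dpi / 10 ))"
}

# Uncompressed size in bytes: pixels times bits per pixel, divided by 8.
raw_bytes() {
    width_px=$1
    height_px=$2
    bits_per_pixel=$3  # 1 = black and white, 8 = greyscale, 24 = color
    echo $(( width_px * height_px * bits_per_pixel / 8 ))
}
```

With those assumed numbers, "page_pixels 60 90 600" prints 3600x5400. "raw_bytes 3600 5400 1" prints 2430000 and "raw_bytes 3600 5400 8" prints 19440000, so a greyscale page is eight times the raw size of a black and white one before any compression. That is why I only switch to "mixed" on the pages that actually have images.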

PDF Creation

From here, there are two tracks one can take. If you have money and have Adobe PDF Writer, it is trivial. Go to the directory with all the pictures and you will see an "out" folder. Import all the TIFFs into a new PDF document, optimize and you're done.

But since I'm poor and Adobe tends to not care when your operating system borks itself and won't let you reactivate the software you paid for... I'll save that rant for another time... There is an open source way that is a little bit of a pain, but it's free.

This method is Linux only. Make sure you have ImageMagick (which provides the convert command), pdfopt (which ships with Ghostscript) and tiff2pdf (part of the libtiff tools) installed first.
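If you aren't sure what is installed, a quick check along these lines can save some head scratching. This is only a sketch: missing_tools is a name I made up, and it relies on nothing beyond the shell's built-in "command -v".

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# Prints nothing when everything is found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Run it as "missing_tools convert tiff2pdf pdfopt". Anything it prints still needs to be installed through your distribution's package manager.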

In the terminal, navigate to the "out" directory that holds the processed TIFFs, then run these two commands:

    convert *.tif one_huge_tiff.tif
    tiff2pdf -z -t "Title of Book" -o temp.pdf one_huge_tiff.tif
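One thing to watch: the shell expands *.tif in sorted text order, not numeric order, so a file named page10.tif lands before page2.tif and your pages come out shuffled. Here is a small sketch for checking the order before you commit. The file names are hypothetical examples, and tif_order and pad4 are helper names I made up.

```shell
#!/bin/sh
# Print the .tif files in a directory in the order the shell expands *.tif,
# which is the order convert would receive them.
tif_order() {
    dir=$1
    (
        cd "$dir" || exit 1
        for f in *.tif; do
            # Skip the literal pattern when no .tif files exist.
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    )
}

# Zero-pad a page number to four digits, e.g. 7 becomes 0007.
# Pass the number without leading zeros, or printf may read it as octal.
pad4() {
    printf '%04d\n' "$1"
}
```

If "tif_order ." lists your pages out of order, renaming the files with zero-padded numbers (page0002.tif, page0010.tif and so on) makes the text order match the page order.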

You have a working PDF at this point, but it still needs optimization. Run this to fix that:

    pdfopt temp.pdf "Book Title.pdf"
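If you plan on doing more than one book, the three commands above can be wrapped in a small function. Treat this as a sketch rather than a polished tool: make_book_pdf is a name I made up, it assumes all three programs are installed, and cleaning up the intermediate files at the end is my own addition.

```shell
#!/bin/sh
# Build "<title>.pdf" from every .tif in the current directory.
# Runs the same three commands as above and stops at the first failure.
make_book_pdf() {
    title=$1
    convert *.tif one_huge_tiff.tif &&
    tiff2pdf -z -t "$title" -o temp.pdf one_huge_tiff.tif &&
    pdfopt temp.pdf "$title.pdf" &&
    # Intermediate files are only removed when every step succeeded.
    rm -f one_huge_tiff.tif temp.pdf
}
```

Call it from inside the "out" directory, for example: make_book_pdf "Book Title". If a step fails, delete any leftover one_huge_tiff.tif before trying again, or the *.tif pattern will pull it in as an extra page.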

Congratulations. You have now saved a decaying piece of our collective history from ever rotting again!

Finished Product
 
A page in decent shape. (Text didn't scale well)

Badly damaged page with images
Sample pages from the book in PDF