I’ll be at CopyNight this evening to demo the Book Liberator and talk about upcoming plans. Please join us at Vig Bar on Spring and Elizabeth at 7pm for what promises to be a lively discussion!
Book Liberator had a booth at the first New York Maker Faire! It was a successful weekend. We showed the prototype off to hundreds of folks, and people were excited! It’s gratifying to see eyes light up when people realize how simple book scanning can be.
And it wasn’t just the public that liked the Book Liberator. We got two (two!) Editor’s Choice awards. Thanks to Mark Frauenfelder and Gareth Branwyn for the shout-outs.
Also, the Book Liberator wouldn’t be able to manage without a small traveling circus of support. Thanks to Laura, Christine, and Jen for volunteering to staff the table. We were beset by illness and logistical horrors, but everything got done and it was a great day!
We talk a lot about hardware here at BookLiberator (it is what we spend most of our time on, after all), but it is time to shine a light on the software behind the scenes that turns our page images into beautifully produced “book” collections. That software comes in two parts: ScanTailor, written by Joseph Artsimovich, and djvubind, written by strider1551 of DIYBookScanner.
ScanTailor takes the page images from your camera’s memory card and turns them into straightened, cropped, cleaned-up page scans:
Djvubind takes all of those individual images, stitches them together, and compresses them into a very tiny book in the DjVu format. I have 1,400-page academic books that are now pleasantly readable 10 MB files thanks to this combination of ScanTailor and djvubind.
All of this happens automatically. For each of those 1,400-page books, all I had to do was 1) rotate the first two pages, 2) hit “Go” for auto-crop, 3) draw a box around the few pictures so that their full resolution would be preserved in the final output, and 4) run djvubind.
Very simple, very easy. When djvubind, which is less than two weeks old, works out its last kinks, the same four steps will yield a tiny book full of beautiful page images that also has a layer of OCR embedded for text searching.
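To make the workflow concrete, here is roughly what it looks like at the command line (a sketch from memory; djvubind’s exact invocation and options may differ by version):

```shell
# Steps 1-3 happen in the ScanTailor GUI: rotate the first spread,
# run the automatic crop, and mark any picture zones by hand.
# ScanTailor writes its cleaned-up page TIFFs to an out/ directory.

# Step 4: bind every page image in out/ into one compressed,
# searchable DjVu file.
djvubind out/
```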
For anyone who has been waiting to get into personal book scanning until the software develops, wait no more.
Crossposted with churchkey.org
After many months of design and iterative prototyping and at the cost of a small amount of spilled blood, we are happy to announce that we have a final design for the Book Liberator. Take a look:
This overall design is not much different from our early builds, but it includes many small improvements that make the device operate more smoothly and reliably. You’ll also notice that the quality of the carpentry has improved.
We’re now in the thick of scaling up a manufacturing process. In other words, you will actually be able to buy these soon. Our current target is to have them up for sale this fall! We haven’t nailed down pricing yet, but we hope to hit $350, cameras included.
Ian and I demoed our design-complete prototype for Forbes, and they did a good writeup on the device. This will help get the word out. Tell your friends, warn your enemies: Book Liberator is coming, and it will scan your books!
Book Liberator cadged some table space at HOPE from our sponsor, Question Copyright. We met lots and lots of awesome hackers, and discovered they all love the Book Liberator. We started a lot of good and useful conversations this weekend about everything from manufacturing to remote shutter trigger to lighting options. We’ll be in touch with many of you to continue those discussions in the coming weeks.
Karl and Ian and I are excited about the interest shown in the project and enjoyed the chance to show off our design-complete prototype.
Thanks are owed to Barry, Clyde, and Gordon for tabling with us, and to Nina Paley for some fast and beautiful work toward a Book Liberator logo!
If you spend some time in the ebook community you inevitably run into Distributed Proofreaders, the collaborative proofreading group that supplies Project Gutenberg with high-quality text versions of public domain books. They are a small community of dedicated editors doing good work. Unfortunately, they are also becoming irrelevant to most of the issues in the field because their multi-layer workflow is simply too slow. When organizations like Google are releasing a million books at once, it is hard to stay relevant while struggling to complete your project’s 20,000th book, even if those books, unlike Google’s, are meticulously verified and formatted. Scale and quality both matter, and if we structure it right, we can rework our communal digitization projects to get both.
Currently, Distributed Proofreaders only releases books after spending weeks or months verifying that the text version matches the original page images. The industrial scanning efforts like Google Books and the Million Books Project generally skip verification entirely and distribute raw text versions with the photographic page images. This is perhaps the greatest key to their large size. Yes, they also paid for large-scale scanning, but scanning is easy compared to proofreading, and getting easier all the time. You can be sure that Google’s library would not be half so large if they had to pay for the kind of quality that Distributed Proofreaders provides. Unfortunately, if the price of this quality is having only thousands rather than millions of books, it is too high to continue paying.
I propose a middle road between the raw image release and the meticulous text one. What if we distributed raw image and unverified text files from day one, but built our distribution network so that everyone downloading a copy can upload corrections and share those corrections automatically with everyone else who has a copy? If we did that, we could gain speed and scale while also building our community of contributors.
Technologically, BitTorrent and a rich client like Miro would get us most of the way there. We would make each book into a Miro channel that people subscribe to when downloading the book. Once a book is downloaded, we would need a book-reading view that we could optimize for whatever common reader actions relate to proofreading. Things like spell check, and revealing the text around a section to verify academic citations, spring immediately to mind. The key is that corrections should come primarily from people’s normal interactions with the books they are interested in; no altruism or active volunteering necessary. Once people have corrected their local copies, the client sends those corrections back to the central server, where they can be sent out via RSS to everyone subscribed to that book’s channel.
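The merge step is the only subtle part: corrections arrive over the feed in whatever order, and a client has to apply them to its local copy without clobbering text that has already been fixed. Here is a toy sketch of that idea; the function name and the (page, old, new) record format are illustrative assumptions, not part of any real client:

```python
# Toy model of feed-based correction merging: each subscriber holds a
# local copy of the book's text, and corrections arrive as small
# (page, old, new) records over the book's channel.

def apply_corrections(pages, corrections):
    """Apply a stream of corrections to a local copy of the book.

    pages:        dict mapping page number -> page text
    corrections:  iterable of (page, old, new) tuples, in feed order

    A correction is applied only if `old` still appears on that page,
    so duplicate or stale fixes are skipped rather than reapplied.
    """
    merged = dict(pages)
    for page, old, new in corrections:
        if page in merged and old in merged[page]:
            merged[page] = merged[page].replace(old, new, 1)
    return merged

# Example: two readers fix different OCR errors; one fix arrives twice.
local = {1: "Tlie quick brown fox", 2: "jumps ovcr the lazy dog"}
feed = [(1, "Tlie", "The"), (2, "ovcr", "over"), (1, "Tlie", "The")]
print(apply_corrections(local, feed))
# {1: 'The quick brown fox', 2: 'jumps over the lazy dog'}
```

A real client would carry more metadata per correction (who, when, which scan revision), but the skip-if-already-fixed rule is what lets everyone apply the same feed independently and converge on the same text.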
As far as the user is concerned, she simply downloads the books she is interested in with her Miro-based library manager and either fixes errors as they bother her, or leaves them alone and watches the text gradually correct itself as other people interested in the same books notice and correct errors. If the errors are really frustrating, she can always fall back to reading the page images and be no worse off than if reading on Google Books or any other large page image-based digital library.
As far as the community is concerned, we get a larger pool of potential contributors because now everyone with a copy can contribute back, and people are able to contribute by sharing spare hard drive space and unused bandwidth rather than having to donate funds to pay for central hosting and distribution. There are plenty of people in the community who have no time or inclination to proofread but would gladly download some book images and leave a torrent running in the background to help share the files more widely.
Making it easier to contribute increases the effectiveness of the project as a whole by helping make sure that all the people who care about a book have the opportunity to put their time into preserving that book. The more people care, the more work gets done. In two years of talking with people about my own book digitization projects, I have grown to have a healthy respect for how much people care about their own books and about preserving them, in whatever form.
In the end, there are only two scalable digitization strategies: teach computers to read, or harness the passion people have for their books for the benefit of us all. A handful of highly organized editors like the Distributed Proofreaders community will always have their place, but they cannot handle the scale of this project alone. We should make sure they have some help.
(Crossposted with churchkey.org)
The Book Liberator got a writeup in Good magazine! I sent in hundreds of rambling words about the project, and Theo distilled them into a few pithy quotes. Thanks, Theo, for making me seem clever!
Last week, Ian and Winnie got all heroic with some tools, wood and plexi. The result is a couple sweet prototypes, which we’ll be sending to the Decapod folks so they can hack software to process BookLib images.
In other news, I put a prototype design of the camera mount on thingiverse. Ian’s original washer-and-bolt design was a little janky, and when we get the parameters right on the mount, we should be able to print them quite cheaply.
We’re moving quite quickly towards a shippable kit. The cradle design is stable. We have dimensions for the plexi and the cube. We’re down to exploring two basic design paths (bent plexi vs. two flat sheets). Everything else about the prototype is in the detail stage.
There are photos of the wood hackery around, and I’ll try to post some soon.
One of the biggest problems for people, like Project Gutenberg, who want to digitize and share our culture’s public domain works is tracking down a work’s status and confirming that it is no longer under copyright. Gutenberg is not alone: towards the end of last month I ran into an opinion piece on TeleRead arguing that Amazon is right to keep away from public domain books for this same reason.
In the United States we have a resource with authoritative records about which works are covered by copyright and which ones are in the public domain: the Library of Congress. Not only does the Library of Congress have authoritative records but, as the largest library in the world, it also has physical copies of more works than any other institution. Unfortunately, the Library of Congress has no plans to digitize its collection. For those of us involved with book digitization, this is something of a sore topic.
So it was a great moment for me this morning to read that the Japanese National Diet Library, a close equivalent of our Library of Congress, is digitizing all of its out-of-copyright works. Not only is the Diet digitizing and distributing those out-of-copyright works, it is also beginning a process of digitizing the portions of its collection still under copyright, in order to preserve those works more easily against physical destruction.
Of course, if preservation is our goal, the true solution is obvious and has been known in this country since its founding:
[T]he lost cannot be recovered; but let us save what remains: not by vaults and locks which fence them from the public eye and use, in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident.
(Boyd, ‘These Precious Monuments of…Our History,’ pp.175-6)
Whatever the reason, it is great to see leading institutions take steps to share the public domain with the public.