If you are fine with just converting the file format (without making the text searchable via OCR, as pointed out in the answer by Jonathan Ben-Avraham), this is a perfect task for convert from the ImageMagick package (available for various platforms: Windows, Linux, Mac). When you install it, be sure to select the option to include the "convert" legacy utility.

Simply use that command to convert your multi-page TIFF file: convert multi-page.tif book.pdf

However, you should employ the -compress parameter. If you have a 24-bit colour TIFF, you can use LZW (lossless) or JPEG (lossy), e.g.: convert -compress LZW multi-page.tif book.pdf

If your TIFF is black and white (please make sure it is really saved with only 1-bit colour depth), I usually use -compress Fax.

Here is an example for a 5-page sample file with some text in it; the actual ratio between the compression algorithms depends of course on the content of your file: 24bit JPEG: 1294kB

After that conversion you can still run an OCR tool (like Adobe Acrobat Pro) to make the text searchable and copyable.

TIFF Junction includes a sophisticated OCR engine enabling conversion of TIFF images to searchable PDFs. The product also provides the ability to split and merge TIFF files (from single page to multi-page and vice versa) and to create image-only PDFs from TIFFs.

TIFF (Tagged Image File Format) is a pixel-oriented format that is intended for images. In order to get just the character data from a TIFF file you would have to use an OCR (Optical Character Recognition) program. The format of the pages (headers, footers, section headings, etc.) affects the ability of OCR to accurately guess the characters. The language, and particularly the writing system and font, also affect OCR accuracy. If the book has an unusual non-western font with ligatures, then the chances of getting good OCR output are close to zero. If the book is not primarily text, for example a book with many screenshots or other images that are important, then OCR will not help you. There are some open source OCR programs available, depending on the OS you are using.

Assuming that the original book you scanned is in paper format, TIFF is about the best you can do in any event. You can make the document size more manageable by compressing the TIFF file. Try JPEG compression at various levels to balance document size against readability. YMMV.
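The compression choices mentioned here (LZW or JPEG for 24-bit colour, Fax for 1-bit black and white) can be wrapped in a small shell helper. This is only a rough sketch: pick_compress and tiff_to_pdf_cmd are made-up names, the bit depth is something you supply yourself, and the script only prints the convert command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Sketch: choose a -compress value from the TIFF's bit depth.
# pick_compress and tiff_to_pdf_cmd are hypothetical helper names,
# not part of ImageMagick.

# Usage: pick_compress DEPTH [lossy|lossless]
pick_compress() {
    depth=$1
    mode=${2:-lossless}
    if [ "$depth" -eq 1 ]; then
        echo "Fax"      # 1-bit black and white
    elif [ "$mode" = "lossy" ]; then
        echo "JPEG"     # lossy, for colour scans
    else
        echo "LZW"      # lossless, for colour scans
    fi
}

# Usage: tiff_to_pdf_cmd INPUT OUTPUT DEPTH [lossy|lossless]
# Prints the convert command rather than executing it.
tiff_to_pdf_cmd() {
    echo "convert -compress $(pick_compress "$3" "$4") $1 $2"
}

tiff_to_pdf_cmd multi-page.tif book.pdf 24 lossless
# prints: convert -compress LZW multi-page.tif book.pdf
```

Copy the printed line into your terminal to do the actual conversion. File names containing spaces would need quoting, which this sketch does not handle.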