PDFConv - A PDF converter for the Apple Newton

This Java program is intended to convert a PDF file (for which currently no viewer exists on the Apple Newton) into a series of images (currently: JPEG, PICT or BMP), together with an HTML file or a NewtonBook source file containing these images. The result can then be viewed with NewtScape or as a NewtonBook.
I have already experimented with text extraction, but it is unreliable. The reason is simple:
A PDF file does not store running text. Instead, it gives the position (and direction) of each letter. To extract text, one would need to take all letters on a single page, sort them into lines, and work out which letters form a word (and where the spaces are). Because of differing letter sizes and typesetting possibilities, a text extraction program cannot always guess the text correctly (especially if the text is set in multiple columns).
The library used for creating the images (JPedal) also supports text extraction. It sorts the letters well, but it does not determine the spaces reliably (in some test cases, none of them are detected). It also had problems with umlauts, e.g. extracting them as 'a"'.
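
The sorting and space-guessing steps described above can be sketched roughly as follows. This is only an illustration of the general idea, not the code used by JPedal or PDFConv: the Glyph type, the extract method, and the two tuning parameters (lineTolerance, gapFactor) are all hypothetical names made up for this example, and it assumes a single column with y growing upward, as in default PDF coordinates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: none of these names come from JPedal or PDFConv.
class GlyphTextSketch {

    /** One positioned letter, as a PDF renderer might report it (assumed shape). */
    static final class Glyph {
        final char ch;
        final double x;      // left edge of the letter
        final double y;      // baseline position (larger y = higher on the page)
        final double width;  // horizontal advance of the letter

        Glyph(char ch, double x, double y, double width) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /**
     * Groups letters into lines and guesses spaces from horizontal gaps.
     * lineTolerance: maximum baseline difference within one line.
     * gapFactor: a gap wider than gapFactor * previous letter width counts as a space.
     */
    static String extract(List<Glyph> glyphs, double lineTolerance, double gapFactor) {
        // Sort from the top of the page to the bottom.
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort((a, b) -> Double.compare(b.y, a.y));

        // Collect letters whose baselines are close enough into the same line.
        List<List<Glyph>> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : sorted) {
            if (!current.isEmpty() && Math.abs(current.get(0).y - g.y) > lineTolerance) {
                lines.add(current);
                current = new ArrayList<>();
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }

        // Within each line, go left to right and insert a space at large gaps.
        StringBuilder out = new StringBuilder();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            Glyph prev = null;
            for (Glyph g : line) {
                if (prev != null) {
                    double gap = g.x - (prev.x + prev.width);
                    if (gap > gapFactor * prev.width) {
                        out.append(' ');
                    }
                }
                out.append(g.ch);
                prev = g;
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph('o', 28, 100, 8));
        glyphs.add(new Glyph('H', 0, 100, 10));
        glyphs.add(new Glyph('y', 20, 100, 8));
        glyphs.add(new Glyph('i', 10, 100, 4));
        System.out.print(extract(glyphs, 2.0, 0.3));
    }
}
```

Even in this simple form, the result depends entirely on the two thresholds. Differing letter sizes, kerning, and multiple columns all break the fixed-threshold assumption, which is why the guessed text is often wrong.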

Used Libraries

Requirements

I have tested with JDK 1.4.1 and 1.3.1 under Windows NT. It should also work on Macintosh platforms, but I don't know whether the Sun libraries for writing JPEG are included on that platform. The GUI requires JDK 1.4 or later (but the command line version also runs under JDK 1.3.1).
All required libraries are included in the download.
If you want to create Newton Books, you will need either the NTK (which causes some trouble on the Mac) or BookMaker (which is the preferred way). After running PDFConv, you then need to run one of these programs to generate the book.

Documentation

The full documentation is included in the downloads (see below). You can also download it as a PDF.

License

GPL
Some of the libraries used are distributed under different licenses. To download their source, see the links above.

Installers

There are installers available for these platforms (created with Install Anywhere Now! 5.5):
Windows
Macintosh
Generic Java

A Mac direct executable can be found here for GUI and command line.

Links

Freshmeat project page
Paperbook homepage

Change History

version 0.5.0 ( bin / src )

  • added a mac installer
  • added GUI
    • allows everything possible via the command line
    • documentation is missing at the moment, but the GUI should be easy to understand
    • uses 'Plastic3D' theme from 'JGoodies Looks' as default. This can be changed via command line options: '-win', '-metal', '-mac', '-plastic', '-plastic', '-plasticxp', '-none'
  • bugfix: generated text files are now closed immediately after writing
  • changed logging to use a log handler (to allow message display in the GUI)

version 0.2.2 ( bin / src )

  • added a mac binary distribution
  • added option '-writeBMP2' to write B&W BMP images, instead of gray ones
  • added option '-pages' to specify which pages should be converted
  • added switch '-pdfscale' to specify the scaling of the PDF when rendering the internal image
  • config file 'pdfconv.cfg' is now read at startup and used
  • when verbose output is written, the time for each step is printed
  • made a build of jpedal 1.89 for JDK 1.3 (this should make the rendering faster)
  • added the missing "jai_*" libraries into the archive files

version 0.2.1 ( bin / src )

  • writing BMPs did not work from pdfconv.exe, as jiu.jar was missing from pdfconv.lax
  • BMPs are no longer written as B&W, but are converted to 16 colors (4 bits) and saved as 8-bit files. CAUTION: it is not known whether the books generated by NTK are correct.
  • the ISBN is truncated to 16 characters (and a new template variable has been introduced)
  • Java Advanced Imaging V1.1.1.01 is now part of the package (required by jpedal)
  • in the jpedal.jar, all JAI classes are removed
  • build.xml is now part of the source and full package

version 0.2.0 ( bin / src )

  • documentation now also as PDF
  • should now exit cleanly under OS X (added System.exit())
  • jcmdline has been patched to work under JDK 1.3.x
  • added LaunchAnywhere for Win - you can just use pdfconv.exe
  • newlines (and spaces) are now stripped from command line parameters
  • changed command line handling (again). See userguide.
    • basename is now optional (and an option). If not given, it is calculated from the PDF file name
    • multiple PDF files can be given on the command line. This should enable drag-to-the-startscript
    • if no basename is given, each PDF file gets its own output files
    • if a basename is given, all PDF files are combined into a single target
    • it can now be specified which files are generated
  • numbers in output file names are now padded with zeroes (3 digits)
  • if no images are created, the PDF pages are not processed
  • now using a template engine for writing output files (htmltemplate 0.92, compatible with the Perl module HTML::Template)
  • BMP images are now saved in B&W, otherwise the NTK creates books with black pages :(
  • added the 2 command line switches "-verbose" and "-debug"
  • "writeBook" now writes a BookMaker source file
  • added command line switch "writeNTK" to write NTK book source file
  • internal refactoring
    • added Converter class for conversion. This class can be called from a GUI. PDFConv itself now only does command line parsing
    • Converter can use multiple document and image backends. So it is now possible to create HTML and a book simultaneously
    • introduced packages for cleaner structure
    • introduced config objects for holding and passing parameters
    • image scaling is now its own class
  • now using jPedal 1.88
  • now using HTMLTemplate 0.92 as template engine
  • now using Java Imaging Util 0.10.0 for image conversion and BMP saving
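
The basename and zero-padding entries in the list above can be illustrated with a small sketch. The exact naming scheme PDFConv uses is not documented here, so the class, the method names, and the assumed pattern (basename, then a 3-digit page number, then the extension) are illustrative assumptions only.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical sketch: the naming pattern is assumed, not taken from PDFConv.
class OutputNameSketch {

    /** Derives a basename from the PDF file name by dropping directory and extension. */
    static String baseNameFor(File pdf) {
        String name = pdf.getName();
        int dot = name.lastIndexOf('.');
        // Keep names such as ".hidden" or "noextension" unchanged.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /** Builds an image file name with the page number padded to 3 digits. */
    static String pageFileName(String baseName, int page, String extension) {
        return String.format(Locale.ROOT, "%s%03d.%s", baseName, page, extension);
    }

    public static void main(String[] args) {
        String base = baseNameFor(new File("docs/manual.pdf"));
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageFileName(base, page, "jpg"));
        }
    }
}
```

Padding to a fixed width keeps the generated files in page order when they are sorted by name, which is the usual reason for this kind of change.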

version 0.1.4 ( bin / src )

  • added proper command line parsing
    • no more " are required around the parameters
    • proper help when no or wrong parameters are given
    • POSIX compatible command line handling
    • the command line format has been changed a little bit (see readme)
  • as the JIS package is now distributed under the LGPL, the new library is now included. The sources can be found on Faidon's page.
  • updated documentation
  • renamed "readme_also" to "userguide", and reformatted it

version 0.1.3 ( bin / src )

  • this release has been provided by Eric Schneck (eschneck AT mindspring DOT com). Thanks!
  • added -pict to help text on command line
  • added BookMaker output when writing PICT files
  • provided additional readme
  • generated HTML now contains proper TITLE tag
  • refactoring

version 0.1.2 ( bin / src )

  • PICT generation has been changed from JIMI to JIS, which works. Additionally, the download size has been reduced.

version 0.1.1 ( bin / src )

  • added PICT generation (with the Jimi library). This does not work yet (the generated PICT file seems to be invalid)

version 0.1.0 ( bin / src )

  • added command line switches
    • set JPEG quality
    • scale images
    • target directory
    • base name for output files
  • writes HTML file containing all files
  • internal refactoring

version 0.0.1 ( bin / src )

  • initial version