tools v1.0

(With some additions) Lots of authors, brought together by Apprentice Alf.
2009-02-13 20:59:59 +00:00
parent 71d66953d3
commit 93c2ccd2c2
26 changed files with 1923 additions and 129 deletions
--- a/Topaz_Tools/lib/topaz-readme.txt
+++ b/Topaz_Tools/lib/topaz-readme.txt
@@ -0,0 +1,129 @@
+Contributors:
+     cmbtc - removal of drm which made all of this possible
+     clarknova - for all of the svg and glyph generation and many other bug fixes and improvements
+     skindle - for figuring out the general case for the mode loops
+     some updates -  for conversion to xml, basic html
+     DiapDealer - for extensive testing and feedback, and standalone linux/macosx version of cmbtc_dump
+     stewball - for extensive testing and feedback
+
+and many others for posting, feedback and testing
+  
+
+This is experimental and it will probably not work for you but...
+
+ALSO:  Please do not use any of this to steal.  Theft is wrong. 
+       This is meant to allow conversion of Topaz books for other book readers you own
+
+Here are the steps:
+
+1. Unzip the topazscripts.zip file to get the full set of python scripts.
+The files you should have after unzipping are:
+
+cmbtc_dump.py - (author: cmbtc) decrypts and dumps sections into separate files for Kindle for PC
+cmbtc_dump_nonK4PC.py - (author: DiapDealer) for use with standalone Kindle and ipod/iphone topaz books
+decode_meta.py - converts metadata0000.dat to make it available
+convert2xml.py - converts page*.dat, other*.dat, and glyphs*.dat files to pseudo xml descriptions
+flatxml2html.py - converts a "flattened" xml description to html using the ocrtext
+stylexml2css.py - converts stylesheet "flattened" xml into css (as best it can)
+getpagedim.py - reads page0000.dat to get the book height and width parameters
+genxml.py - main program to convert everything to xml
+genhtml.py - main program to generate "book.html"
+gensvg.py - (author: clarknova) main program to create an xhtml page with embedded svg graphics
+
+
+Please note, these scripts all import code from each other so please
+keep all of these python scripts together in the same place.
+
+
+
+2. Remove the DRM from the Topaz book and build a directory 
+of its contents as files
+
+All thanks go to CMBTC, who broke the DRM for Topaz - without it nothing else 
+would be possible.
+
+If you purchased the book for Kindle For PC, you must do the following:
+
+   cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURTOPAZBOOKNAMEHERE
+
+
+However, if you purchased the book for a standalone Kindle or ipod/iphone 
+and you know your pid (at least the first 8 characters), then you should 
+instead do the following:
+
+   cmbtc_dump_nonK4PC.py -d -o TARGETDIR -p 12345678 YOURTOPAZBOOKNAMEHERE
+
+where 12345678 should be replaced by the first 8 characters of your PID
+
+
+This should create a directory called "TARGETDIR" in your current directory.  
+It should have the following files in it:
+
+metadata0000.dat - metadata info
+other0000.dat - information used to create a style sheet
+dict0000.dat - dictionary of words used to build page descriptions
+page - directory filled with page*.dat files
+glyphs - directory filled with glyphs*.dat files
+
+
+3. REQUIRED: Create xhtml page descriptions with embedded svg
+that show the exact representation of each page as an image
+with proper glyphs and positioning.
+
+This step must NOW be done BEFORE attempting conversion to html
+
+   gensvg.py TARGETDIR
+
+When complete, use a web-browser to open the page*.xhtml files
+in TARGETDIR/svg/ to see what the book really looks like.
+
+If you would prefer pure svg pages, then use the -r option
+as follows:
+
+   gensvg.py -r TARGETDIR
+
+
+All thanks go to CLARKNOVA for this program.  This program is 
+needed to actually see the true image of each page and so that
+the next step can properly create images from glyphs for 
+monograms, dropcaps and tables.
+
+
+4. Create "book.html" which can be found in "TARGETDIR" after 
+completion.  
+
+   genhtml.py TARGETDIR
+
+
+***IMPORTANT NOTE***  This html conversion cannot fully capture 
+all of the layouts and styles actually used in the book,
+and the resulting html will need to be edited by hand to 
+properly set bold and/or italics, handle font size changes,
+and to fix the sometimes horrific mistakes in the ocrText
+used to create the html.  
+
+If there are critical pages that need a fixed layout in your book,
+you might want to consider forcing these fixed regions to
+become svg images by using this command instead:
+
+    genhtml.py --fixed-image TARGETDIR
+
+This will convert all fixed regions into svg images at the 
+expense of increased book size, slower loading speed, and 
+the loss of the ability to search for words in those regions.
+
+FYI: Sigil is a wonderful, free, cross-platform
+program that can be used to edit the html and 
+create an epub if you so desire.
+
+
+5. Optional Step:  Convert the files in "TARGETDIR" to their 
+xml descriptions which can be found in TARGETDIR/xml/ 
+upon completion.
+
+   genxml.py TARGETDIR
+
+
+These conversions are important for allowing future (and better)
+conversions to come later.
+
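
The readme's description of TARGETDIR and the required order of steps 3 to 5
can be sketched as a small pre-flight helper. This is an illustrative Python 3
sketch and not one of the scripts in this commit: the function names
missing_entries and conversion_commands are hypothetical, and the only facts it
relies on are the file names, directory names, script names and the
--fixed-image option quoted in the readme above. It only checks the directory
layout and builds command lines; it does not run anything.

```python
from pathlib import Path

# Entries the readme says TARGETDIR should contain after step 2.
EXPECTED_FILES = ("metadata0000.dat", "other0000.dat", "dict0000.dat")
EXPECTED_DIRS = ("page", "glyphs")


def missing_entries(target_dir):
    """Return the expected files and directories absent from target_dir.

    Hypothetical helper: files are listed first, then directories,
    in the order the readme names them.
    """
    root = Path(target_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def conversion_commands(target_dir, fixed_image=False, include_xml=True):
    """Build (but do not run) the command lines for steps 3 to 5.

    gensvg.py always comes before genhtml.py, because the readme states
    that the svg step must be done before html conversion. genxml.py is
    last and can be left out, since the readme marks it as optional.
    """
    target = str(target_dir)
    commands = [["gensvg.py", target]]

    html_command = ["genhtml.py"]
    if fixed_image:
        # Option quoted in the readme for forcing fixed regions to svg.
        html_command.append("--fixed-image")
    html_command.append(target)
    commands.append(html_command)

    if include_xml:
        commands.append(["genxml.py", target])
    return commands
```

A caller might first check that missing_entries("TARGETDIR") returns an empty
list, and then print or execute each entry of conversion_commands("TARGETDIR")
in order. Keeping the commands as lists of arguments avoids quoting problems
if the directory name contains spaces.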