C-Command Software Forum

Workflow for converting paginated HTML manuals into a single PDF

[Update 1: to convert app “help” files on your computer, see note at end]
[Update 2: included alternate method for reordering]

Various apps provide online manuals as a set of linked html pages. To reduce clicking through such manuals (or any other set of linked pages), one can combine the pages into a single pdf.

Here’s one way, using EagleFiler (EF) with SiteSucker, an Automator workflow, and an AppleScript, to download the pages and assemble them in their intended order:

Initial setup (yes, it’s several steps):

Install SiteSucker.

  • Put the attached “Date Downloaded” workflow in ~/Library/Workflows/Applications/Folder Actions. (This sets the “Date Modified” of every downloaded file to the current date and time. The workflow is a modification of this one, changed to work recursively through all downloaded folders.)
  • Create a “SiteSucker downloads” folder in your Downloads folder. <ctl>-click on it and select “Services:Folder Action Setup”. Assign the “Date Downloaded” workflow to it. Close the setup dialog.
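For readers who want to see what the “Date Downloaded” step amounts to, here is a rough Python sketch of the same idea: walk a folder recursively and reset each file's modification time. It is not the attached workflow; the function name `touch_tree` and its `when` parameter are made up for illustration. Note that the real Folder Action fires as files arrive, so each file gets its own timestamp, whereas a single call to this function stamps everything with one time.

```python
import os
import time
from pathlib import Path


def touch_tree(root, when=None):
    """Set the modification time of every file under `root`.

    `when` is a Unix timestamp; if omitted, the current time is used.
    Returns the list of files that were touched.
    """
    stamp = time.time() if when is None else when
    touched = []
    # os.walk descends into every subfolder, which is the "recursive" part.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            os.utime(path, (stamp, stamp))  # (access time, modification time)
            touched.append(path)
    return touched
```

Calling something like `touch_tree(Path.home() / "Downloads" / "SiteSucker downloads")` would stamp the whole download folder in one pass.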

In SiteSucker:

  • Click “Settings” and set: 1) Options:HTML processing:Localize, 2) Download Folder:SiteSucker downloads, 3) Limits:Maximum Number of Levels:2, and 4) Advanced:Download Delay:0-5 seconds. Save these settings using the gear at the bottom left of the dialog. (The Levels:2 setting should work for manuals with a fully elaborated table of contents. You can adjust as needed per below. The Delay can be “none” for manuals with filenames that are in proper order when sorted alphabetically.)
  • Set Preferences:Connections for new documents to 1 (or more for manuals with filenames that are in proper order when sorted alphabetically).

Put the attached “All-tabs-one-at-a-time-to-EF” script in a suitable location (e.g., the Safari scripts folder, ~/Library/Scripts/Applications/Safari). (This builds on Michael Tsai’s script here.)

Put the attached “Combine PDF pages” app in a suitable location (e.g., in the Applications folder or in the Dock).

To use (<10 mins):

In SiteSucker:

  • Point SiteSucker to the first page of the manual.
  • Click the Download icon. (This might take some time unless the SiteSucker delay is set to none per discussion above.)
  • After downloads are complete, click “Download Folder”.
  • Navigate to the folder containing the html files. (It might be several levels down.)
  • If needed, adjust “Limits:Maximum Number of Levels”, delete the downloaded folder, and repeat the download.

In Safari, open a new window.

In Finder, View as List (<cmd>2). Now we have a choice as to when and how to order the files so that they make reasonable sense when we assemble the pdf.

**Reordering – Method 1:** If sorting by Name puts the files in reasonable order, use that. Otherwise, try Date Modified. Hopefully one of those is at least close. (You’ll have a chance to fine-tune below.) Then select all the html files (and no others), and type <cmd>O to open them in a Safari window. Wait for all the tabs to load.
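If you’d like to preview the two candidate orders outside Finder, a small Python sketch along these lines can list the html files by name or by modification date. The function name `ordered_html` and the `by` parameter are hypothetical. The name sort here is a plain case-insensitive one, so it may differ from Finder’s ordering for filenames that contain numbers.

```python
from pathlib import Path


def ordered_html(folder, by="name"):
    """Return the .html/.htm files directly inside `folder`, sorted.

    by="name"     -> case-insensitive filename order
    by="modified" -> oldest modification time first (name breaks ties)
    """
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    ]
    if by == "name":
        return sorted(files, key=lambda p: p.name.lower())
    if by == "modified":
        return sorted(files, key=lambda p: (p.stat().st_mtime, p.name.lower()))
    raise ValueError("by must be 'name' or 'modified'")
```

Printing `[p.name for p in ordered_html(folder, by="modified")]` gives a quick look at whether the download order matches the manual’s intended order.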

Check the tabs (shortcut: <cmd><shift>left or right bracket) to make sure they are in the correct order. (The above process and the Date Downloaded workflow operating on the SiteSucker downloads folder should have taken care of that.) Move tabs as needed.

Use the All-tabs-one-at-a-time-to-EF script to capture the tabs into a “_Batch Import” folder in EF. (That folder will be created if none exists.) Wait for capture to complete.

In EF, select the “_Batch Import” folder and type <cmd>R to display in Finder.

In Finder, View as List (<cmd>2), and sort by Date Modified.
[UPDATE]
**Reordering – Method 2:** A way to reorder that might prove less cumbersome postpones the reordering until after the Batch Import. That is:

Select all the html files (and no others), and type <cmd>O to open them in a Safari window. Wait for all the tabs to load.

Use the All-tabs-one-at-a-time-to-EF script to capture the tabs into a “_Batch Import” folder in EF. (That folder will be created if none exists.) Wait for capture to complete.

In EF, select the “_Batch Import” folder and type <cmd>R to display in Finder.

Prepend a sequential prefix to the filenames, then adjust the prefixes manually so that the files sort properly in Name order. I use FileBuddy’s “Rename” Action for the prefixing, but there are other ways – see, e.g., this discussion.[/UPDATE]
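For those without FileBuddy, the prefixing in Method 2 could also be done with a few lines of Python. This is only a sketch: `add_sequence_prefix` and its `start`, `width`, and `sep` parameters are names I made up, and it renames files in place, so try it on a copy of the folder first.

```python
from pathlib import Path


def add_sequence_prefix(paths, start=1, width=3, sep="_"):
    """Rename each file to '<NNN><sep><original name>' in the order given.

    Returns the new paths. Refuses to overwrite an existing file.
    """
    renamed = []
    for index, path in enumerate(paths, start=start):
        path = Path(path)
        # Zero-padded numbers keep Name order correct past 9 files.
        target = path.with_name(f"{index:0{width}d}{sep}{path.name}")
        if target.exists():
            raise FileExistsError(target)
        path.rename(target)
        renamed.append(target)
    return renamed
```

Pass the files in whatever starting order is closest to correct; the prefixes can then be hand-edited in Finder for any pages that are out of place.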

Drag the files onto the “Combine PDF pages” app. A progress gear appears in the menu bar. The file displays when complete. Save to desired location.

=========

I’ve used the above successfully on the BookMacster manual (with Levels = 2; Method 1 using sort by Date Modified) and the BibDesk manual (with Levels = 3; Method 1 using sort by Name).

And, of course, there might be a simpler way and various special situations to address.

humanengr

=========

UPDATE

To convert app “help” files on your computer:

Find the files

  • Many of these exist as “.html” files, but it might take some work to find them. Try <ctl>-clicking the app in the Applications folder. Select “Show Package Contents” and look for html files in Resources:<lang>.lproj.

  • Sometimes the files are buried deeper – e.g., TechToolPro:Contents:Resources has a TechToolPro.help file which you can <ctl>-click in turn to Show Package Contents:Resources:<lang>.lproj which has a TTP4 Help.html file and a “pgs” folder with subfolders containing html files.
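Hunting through package contents by hand can be tedious, so here is a rough Python sketch that searches an app bundle for html files inside any matching .lproj folder, including ones nested in an inner .help package. The function name `find_help_pages` is hypothetical, and the default `lang="en"` is an assumption: the folder name varies by app (some use “English.lproj”, for example).

```python
from pathlib import Path


def find_help_pages(bundle, lang="en"):
    """Return the .html files under any '<lang>.lproj' folder inside
    '<bundle>/Contents/Resources', at any depth."""
    resources = Path(bundle) / "Contents" / "Resources"
    pages = set()
    # rglob also descends into nested packages such as 'Something.help',
    # since a package is just a folder on disk.
    for lproj in resources.rglob(f"{lang}.lproj"):
        if lproj.is_dir():
            pages.update(p for p in lproj.rglob("*.html") if p.is_file())
    return sorted(pages)
```

For example, `find_help_pages("/Applications/SomeApp.app")` (with a real app name substituted) would return the candidate files to open in Safari.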

Open these files in a Safari window

  • Select the files/folders in the desired order

  • Continue with the instructions above at “Check the tabs …”.

All-tabs-one-at-a-time-to-EF.scpt.zip (38.6 KB)

Combine PDF pages.app.zip (226 KB)

Date Downloaded.workflow.zip (55.4 KB)