C-Command Software Forum

[OT] Script to convert text to a tab-delimited table

This script is a hack that converts text on the clipboard to a tab-delimited format for use as a table.

I use EF to store PDFs for research. Typically the PDFs contain tables with numerical data in the body of the table with strings in the first row and first column. I often want to copy some or all of these tables for inclusion in another document (Pages, Word, Excel, Numbers etc) as a table.

If you copy tables from PDFs you’ll be aware of the problem of re-creating the mess on the clipboard as a table. Typically, the text copied from the PDF contains carriage returns that break up the rows (‘records’) of the table, but no delimiters for the columns (‘fields’) in the table you copied. This script doesn’t solve the problem of restoring the table format, but it makes a start. You’ll probably have to further format the result in Excel, Word or whatever.

Place the script in your ‘~/Library/Scripts/’ folder and access it through Apple’s script menu.

If you find you’re mostly copying tables with string data in the columns, just replace “[0-9]” (without the quotes) wherever it appears in the ‘set myScript’ line, which is the first line of code after the comments, with “[a-zA-Z]” (without the quotes). You must leave the ‘shell script’ on one line.

--
-- Convert text on the clipboard to tab-delimited text suitable for a table in Excel, Word etc --
-- 
-- Peter Gallagher: www.petergallagher.com.au: August 2007
-- 
-- INPUT: The text contents of the clipboard, copied from e.g. a PDF table.
-- Replaces
-- -- (i) any space followed by a number (possibly modified by a minus sign, tilde or angle bracket) with an underscore, 
-- -- (ii) any number followed by a space with an underscore
-- -- (iii) any other spaces with an @
-- -- (iv) underscores from steps (i) and (ii) with tabs, concatenating multiples (NOTE: a literal tab char is not visible in AppleScript), and 
-- -- (v) @ from step (iii) with spaces, concatenating multiples
-- OUTPUT: Puts the modified text back on the clipboard
--
-- Output will likely need further formatting in e.g. Excel
-- No rights claimed. Your mileage may vary.
-- 
-- The tab is written as \\t so that tr receives the escape \t instead of an invisible literal tab.
set myScript to ("/usr/bin/pbpaste | sed 's/ \\([-~<>]*[0-9]\\)/_\\1/g' | sed 's/\\([0-9]\\) /\\1_/g' | tr ' ' '@' | tr -s '_' '\\t' | tr -s '@' ' ' | pbcopy")
do shell script (myScript)
-- ends
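
For anyone who wants to try the pipeline in Terminal before installing the script, here is a minimal shell sketch of steps (i) to (v). It reads from standard input instead of the clipboard. The function name to_tabs and the sample row are made up for illustration, and the bracket expression [-~<>] is an assumed reading of the first sed pattern, based on the description in step (i).

```shell
#!/bin/sh
# Sketch of the same pipeline without pbpaste/pbcopy: stdin in, stdout out.
# to_tabs is a hypothetical name, not part of the original script.
to_tabs() {
  # (i) space before a number (optionally prefixed by - ~ < >) becomes _
  sed 's/ \([-~<>]*[0-9]\)/_\1/g' |
  # (ii) space after a digit becomes _
  sed 's/\([0-9]\) /\1_/g' |
  # (iii) every remaining space becomes @
  tr ' ' '@' |
  # (iv) each _ becomes a tab, and runs of tabs are squeezed to one
  tr -s '_' '\t' |
  # (v) each @ becomes a space again, and runs of spaces are squeezed to one
  tr -s '@' ' '
}

# Made-up sample row: a text label followed by three numeric cells.
printf 'New South Wales 12.5 -3.1 <0.01\n' | to_tabs
```

The label ‘New South Wales’ keeps its internal spaces, while the three numeric cells are each preceded by a single tab. Note that any underscore or @ already present in the copied text is converted as well, since the pipeline uses those two characters as temporary markers.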