Pinboard to EagleFiler: Automatic Import

Hi,

I’m trying to write a bit of code to automatically import Pinboard bookmarks tagged ‘work’ to EagleFiler. This seems to be working all right:

import pinboard
import datetime
import subprocess

pb = pinboard.Pinboard('MY PINBOARD API')

work = pb.posts.all(tag=["work"])

for bm in work:
    tagnames = ', '.join(map(lambda x: '"'+ str(x) + '"', bm.tags))
    script = '''
    tell application "EagleFiler"
        try -- ignore duplicates
            set {{_record}} to import URLs "{url}" Web page format Web archive format
            set the title of the _record to "{title}"
            set the basename of the _record to "{title}"
            set the note text of the _record to "{description}"
            set the assigned tag names of the _record to {{{tags}}}
            set the creation date of the _record to date "{date}"
        end try
    end tell
    '''

    proc = subprocess.Popen(['osascript', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
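    # %-d is the day of the month (%-m would repeat the month number);
    # %I is the 12-hour clock that pairs with %p.
    stdout_output = proc.communicate(script.format(url=bm.url,
     title=bm.description, description=bm.extended, tags=tagnames,
     date=bm.time.strftime('%A, %B %-d, %Y at %I:%M:%S %p')))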

The idea is that I would run this code from time to time to import my work bookmarks. One issue with the above is that I’d prefer to use Web archives for the import, but I noticed that after repeated runs EagleFiler does not detect duplicates. This is not a problem if I choose bookmark as the import format.

Is this the expected behavior of EagleFiler?

Thanks.

Yes, EagleFiler determines duplicates by file content. So a bookmark with the same title and URL will be considered equal (since that’s what the file stores). Web archives will always be considered unique (since they contain HTTP request data that is different each time).

In the future, I will probably make it an option to determine duplicates by URL. For now, perhaps you could adapt your script to check the source URLs of the existing records before importing.

Hi @Michael_Tsai, checking for the source URL sounds like a good idea!

Hi @Michael_Tsai, is there a way to get the source URLs for all the records using AppleScript or by other means?

Yes, for example you could do something like this:

tell application "EagleFiler"
    tell current library document
        set _urls to source URL of every library record
    end tell
end tell

Thanks @Michael_Tsai, the code below seems to be working. I’m not an expert in either Python or AppleScript, so corrections are welcome:


import pinboard
import datetime
import subprocess

pb = pinboard.Pinboard('MY PINBOARD API')

bookmarks = pb.posts.all(tag=["work"])

script = '''
    tell application "EagleFiler"
        tell current library document
            set _urls to source URL of every library record
        end tell
        return _urls
    end tell
    '''

proc = subprocess.Popen(['osascript', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
    
urls = proc.communicate(script)
# strip the trailing newline so the last URL still matches
urls = urls[0].strip().split(', ')

for bm in bookmarks:
    if bm.url not in urls:
        tagnames = ', '.join(map(lambda x: '"'+ str(x) + '"', bm.tags))
        script = '''
        tell application "EagleFiler"
            try -- ignore duplicates
                set {{_record}} to import URLs "{url}" Web page format Web archive format
                set the title of the _record to "{title}"
                set the basename of the _record to "{title}"
                set the note text of the _record to "{description}"
                set the assigned tag names of the _record to {{{tags}}}
                set the creation date of the _record to date "{date}"
            end try
        end tell
        '''

        proc = subprocess.Popen(['osascript', '-'],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
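        # %-d is the day of the month (%-m would repeat the month number);
        # %I is the 12-hour clock that pairs with %p.
        stdout_output = proc.communicate(script.format(url=bm.url,
         title=bm.description, description=bm.extended, tags=tagnames,
          date=bm.time.strftime('%A, %B %-d, %Y at %I:%M:%S %p')))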

I don’t think you should rely on parsing the script output using a comma. I would suggest something like this:

tell application "EagleFiler"
    tell current library document
        set _urls to source URL of every library record
    end tell
    set AppleScript's text item delimiters to return
    return _urls as Unicode text
end tell

and then splitting the lines.
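
A minimal sketch of that splitting step in Python, assuming the output of the script above has already been captured as a string. `parse_urls` is a hypothetical helper name used only for illustration:

```python
def parse_urls(osascript_output):
    """Split osascript output into one URL per line, dropping blank lines."""
    # splitlines() treats \n, \r, and \r\n as line breaks, so it works
    # whether or not the carriage returns from AppleScript were translated.
    return [line.strip() for line in osascript_output.splitlines() if line.strip()]
```

Because the split is on line breaks, a URL that itself contains a comma stays in one piece.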

The Python part could be slow if there are a lot of URLs. I would do something like:

urls = set(urls)

so that lookups are faster.
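
As a rough illustration, the filtering step could look like this. `new_bookmark_urls` is a made-up name, and the inputs are assumed to be plain lists of URL strings:

```python
def new_bookmark_urls(bookmark_urls, existing_urls):
    """Return the bookmark URLs that are not already in the library."""
    # Building the set once makes each membership test roughly constant
    # time, instead of scanning the whole list for every bookmark.
    existing = set(existing_urls)
    return [url for url in bookmark_urls if url not in existing]
```

The order of the bookmarks is preserved; only the existing URLs are turned into a set.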

Lastly, I think you might run into trouble if any of the parameters that you are interpolating into the generated script contain " or \. It might be better to pass them to osascript as arguments, and then they would be received by the handler:

on run argv
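
Here is one way this might look from Python; treat it as a sketch rather than a tested importer. The names `IMPORT_SCRIPT`, `build_command`, and `import_bookmark` are hypothetical, the script only sets the title and note, and it assumes Python 3.7 or later for `capture_output` and `text`. The values are passed as extra command-line arguments after the script, so quotes and backslashes never become part of the AppleScript source:

```python
import subprocess

# Hypothetical script: the values arrive in argv rather than being
# interpolated into the AppleScript text.
IMPORT_SCRIPT = '''
on run argv
    set {_url, _title, _note} to argv
    tell application "EagleFiler"
        set {_record} to import URLs {_url} Web page format Web archive format
        set the title of _record to _title
        set the note text of _record to _note
    end tell
end run
'''


def build_command(url, title, note):
    """Build the osascript argument list; nothing here needs escaping."""
    return ["osascript", "-e", IMPORT_SCRIPT, url, title, note]


def import_bookmark(url, title, note):
    """Run the script (macOS only) and report whether osascript succeeded."""
    result = subprocess.run(build_command(url, title, note),
                            capture_output=True, text=True)
    return result.returncode == 0
```

Since the command is a list and no shell is involved, each value reaches the `run` handler exactly as Python holds it.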

Hi @Michael_Tsai thanks for the feedback, very useful. I didn’t know you could change how AppleScript returns a list, and the Python advice is well taken.

It affects the way AppleScript converts a list to/from text. Before, you were getting an implicit conversion from the list by osascript with the default delimiter (or whatever it was last set to). The modified version makes both explicit.

Hi @Michael_Tsai

Ok I gather I have to follow some better practices as discussed here: bash - osascript how to pass in a variable - Stack Overflow

I’ll keep iterating. Thanks!

I’ve managed to put together a pure AppleScript solution, which I will update on GitHub Gist if I change anything: Import Pinboard to EagleFilter (AppleScript) · GitHub

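set pinboardAuth to "MY PINBOARD API" -- placeholder: replace with your Pinboard API token
set pinboardUrl to "https://api.pinboard.in/v1/posts/recent?auth_token=" & pinboardAuth

-- quoted form keeps the shell from interpreting the ? and other characters in the URL
set sourceXml to do shell script "curl " & quoted form of pinboardUrl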

set theUrls to {}

tell application "System Events"
	set _xmlData to make new XML data with properties {text:sourceXml}
	
	tell XML element "posts" of _xmlData
		repeat with _post in XML elements
			tell _post
				copy {href:value of XML attribute "href", theTags:value of XML attribute "tag"} to end of theUrls
			end tell
		end repeat
	end tell
end tell

tell application "EagleFiler"
	tell current library document
		set _existingUrls to source URL of every library record
		
		repeat with _theUrl in theUrls
			if _existingUrls does not contain href of _theUrl then
				try
					set {_record} to import URLs {href of _theUrl}
					set _newTags to {}
					
					repeat with _theTag in words of theTags of _theUrl
						copy tag _theTag to end of _newTags
					end repeat
					
					set _record's assigned tags to _newTags
				end try
				
			end if
		end repeat
	end tell
end tell

This worked great. Very helpful! Thank you!