Pinboard to EagleFiler: Automatic Import

Hi,

I’m trying to write a bit of code to automatically import Pinboard bookmarks tagged ‘work’ to EagleFiler. This seems to be working all right:

import pinboard
import subprocess

pb = pinboard.Pinboard('MY PINBOARD API')

work = pb.posts.all(tag=["work"])

for bm in work:
    tagnames = ', '.join('"' + str(t) + '"' for t in bm.tags)
    script = '''
    tell application "EagleFiler"
        try -- ignore duplicates
            set {{_record}} to import URLs "{url}" Web page format Web archive format
            set the title of the _record to "{title}"
            set the basename of the _record to "{title}"
            set the note text of the _record to "{description}"
            set the assigned tag names of the _record to {{{tags}}}
            set the creation date of the _record to date "{date}"
        end try
    end tell
    '''

    proc = subprocess.Popen(['osascript', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
    stdout_output = proc.communicate(script.format(
        url=bm.url,
        title=bm.description,
        description=bm.extended,
        tags=tagnames,
        date=bm.time.strftime('%A, %B %-d, %Y at %I:%M:%S %p')))

The idea is that I would run this code from time to time to import my work bookmarks. One issue with the above is that I’d prefer to use Web archives for the import, but I noticed that after repeated runs EagleFiler does not detect duplicates. This is not a problem if I choose bookmark as the import format.

Is this the expected behavior of EagleFiler?

Thanks.

Yes, EagleFiler determines duplicates by file content. So a bookmark with the same title and URL will be considered equal (since that’s what the file stores). Web archives will always be considered unique (since they contain HTTP request data that is different each time).

In the future, I will probably make it an option to determine duplicates by URL. For now, perhaps you could adapt your script to check the source URLs of the existing records before importing.

Hi @Michael_Tsai, checking for the source URL sounds like a good idea!

Hi @Michael_Tsai, is there a way to get the source URLs for all the records using AppleScript or by other means?

Yes, for example you could do something like this:

tell application "EagleFiler"
    tell current library document
        set _urls to source URL of every library record
    end tell
end tell

Thanks @Michael_Tsai, the code below seems to be working. I’m not an expert in either Python or AppleScript, so corrections are welcome:


import pinboard
import subprocess

pb = pinboard.Pinboard('MY PINBOARD API')

bookmarks = pb.posts.all(tag=["work"])

script = '''
    tell application "EagleFiler"
        tell current library document
            set _urls to source URL of every library record
        end tell
        return _urls
    end tell
    '''

proc = subprocess.Popen(['osascript', '-'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
    
stdout_output, _ = proc.communicate(script)
urls = stdout_output.strip().split(', ')

for bm in bookmarks:
    if bm.url not in urls:
        tagnames = ', '.join('"' + str(t) + '"' for t in bm.tags)
        script = '''
        tell application "EagleFiler"
            try -- ignore duplicates
                set {{_record}} to import URLs "{url}" Web page format Web archive format
                set the title of the _record to "{title}"
                set the basename of the _record to "{title}"
                set the note text of the _record to "{description}"
                set the assigned tag names of the _record to {{{tags}}}
                set the creation date of the _record to date "{date}"
            end try
        end tell
        '''

        proc = subprocess.Popen(['osascript', '-'],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
        stdout_output = proc.communicate(script.format(
            url=bm.url,
            title=bm.description,
            description=bm.extended,
            tags=tagnames,
            date=bm.time.strftime('%A, %B %-d, %Y at %I:%M:%S %p')))

I don’t think you should rely on parsing the script output using a comma. I would suggest something like this:

tell application "EagleFiler"
    tell current library document
        set _urls to source URL of every library record
    end tell
    set AppleScript's text item delimiters to return
    return _urls as Unicode text
end tell

and then splitting the lines.
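
On the Python side, that corresponds to splitting on line breaks rather than on ", ". A small sketch (the raw string below is just a stand-in for osascript's output; AppleScript's return is a carriage return, which splitlines() handles along with \n):

```python
# Stand-in for the text osascript would print after the delimiter change:
# one source URL per line.
raw = "https://example.com/a\rhttps://example.com/b"

urls = raw.splitlines()  # splits on \r, \n, or \r\n
```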

The Python part could be slow if there are a lot of URLs. I would do something like:

urls = set(urls)

so that lookups are faster.
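
For example (a small sketch; a membership test on a set is O(1) on average, versus O(n) for a list):

```python
# set(urls) in your script; literal values here just for illustration
urls = {"https://example.com/a", "https://example.com/b"}

# each per-bookmark check is now a constant-time lookup on average
"https://example.com/a" in urls        # True
"https://example.com/missing" in urls  # False
```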

Lastly, I think you might run into trouble if any of the parameters that you are interpolating into the generated script contains a " or a \. It might be better to pass them to osascript as arguments, which would then be received by the handler:

on run argv
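
Passing the values as argv items means no quoting is needed on the Python side, since subprocess doesn't go through a shell. If you do keep the .format() approach instead, a helper along these lines (hypothetical, not part of the script above) escapes the two characters that break AppleScript string literals:

```python
def applescript_quote(s):
    # Escape backslashes first, then double quotes, and wrap the result
    # in quotes so it can be dropped into an AppleScript string literal.
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'

applescript_quote('A "quoted" title')  # -> "A \"quoted\" title" (quotes included)
```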

Hi @Michael_Tsai thanks for the feedback, very useful. I didn’t know you could change how AppleScript returns a list, and the Python advice is well taken.

It affects the way AppleScript converts a list to/from text. Before, you were getting an implicit conversion from the list by osascript with the default delimiter (or whatever it was last set to). The modified version makes both explicit.

Hi @Michael_Tsai

OK, I gather I should follow some better practices as discussed here: bash - osascript how to pass in a variable - Stack Overflow

I’ll keep iterating. Thanks!