Loading Chase.com transactions into Ledger
20 Nov 2016When trying to migrate away from Mint to Ledger, one of the major obstacles is automating the work of entering every 3 dollar coffee shop transaction manually. Ledger does provide some help in this area through the use of ledger xact, yet as of today, we enter 25 transactions per week into ledger and a large chunk of them is from our Chase credit cards. Automation seems to be the easiest way to make this a lot less painful.
Headless Selenium to the rescue.
Automating websites has changed quite a bit ever since lots and lots of Javascript is required for websites to work properly. The most sustainable way I could find to automate interactions with complex websites is Selenium, a framework for programmatically interacting with websites loaded and rendered in an actual browser.
Selenium commonly runs while interacting with an actual, visible browser. This however makes windows pop up on my machine which I don’t care about to I migrated to a headless implementation, in this case PhantomJS early on. This makes development a little trickier because you can’t observe what Selenium does and where it gets stuck but screenshotting mostly alleviates this problem.
The language of choice for this was ruby for no other reason than me being pretty familiar with it and it offering Selenium bindings as well as a reasonable amount of example code. Curiously, Chase seems to require the user-agent to look non-automated so this is the first hurdle to take care of while automating:
I mostly copied the user-agent string from my actual browser and that seemed to be enough. Since PhantomJS is a full browser, it takes care of all cookie handling and other features that might be required to get the Chase website to work.
Login & Frames :-(
What’s not trivial is realizing that the Chase website actually uses frames. Frames require special handling in Selenium so one has to realize that elements can’t be found because there’s a frame around the elements which needs to be entered first.
For Chase, we first navigate to a url displaying the login box:
Now we enter the frame the login inputs are located in and fill them
The code above enters the frame, fills the form, takes a screenshot if in debug mode and then logs us in. The last line exits the frame again as the remainder of the interaction happens outside of frames.
After successfully submitting the login, Chase may ask for a second authentication factor. The code on GitHub handles this case by asking for the OTP token that Chase emails to the address stored in their database; here I’ll obmit it for brevity.
Account discovery and CSV download.
Next we have to wait until the accounts overview is loaded, extract all account numbers and download a CSV file with the latest transactions.
This code so far only, somewhat awkwardly, extracts the account number that we need to be able to fetch the CSV file. We fetch the CSV file by injecting Javascript code into the website which issues a POST request which obtains the CSV. This mimics what would happen if we scripted filling in the CSV download form and hitting the “Download” button. The reason we issue the request from javascript instead of filling the form is that downloading files in Selenium is fairly awkward, especially in PhantomJS. Running Javascript in the browser and returning the result however is simple and led to better code for this case.
This is pretty much it. Now we have a CSV file for all transactions for all accounts accessible through the supplied chase login. This can easily be translated to Ledger, either through its import methods or through Reckon.
I use the following Reckon command to convert Chase CSV to Ledger
reckon --ignore-columns 1,2 -l <Ledger file> --contains-header --unattended --account Liabilities:Chase -f <CSV file>
and this works quite well for me.
How often does it break?
In the last 4 months, I did not have to change any of the code. I did however once have to enable screenshotting to notice that Chase was serving a notice which one had to acknowledge before the accounts page. After manually doing that, the script worked again without modification. I have scripts for 2 others banks and I have not modified any of them. Bank websites don’t seem to change so much, it turns out.
The code for all this is here.