Avner Shanan

Mentorship, Software Engineering, Maintenance and Care

A Quick Dip Into Podcast Addict's Data

Earlier this year I switched to the Android app Podcast Addict, which has been working well enough for my purposes. This morning I decided to find a new podcast to listen to, and, after a few different searches, found one that looked interesting. I started listening to the latest episode to get a feel for it, and ended up listening to the whole thing.

I went looking in my playback history so I could subscribe to the podcast, but the episode wasn’t there! And of course I couldn’t remember the name of the podcast well enough to find it using the search bar either. Some brief googling turned up a Reddit post discussing this very issue. Podcast Addict apparently doesn’t keep track of anything unless you’re already subscribed to the podcast. Or at least, not anywhere it exposes in the app. Luckily, my search also turned up a post on extracting Podcast Addict listening history. Turns out that PA’s backup file is a SQLite database! That’s great, because Datasette makes exploring SQLite databases extremely easy.

After following the instructions in the linked post to get my data, I loaded it into Datasette (datasette podcastAddict.db) and started digging around. My PA backup has roughly 17k rows across 34 tables, but luckily I don’t need to look at most of that to find what I want.

A screenshot of the Datasette web app's index page, showing the Podcast Addict database has been loaded with 34 tables. A sampling of tables is linked below it, including topics, timestamp list, teams, tags, and tag relation. An ellipsis links to a complete list of the tables.
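If you'd rather get the same overview without opening a browser, a rough sketch using Python's built-in sqlite3 module might look like the following. The helper name table_row_counts is made up for illustration; the only thing it relies on is SQLite's standard sqlite_master catalog table.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Table names can't be bound as query parameters, so quote the
        # identifier instead (doubling any embedded double quotes).
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

Calling table_row_counts(sqlite3.connect("podcastAddict.db")) and then taking len() of the result and sum() of its values should give the table count and total row count that Datasette shows on its index page.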

A quick look at the episode table shows that I can facet it by the boolean seen_status column, cutting out ~90% of the rows, then order by playback_date to put my most recently played episode at the top.

A screenshot of the Datasette web app showing the page for the episodes table.  The number of rows is listed at the top of the page as 2,877.  There are many columns suggested as facets, including podcast id, type, favorite, download id, and new status.  The seen status column has been selected as a facet and there are two values, 0 (which has ~2500 rows), and 1 (which has 320 rows).  A single partial row of data is visible for an episode of Ologies by Alie Ward.
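The facet-then-sort steps above boil down to a small query. Here's a minimal sketch of the same thing with Python's sqlite3 module. The seen_status, playback_date, and podcast_id columns come from the table shown above; the table name episodes and the title column name are assumptions on my part, so check them against your own backup before running it.

```python
import sqlite3


def recently_played(conn, limit=5):
    """Return (title, podcast_id, playback_date) for the most recently played episodes.

    Assumes a table called `episodes` with a `name` column holding the
    episode title; both are guesses, not confirmed schema.
    """
    query = """
        SELECT name, podcast_id, playback_date
        FROM episodes
        WHERE seen_status = 1          -- same filter as picking facet value 1
        ORDER BY playback_date DESC    -- newest playback first
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

With conn = sqlite3.connect("podcastAddict.db"), recently_played(conn, limit=1) should return the same row that ends up at the top of the sorted Datasette page.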

The podcast ID is right there and I can go look that up in the podcast table (or edit the SQL manually and do a JOIN, if that’s the mood I’m in). Success!
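For the JOIN mood, a sketch along these lines would fetch the podcast name in one step. Treat the schema details as assumptions: I'm guessing at a podcasts table with an _id primary key and a name column, and at episodes as the episode table's name. Only seen_status, playback_date, and podcast_id are taken from the columns described above.

```python
import sqlite3


def last_played_podcast(conn):
    """Return (podcast_name, episode_title) for the most recently played episode.

    Returns None if no episode is marked as seen. The table names and the
    `_id` / `name` columns are assumed, not confirmed.
    """
    query = """
        SELECT podcasts.name AS podcast, episodes.name AS episode
        FROM episodes
        JOIN podcasts ON podcasts._id = episodes.podcast_id
        WHERE episodes.seen_status = 1
        ORDER BY episodes.playback_date DESC
        LIMIT 1
    """
    return conn.execute(query).fetchone()
```

The same SELECT can be pasted into Datasette's SQL editor instead, once the table and column names are adjusted to match the real schema.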
