website/content/blog/archiving-toots.md

3.3 KiB

date draft math medium_enabled medium_post_id tags title
2022-05-20 22:47:48-04:00 false false true c1b813e3dff0
Hugo
Mastodon
Archive
Archiving Toots

In the spirit of syndicating Mastodon toots to my own site, I wrote a Python script that turns toots into Hugo markdown files.

In this post we'll go over:

Mastodon API

Before we can retrieve our toots, we need to know what user id of our account. James Cahill wrote a very clean web tool to grab your user id. For the sake of example, we'll use mine which is 108219415927856966.

To grab the statuses, we then need to access the following URL:

https://SERVER/api/v1/accounts/USERID

For my specific user:

https://fosstodon.org/api/v1/accounts/108219415927856966

By default, this will return 20 statuses in an array. To see how to parse each individual status, check out my post on displaying a single toot.

You can use the limit parameter to set how many statuses you wish to see. The maximum number you can set it to is 40.

In order to see more than the last 40 toots, there is another parameter that we can use called max_id which will specify the maximum toot id to respond with. You can then use this parameter to grab all your toots, by following the following algorithm:

  • Make an initial query to the API
  • Find the smallest toot id in the response
  • While the response is not empty
    • Send a query with the max_id set to the smallest toot id known
    • Update the smallest toot id known

Together with limit and max_id we can grab any specified number of toots. Here's the psuedocode for that:

for _ in range(math.ceil(RETRIEVE_NUM_TOOTS // MAX_TOOTS_PER_QUERY)):
    url = f"{SERVER}/api/v1/accounts/{UID}/statuses"
    url += "?limit=40" if limit_param > 40 else f"?limit={limit_param}"
    url += "&max_id={max_id}" if max_id is not None else ""
    response = query(url)
    if len(resposne) == 0:
        break
    # Process response...

Reformatting Toot

Rather than storing the JSON of the toot verbatim, I do make some changes to it for the following reasons:

  • By default every toot has a lot of account information, this is an issue because if my number of followers update, then I need to update all my toots.
  • Hugo expects certain field names to exist. For example: date.

I delete the following account information from my toot archive:

  • Lock status
  • Bot
  • Discoverable
  • Group
  • Created_at
  • Note
  • Follower count
  • Following count
  • Status count
  • Last status at

I create a date field based on created_at. The URL field in mastodon conflicts with Hugo, so I rename it to syndication.

Creating the Markdown files

From here I can construct the following markdown file based on this template:

---
{Toot JSON}
---
JSON.content

I save each toot in an individual markdown file under content/toots.

Conclusion

My full script is on GitHub. The script will let you know of any toot IDs that are created and/or updated. I then add these toots to Git for version control just like my posts.