June Monthly Meta: Development from the other side

Published May 31, 2022, 6:48 a.m.

Hey there, itsthejoker (dev team and admin team for Grafeas) here to talk about tooling!

In software development, tooling refers to programs or scripts that exist only to maintain other programs or scripts. In our case, we write tooling for the QA and Engagement teams so that they can directly interact with the keeper of data, Blossom, or various spreadsheets that are used to keep track of different things (like the delightfully awesome Treasure Hunts in Discord).

Since all of this happens behind the scenes, do you want a peek behind the curtain? Who am I kidding, of course you do. Let's go!

Automation: the science of fixing the boring parts

When looking at where to ease issues, there are a handful of questions that should be asked:

Where does it go?

Before we even get to "what does it do", we need to first figure out "how do people access the tool". If it's incredibly useful but requires a journey to the nearest rainforest to use, it's not a particularly useful tool. On the other hand, if something is literally in your back pocket but it's about as useful as this fork, you're not going to use it much either. Thankfully, that second issue is much easier to fix.

As for the question itself, because we use Slack as our modchat, the grand majority of our mod tooling hooks directly into our modchat so that we can interact with it while we're thinking about it. I've got some cool screenshots later :)

What does it solve?

This is a hard question. When we look at problems, human nature is to look at solving the immediate problem, not necessarily the problem that causes the immediate problem. Here's a quick example:

"I'm having trouble updating this thing."

Sure, we could make updating the thing easier, but why are we updating the thing?

"We update the thing so it doesn't look broken over there."

That's a more useful problem -- why does the thing look broken? Is that fixable? If we look at the whole process for the thing, how can we make the most useful changes? If we can figure that out, usually we find that the part that needs to be fixed is not the part that currently raises issues -- it's the part that causes the part that currently raises issues.

How is it used?

This is usually an intuitive answer, but it arises from "what does it solve" in the form of "what is the process for using the fix that we're writing?". If using the fix is harder than the original process, guess what people will do? The original broken process. It's worth thinking about how the fix we're working on will be used and making sure that it's as easy to use (and as easy to understand) as possible.

Sometimes this takes the form of aliases, where different commands can trigger the same functionality, or it might even require rewriting part of a system to make it easier to remember. The technology works with the people, not against the people.
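To make the alias idea concrete, here's a minimal sketch of one handler registered under several command names. Everything in it (the `COMMANDS` registry, the `command` decorator, and the alias names) is made up for illustration and isn't taken from any of our bots.

```python
from typing import Callable, Dict

# Hypothetical registry: every command name and alias maps to one handler.
COMMANDS: Dict[str, Callable[[str], str]] = {}


def command(*names: str):
    """Register a single handler under one or more names."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        for name in names:
            COMMANDS[name.lower()] = func
        return func
    return decorator


# The alias names here are assumptions chosen for the example.
@command("deploy", "update", "ship")
def deploy(target: str) -> str:
    return f"Deploying {target}"


def dispatch(message: str) -> str:
    """Treat the first word as the command and the rest as its argument."""
    name, _, argument = message.strip().partition(" ")
    handler = COMMANDS.get(name.lower())
    if handler is None:
        return f"Unknown command: {name}"
    return handler(argument.strip())
```

With this in place, `dispatch("deploy blossom")` and `dispatch("ship blossom")` both reach the same function, so nobody has to remember which word is the "right" one.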

What if it doesn't work?

If the tool breaks, is there an alternative process available? What if we missed something important? I'll be the first to raise my hand and say I'm not perfect (and I've worked on tools that distinctly did not solve the problem -- oops) but we can plan for that by:

not killing old processes until the new ones are proven to work
interviewing other teams to make sure that we're addressing the issues that they need us to address
listening to other teams to identify where their "pain points" (processes that they're struggling with) are
keeping track of those pain points and figuring out where we can assist

All of this leads to...

Tooling, modding, and Slack

We'll highlight a few different things in this post because there are some things that I'm really proud of that are worth showing off. We'll separate them into three teams: QA, Engagement, and Dev.

QA: the folks with the banhammer

The most raw data is handled by QA as they deal with new transcriptions, new volunteers, and helping people make sure that their transcriptions are as great as they can possibly be. The entire QA workflow is heavily automated, but let's take a look at three specific parts of the new volunteer flow.

Problem: How do we know when someone joins?

When someone accepts the code of conduct, a ping is sent to Slack with the username of the new person and a link to the thread that they commented in. This lets us know who we need to keep an eye on and welcome, but now that we know that they're here...

Problem: How do we know when there's a transcription that needs to be checked?

All transcriptions have a varying chance of being flagged for QA review, but first transcriptions are special and always get their own ping! Using buttons and links directly in Slack, QA can see what should be checked, who is checking it, and approve it with accountability.
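The "varying chance" rule can be sketched roughly like this. The thresholds and percentages below are invented for the example and are not the real numbers; the only part taken from the description above is that a first transcription is always flagged.

```python
import random


def review_chance(completed: int) -> float:
    """Probability that the next transcription is flagged for QA review.

    `completed` is how many transcriptions the volunteer has already done.
    All thresholds and percentages are hypothetical.
    """
    if completed == 0:
        return 1.0  # first transcription: always gets a ping
    if completed < 10:
        return 0.5
    if completed < 100:
        return 0.2
    return 0.05


def needs_review(completed: int, rng: random.Random) -> bool:
    # rng.random() is always below 1.0, so a chance of 1.0 always flags.
    return rng.random() < review_chance(completed)
```

Passing the random generator in as an argument keeps the function easy to test with a fixed seed.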

Problem: How do we know when something gets reported?

Reddit helpfully provides the modqueue, a single page where you can see all the reported posts in realtime and approve or deny as needed... but this one actually came as a happy accident! The core functionality for this feature was developed as a part of a different initiative and was an easy win to extend to Reddit, so when a post is reported it will appear in Slack to be actioned. Not every post needs an action applied to it -- for instance, maybe it was reported for being removed, but it's since been reinstated -- so we can approve or remove the submission straight from Slack. This is one of the quality-of-life automations that I'm most fond of!
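For the curious, a Slack message with buttons is just a block of structured data. Here's a rough sketch of what a payload for a reported post could look like using Slack's Block Kit format. The function name, the `action_id` values, and the wording are assumptions for illustration; actually sending the message and handling the button clicks is left out.

```python
def report_message(post_title: str, post_url: str, reason: str) -> dict:
    """Build a Block Kit payload describing one reported post."""
    return {
        # Plain-text fallback used in notifications.
        "text": f"Reported: {post_title}",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*<{post_url}|{post_title}>*\nReport reason: {reason}",
                },
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "action_id": "approve_post",  # hypothetical ID
                        "value": post_url,
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Remove"},
                        "style": "danger",
                        "action_id": "remove_post",  # hypothetical ID
                        "value": post_url,
                    },
                ],
            },
        ],
    }
```

When someone clicks a button, Slack sends the `action_id` and `value` back to the bot, which is enough to know which post to act on and who clicked.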

Engagement: does the party ever really stop?

Engagement handles Discord and all of the special events that we put on through Discord and on Reddit -- it's no surprise that they also need a little automatic help with all the things they've got going on! Unlike QA, most of Engagement's automation comes through different sources, either directly from Google Docs or from Slack itself.

The treasure hunt, beast that it is, requires a fair number of reminders and work to keep running. To help keep everyone on task, Slack workflows trigger depending on how far out the event is to make sure everything is in its place. Different workflows contain different reminders. Shoutout to u/seeroflights for masterminding this process!
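The real reminders live in Slack workflows, which are configured in Slack rather than written as code, but the underlying logic of "which reminder fires depends on how far out the event is" can be sketched like this. The schedule and the reminder text are invented for the example.

```python
from datetime import date
from typing import Optional

# Hypothetical schedule: days before the event -> reminder text.
REMINDERS = {
    30: "Pick the theme",
    14: "Draft the clues",
    7: "Test the submission form",
    1: "Schedule the announcement post",
}


def reminder_for(event_day: date, today: date) -> Optional[str]:
    """Return the reminder due today, or None if nothing is scheduled."""
    days_out = (event_day - today).days
    return REMINDERS.get(days_out)
```

For an event on June 30, calling `reminder_for(date(2022, 6, 30), date(2022, 6, 23))` looks up the entry for seven days out.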

A different set of issues arises when it comes time to verify the treasure hunt entries when they arrive -- for that, we leverage Google Docs directly! A new submission triggers a ping that lets us know who submitted it and when so that the entry can be validated quickly. It's always fun when two or more submissions come in right next to each other :)

Development: if the alerts channel didn't ping, that's probably supposed to happen

Since we also maintain the bots themselves, it's a reasonable guess that we on Development also require some tooling to help keep everything moving. There are a lot of small scripts and commands that we use on a regular basis, but I want to highlight the ones that help us the most -- the ability to deploy and update any of the bots at a moment's notice, directly from Slack.

Problem: given our security concerns, how can we make sure that deploying updates isn't locked behind the one or two people who have direct access to the server?

We quickly hit an issue where merging code was accessible to the people who needed it, but actually deploying the code wasn't possible unless someone with the proper access could directly log into the server, and we wanted to make sure that the process of actually gaining access to the server was tightly controlled. This brings a new question: how do we utilize a service account in a locked-down way that is easy to use and doesn't require an entire wiki of documentation just to make sure things go smoothly?

Enter Bubbles, our modchat chatbot. Bubbles originally started as a system for doing more complex reminders before morphing into a truly indispensable tool for our team. Among other things, Bubbles directly controls all of the other bots on the server, including herself. Let's say that we push a new feature to GitHub for Blossom -- something that happens fairly regularly -- in the Before Times, we would have to SSH into the server and run a small laundry list of commands just to get the deployment to finish. Instead, we merge in the new code on GitHub and ask Bubbles to handle the process for us directly in Slack.

The process gets slightly more complicated when Bubbles herself needs to be updated, but thankfully that's not too much of a stretch. Between the commands for deployment, restarting, getting logs from a service, or something else, Bubbles is an unstoppable force on the Dev team. (Did I mention that she can recover from a failed deploy as well?)
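The general shape of "run the laundry list, and recover if something fails" can be sketched as below. This is not Bubbles' actual code: the `deploy` function, the step names, and the idea of passing each step in as a callable are all assumptions made to keep the example small and testable.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that reports success.
Step = Tuple[str, Callable[[], bool]]


def deploy(steps: List[Step], rollback: Callable[[], bool]) -> str:
    """Run steps in order; on the first failure, roll back and report.

    Steps after the failing one are never run.
    """
    for name, action in steps:
        if not action():
            restored = rollback()
            outcome = "rolled back" if restored else "rollback failed"
            return f"Deploy failed at '{name}'; {outcome}"
    return "Deploy complete"
```

In a sketch like this, each action would wrap one of the commands that used to be typed by hand over SSH, and the returned string is what gets posted back to the channel.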

Closing thoughts

Automation and general tooling are core components of any business, and the little things behind the scenes that help glue things together are what keep us moving in the same direction. There's so much more that I can't cover here, but if you got this far then thanks for reading. It's always a pleasure working on these systems and it's even more amazing to see them in action.

Cheers and welcome to June!