While many of our statistics reflect how many transcriptions someone has completed, it is far more important that those transcriptions are well made. Or, as we often say: "Quality over quantity".
But how can we make sure that this is actually the case? With such a large volume of transcriptions, it would be impossible to review all of them. In this post, we look at the measures we take and how the process has improved drastically over the last few months.
The first thing we do is welcome all new volunteers. A mod from the Welcome team will send a friendly message to the new user and give them feedback on their very first transcription.
This already resolves the most pressing issues, like using the Fancy Pants editor instead of Markdown mode. In fact, for a long time it was our only method of quality assurance. But is it enough? Not really. Given the large number of different templates and post types, it is easy to make mistakes even after the initial feedback.
After deploying the new bot system, we gained more options to control transcription quality. We added a system that randomly submits new transcriptions for review, based on the experience of the volunteer.
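We won't list the exact numbers here, but conceptually the selection works a bit like the sketch below: the fewer transcriptions a volunteer has completed, the more likely their next one is to land in the review queue. All thresholds and probabilities in the snippet are made up for illustration and are not the values the bot actually uses.

```python
import random

# Illustrative tiers only; the real values used by the bot are not part of this post.
REVIEW_TIERS = [
    (10, 1.0),   # the first few transcriptions are always reviewed
    (50, 0.5),   # still learning the templates: reviewed often
    (250, 0.1),  # experienced: occasional spot checks
]
VETERAN_PROBABILITY = 0.02  # long-time volunteers still get a rare random check


def should_review(transcription_count: int) -> bool:
    """Decide whether a freshly completed transcription goes to the review queue."""
    for threshold, probability in REVIEW_TIERS:
        if transcription_count < threshold:
            return random.random() < probability
    return random.random() < VETERAN_PROBABILITY
```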
Since the checks were introduced on June 5th, we have already reviewed over 2,200 transcriptions. This system has proven to be very effective. For new transcribers, it makes sure that they get familiar with the templates and understand the quirks of Reddit Markdown. But even for experienced transcribers, it has uncovered issues, such as the use of old templates that have since been updated.
It became clear that a lot of issues are very common and show up in many transcriptions. Unfortunately, some of them, like an unescaped Reddit username or a heading accidentally created instead of a separator, are not visible on all clients without checking the Markdown source. As a result, the Welcome and User mods spent a lot of time checking for these problems and asking transcribers to fix them, again and again.
This led to the idea of automated formatting checks: why review everything manually if the bot can do it for you? The Development team (with contributions by u/--B_L_A_N_K--) added automatic detection for a lot of common issues (a rough sketch of how such a check can work follows the list):
Forgetting the header
Making the header bold instead of italic
Forgetting the separators after the header and before the footer
Using the wrong footer
Accidentally making a heading instead of a separator by forgetting the empty line before it
Accidentally making a heading instead of a hashtag by forgetting the backslash escape
Using a fenced code block instead of indenting each line with four spaces
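To give an idea of what such a check can look like, here is a heavily simplified sketch. The function, the patterns, and the error messages are invented for this post (and assume the usual italic transcription header); the bot's real checks are more thorough and tied directly to our templates.

```python
import re


def check_formatting(transcription: str) -> list[str]:
    """Return a list of formatting problems found in a transcription's Markdown."""
    errors = []
    lines = transcription.splitlines()

    # The templates use an italic header; one wrapped in ** renders bold instead.
    if lines and lines[0].lstrip().startswith("**"):
        errors.append("Header is bold instead of italic")

    for i, line in enumerate(lines):
        # A separator ("---") directly under a line of text turns that text
        # into a heading; it needs an empty line in front of it.
        if re.fullmatch(r"-{3,}", line.strip()) and i > 0 and lines[i - 1].strip():
            errors.append(f"Line {i + 1}: separator without an empty line above it")

        # An unescaped "#" at the start of a line creates a heading instead of
        # a literal hashtag; it should be written as "\#".
        if re.match(r"#+\w", line.lstrip()):
            errors.append(f"Line {i + 1}: unescaped '#' renders as a heading")

    # Fenced code blocks don't render correctly on all clients; code should be
    # indented with four spaces instead.
    if re.search(r"^ {0,3}`{3}", transcription, flags=re.MULTILINE):
        errors.append("Fenced code block used instead of four-space indentation")

    return errors
```

If a check like this comes back with anything in the list, the bot can point the volunteer to the problems right away instead of a mod having to spot them by hand.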
This helped reduce the workload for the other mod teams and also had another nice side effect: because the `done` response is rejected until the formatting has been fixed, transcriptions with formatting errors won't enter our database. Unfortunately, we can't check for edits on each transcription, so the more issues we can detect before a `done` gets accepted, the better.
Of course, not all issues can be detected automatically, so the welcome messages and manual reviews are still important.
Last but not least, the Engagement team also contributes to quality assurance. Every treasure hunt entry is manually checked for accuracy, including formatting issues, duplicate submissions, and use of an incorrect template. Because we get more than 100 transcriptions submitted every ten days, this is also a considerable chunk of work, especially considering that the Engagement team is one of the smallest mod teams.
And that's it for this monthly meta! We plan to further improve our quality measures in the future. Do you have any ideas? Please let us know!