Want to know exactly what Twitter’s fleet of text-combing, dictionary-parsing bots defines as “mean”? Starting any day now, you’ll have instant access to that data—at least, whenever a stern auto-moderator says you’re not tweeting politely.
On Wednesday, members of Twitter’s product-design team confirmed that a new automatic prompt, which activates when a post’s language crosses Twitter’s threshold of “potentially harmful or offensive language,” will begin rolling out for all Twitter users, regardless of platform and device. This follows a number of limited-user tests of the notices beginning in May of last year. Soon, any robo-moderated tweets will be interrupted with a notice asking, “Want to review this before tweeting?”
Earlier tests of this feature, unsurprisingly, had their share of issues. “The algorithms powering the [warning] prompts struggled to capture the nuance in many conversations and often didn’t differentiate between potentially offensive language, sarcasm, and friendly banter,” Twitter’s announcement states. The news post clarifies that Twitter’s systems now account for, among other things, how often two accounts interact with each other. In practice, I’ll probably get flagged for sending curse words and insults to a celebrity I never talk to on Twitter, but I would likely be in the clear sending those same sentences via Twitter to friends or Ars colleagues.
Additionally, Twitter admits that its systems previously needed updates to “account for situations in which language may be reclaimed by underrepresented communities and used in non-harmful ways.” We hope the data points used to make those determinations don’t go so far as to check a Twitter account’s profile photo, especially since troll accounts typically use fake or stolen images. (Twitter has yet to clarify how it makes determinations for these aforementioned “situations.”)
As of press time, Twitter isn’t providing a handy dictionary for users to peruse—or cleverly misspell their favorite insults and curses in order to mask them from Twitter’s auto-moderation tools.
So, two-thirds kept it real, then?
To sell this nag-notice news to users, Twitter pats itself on the back in the form of data, but it’s not entirely convincing.
During the kindness-notice testing phase, Twitter says one-third of users elected to either rephrase their flagged posts or delete them, while users who were flagged went on to post 11 percent fewer “offensive” posts and replies on average. (Meaning, some users may have become kinder, while others could have become more resolute in their weaponized speech.) That all sounds like a massive majority of users remaining steadfast in their personal quest to tell it like it is.
Twitter’s weirdest data point is that anyone who received a flag was “less likely to receive offensive and harmful replies back.” It’s unclear what point Twitter is trying to make with that data: why should any onus of politeness land on those who receive nasty tweets?
This follows another nag-notice initiative by Twitter, launched in late 2020, to encourage users to “read” an article linked by another Twitter user before “re-tweeting” it. In other words: if you see a juicy headline and slap the RT button, you could unwittingly share something you may not agree with. Yet this change seems like an undersized bandage for a bigger Twitter problem: how the service incentivizes rampant, timely posting in a search for likes and interactions, honesty and civility be damned.
And no nag notice will likely fix Twitter’s struggles with how inauthentic actors and trolls continue to game the system and poison the site’s discourse. The biggest example remains an issue found when clicking through to heavily “liked” and replied-to posts, usually from high-profile or “verified” accounts. Twitter commonly bumps drive-by posts to the top of these threads’ replies, often from accounts with suspicious activity and a lack of organic interactions.
Perhaps Twitter could take the learnings from this nag notice roll-out to heart, particularly about weighting interactions based on a confirmed back-and-forth relationship between accounts. Or they could get rid of all algorithm-driven weighting of posts, especially those that drive non-followed content to a user’s feed, and go back to the better days of purely chronological content—so that we can more easily shrug our shoulders at the BS.