On November 5, 2021 (a Friday, of course), we deployed innocent-looking gem updates: minor versions of Ruby on Rails, the Ruby Sentry client, the Ruby Slack client, HTTP libraries, Puma, Devise, the OmniAuth Ruby client, Mongoid, and a few test gems.
However, something went very wrong.
We saw odd Stripe errors on Airbrake. Then, on our Stripe account, we saw this:
In less than an hour, we had created 474 new subscriptions for a total of $73,271.36.
We weren't expecting that much business.
For some reason, our app was creating new subscriptions for old accounts that had been canceled or disabled a long time ago. We had no idea why, but the issue was still ongoing, so we decided to react as fast as we could.
We immediately rolled back the gem updates to their safe older versions, disabled our Stripe API keys to lock our own application out, and refunded everyone impacted:
After refunding everyone, we manually double-checked the billing state of each account and sent an apology email to each customer, one by one; all 474 of them.
We still didn't know why.
Most of the updates seemed harmless. Even after deeper inspection, they were indeed minor gem updates to several small things. We worked through several hypotheses: caching issues, odd race conditions, or thread-safety issues, as Puma had changed its threading model in that update.
However, we finally found it. One method in our billing logic stood out as a code smell (obligatory disclaimer: I personally wrote this):
This is part of a feature that seems good on paper: We offer to automatically renew your plan if you run out of searches on our service.
The goal of renew_early_protected is to avoid the case of someone submitting multiple searches at the exact same time and, as a result, getting their subscription renewed and their credit card charged several times.
Notice the or() method at line 47.
In Mongoid (the MongoDB ODM for Ruby) 7.0.8, or() meant: filter documents that match any of the argument conditions. In our case: filter users that have a specific ID AND that have either never had a renewing_early_lock OR whose renewing_early_lock is more than 1 hour old.
Here's what it got computed to for a specific user:
computed_selector_in_mongoid_7_0_8 = {
  "_id" => BSON::ObjectId('59af54094-----------64'),
  "$or" => [{
    "renewing_early_lock_at" => {
      "$lte" => 2022-01-07 20:14:53.44744 UTC
    }
  }, {
    "renewing_early_lock_at" => nil
  }]
}
In Mongoid 7.3.3, or() now means: filter documents that match any of the argument conditions OR any of the previous method conditions! In our case: filter users that have either a specific ID OR that have never had a renewing_early_lock OR whose renewing_early_lock is more than 1 hour old.
Here's what it looks like for the same user:
computed_selector_in_mongoid_7_3_3 = {
  "$or" => [{
    "_id" => BSON::ObjectId('59af54094-----------64')
  }, {
    "renewing_early_lock_at" => {
      "$lte" => 2022-01-07 19:00:09.571034 UTC
    }
  }, {
    "renewing_early_lock_at" => nil
  }]
}
Notice that the user ID selector got moved into an optional or()!
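To make the difference concrete, here is a minimal plain-Ruby sketch that applies both selector shapes to a handful of made-up users. It does not use Mongoid or MongoDB: matches? is a hypothetical, heavily simplified stand-in for MongoDB's matching (equality, "$lte", and "$or" only), and the user IDs and timestamps are invented for illustration.

```ruby
# Hypothetical in-memory stand-in for MongoDB query matching.
# Supports only equality, "$lte", and "$or", which is all these selectors need.
def matches?(doc, selector)
  selector.all? do |key, condition|
    if key == "$or"
      condition.any? { |sub| matches?(doc, sub) }
    elsif condition.is_a?(Hash) && condition.key?("$lte")
      !doc[key].nil? && doc[key] <= condition["$lte"]
    else
      doc[key] == condition
    end
  end
end

# Invented cutoff: locks at or before this time no longer block a renewal.
cutoff = Time.utc(2022, 1, 7, 19, 0, 0)

# Made-up users; the ids are placeholders, not real ObjectIds.
users = [
  { "_id" => "renewing_user",   "renewing_early_lock_at" => nil },
  { "_id" => "canceled_user",   "renewing_early_lock_at" => nil },
  { "_id" => "old_lock_user",   "renewing_early_lock_at" => cutoff - 86_400 },
  { "_id" => "fresh_lock_user", "renewing_early_lock_at" => cutoff + 60 },
]

lock_conditions = [
  { "renewing_early_lock_at" => { "$lte" => cutoff } },
  { "renewing_early_lock_at" => nil },
]

# Shape computed under 7.0.8: the id is required, the lock conditions are alternatives.
selector_7_0_8 = { "_id" => "renewing_user", "$or" => lock_conditions }

# Shape computed under 7.3.3: the id is just one more alternative.
selector_7_3_3 = { "$or" => [{ "_id" => "renewing_user" }] + lock_conditions }

matched_7_0_8 = users.select { |u| matches?(u, selector_7_0_8) }.map { |u| u["_id"] }
matched_7_3_3 = users.select { |u| matches?(u, selector_7_3_3) }.map { |u| u["_id"] }

puts "7.0.8 shape matches: #{matched_7_0_8.inspect}"
puts "7.3.3 shape matches: #{matched_7_3_3.inspect}"
```

With the first shape, only the user who asked to renew matches. With the second, every user without a lock or with an old lock matches as well, whatever their ID.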
This ultimately means we were charging random customers for new subscriptions instead of the actual customer who wanted to renew early.
Which is obviously awful.
We are deeply sorry if you were impacted by this issue. This is not acceptable.
Even though Mongoid shouldn't have changed the behavior of existing methods between minor versions, my implementation was a true code smell, as it was unclear what it did. This code should never have been deployed to production. But also, and more importantly, the feature itself was a bad idea. Using our API to scrape search engine results shouldn't trigger a renewal of a credit card subscription in the first place, and we'll be removing that feature as soon as possible.
Sorry again to all of our users.