-
-
Notifications
You must be signed in to change notification settings - Fork 129
custom domains - middleware and verification #2145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- ACM support - custom domains crud, resolvers, fragments - custom domains form, guidelines - custom domains context - domain verification every 5 minutes via pgboss - domain validation schema - basic custom domains middleware, to be completed - TODOs tracings
- CustomDomain -> Domain - DomainVerification table - CNAME, TXT, SSL verification types - WIP DomainVerification upsert
cd46aeb
to
5e80c3f
Compare
…ange status of a Record from its Attempt, multi-purpose dns verification
…tch as they can fail
- use DomainVerificationStatus enum for domains and records - adapt Territory Form UI to new schema - return 'records' as an object with its types - wip: prepare for attempts and certificate usage for prisma
fix: - fix setDomain mutation transaction - fix schema typedefs enhance: - DNS records guidelines with flex-wrap for longer records cleanup: - add comments to worker - remove console.log on validation values
775a966
to
82a71f5
Compare
So to QA this I need to use real DNS? I suspect we'll want to mock DNS somehow, like we mock ACM etc, going forward to make this trivial to iterate on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a very casual high level first pass to get acquainted. It looks very good.
Thank you for separating it into its own PR. Also thanks to @ekzyis for his first review which I'm sure led to lots of improvements.
I left some questions and nits. I'll make another high level pass tomorrow. Then maybe again depending on when we can test this in a self-contained way.
-- !QUIRK: If someone else takes over the sub, the new guy needs to delete the domain | ||
CREATE OR REPLACE FUNCTION hold_domain_and_delete_certificate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is weird. I guess it's not clear to us whether the transfer is from an alt to another alt or to another person entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually just remove the custom domains settings and certificate if the owner is different on re-activation.
And, instead, on transfer, we retain settings
worker/territory.js
Outdated
// make sure to delete the certificate from ACM if the sub is stopped, if we have it. | ||
if (nextStatus === 'STOPPED' && sub.domain?.certificate?.certificateArn) { | ||
await deleteDomainCertificate(sub.domain.certificate.certificateArn) | ||
} | ||
|
||
await models.sub.update({ | ||
include: { user: true }, | ||
where: { | ||
name: subName | ||
}, | ||
data: { | ||
status: nextBillingWithGrace(sub) >= new Date() ? 'GRACE' : 'STOPPED', | ||
status: nextStatus, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm hard to tell if this is the right behavior unless we automatically re-add the cert when it's paid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, a new certificate from ACM is fast and free, what's not really good (for the territory owner) is that they will be subjected again to ACM DNS validation.
Maybe we don't need to remove the certificate at all as long as the domain remains the same but this is also somewhat connected to what you said on the previous comment:
I guess it's not clear to us whether the transfer is from an alt to another alt or to another person entirely.
… HOLD handle territory changes via triggers - on territory stop, HOLD the domain - on territory takeover from another user, delete the domain and its associated records handle ACM certificates via trigger - on domain/domainCertificate deletion, ask ACM to delete the certificate via a pgboss job; removes the need to ask ACM in multiple places clear domains that have been on HOLD for more than 30 days, check every midnight via pgboss schedule use 'domains' profile for worker jobs
-- everytime a domain (relation) or a domainCertificate is deleted, also ask ACM to delete the certificate | ||
CREATE TRIGGER trigger_ask_acm_to_delete_certificate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of jumping around the code to ask ACM to delete a certificate (mostly in shady places), a trigger on domain/domainCertificate deletion offloads this job to our worker via pgboss.
-- SCENARIO: Territory got stopped after grace period | ||
-- HOLD the domain when the sub is stopped | ||
-- this is to prevent the domain from being used by another sub; | ||
-- won't delete anything but it will require a new verification attempt if the sub is resumed | ||
CREATE OR REPLACE FUNCTION hold_domain_on_sub_stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of deleting a domain from a territory, we put it on HOLD so that the original owner can come back without losing any customization or their ACM certificate.
The user will inevitably need to re-verify the domain to make it active again.
edit: well it can be avoided by triggering ACTIVE when a territory goes up again but imo it's not ideal
-- SCENARIO: Territory got taken over by a different user | ||
-- clear the domain when the sub is taken over by a different user; | ||
-- this will delete the domain, its certificates, verification attempts, DNS records and brandings. | ||
-- will also trigger a request to ACM to delete the certificate because the domain is being deleted | ||
CREATE OR REPLACE FUNCTION clear_domain_on_sub_takeover() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we only HOLD the domain on territory expiration, I think it's fair to delete any domain information, with its certificate (thanks to trigger_ask_acm_to_delete_certificate
), if the territory gets resumed by another user instead of the original owner.
-- Clear domains that have been on HOLD for 30 days or more every midnight. | ||
INSERT INTO pgboss.schedule (name, cron, timezone) VALUES ('clearLongHeldDomains', '0 0 * * *', 'America/Chicago') ON CONFLICT DO NOTHING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been wanting to add this for a long time but didn't know if we actually need it, regardless, now we check every midnight if a domain has been on HOLD for 30 days or more.
There might be a better way to do this so this might get iterated?
The primary reason for this is to delete unused certificates. In the future, it would ideally be coupled with routine checks on domain ownership just to be safer.
edit: still the same but encapsulated in a function for plpgsql
…not valid or we're in production
Description
Part of #1942
Focuses on custom domain insertion, providing a working form and verification process via pgboss.
Uses ACM with LocalStack to provide certificates CRUD simulation.
Screenshots
TBD
Additional Context
#1958 is being split to facilitate review and build greater stuff on top of it.
Doesn't contain Auth Sync or custom branding.
This PR provides:
custom domains
-- context
-- crud, resolvers, fragments
-- form with basic UI/UX
-- ACM certificates support
-- local DNS server via dnsmasq support
middleware
-- custom domain detection via endpoint
-- doesn't apply referrals yet (no rewrites here)
domain verification
-- job every 5 minutes
-- AWS error handling
-- log every verification attempt and its steps
-- normalized schema for Domain, Record, Attempt, Certificate
triggers
-- on territory stop, HOLD the domain
-- on territory takeover, delete the domain
-- on domain/domainCertificate deletion, delete from ACM via pgboss job
workers
-- full domain verification
-- clear domains that have been on HOLD for more than 30 days
-- delete certificate from ACM (job)
Checklist
Are your changes backwards compatible? Please answer below:
Yes, everything is an addition. An exception is the way we handle S3 via localstack, now the container is called AWS and also allows ACM simulations other than S3.
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, QA OK, domain verification handles errors correctly and follows its step, middleware correctly detects a custom domain and obtains its connected sub.
For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, it uses standard SN colors and styles that fits with theme changes.
Did you introduce any new environment variables? If so, call them out explicitly here:
An endpoint for localstack that can be contacted by our worker container
LOCALSTACK_ENDPOINT=http://aws:4566
local DNS server via dnsmasq, with fallback to 1.1.1.1 in both dnsmasq and code.
DOMAINS_DNS_SERVER=172.20.0.2:53