When Things Go Wrong
Email Outage
- External facing Linux server running BIND.
- Zone was updated for the first time in a long while, but Linux file permissions left the zone file locked and the zone non-functional.
- Zone included MX records that were then unavailable.
- Caused Email outage.
- Linux admin team decided they did not want DNS service maintenance as part of their role, so it was moved to Infoblox.
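A rarely edited zone file is easy to leave with the wrong owner or mode. The sketch below is one possible pre-reload check, assuming the problem was that the BIND service account could not read the file (the notes above do not say exactly how the file was locked). The function name and its arguments are hypothetical; it only looks at the classic owner/group/other mode bits and ignores ACLs, SELinux and directory permissions.

```python
import os
import stat


def can_read(path, uid, gids):
    """Return True if the mode bits on `path` let the given uid/gids read it.

    Assumption: only the classic owner/group/other bits matter here.
    ACLs, SELinux and parent-directory permissions are not checked.
    """
    if uid == 0:
        # root bypasses the read permission bits
        return True
    st = os.stat(path)
    if uid == st.st_uid:
        # the owner is judged by the owner bit only
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        # a member of the file's group is judged by the group bit only
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)
```

A check like this would be run with the uid and group ids of the account that runs named, after every edit and before the service is reloaded. Looking up the MX records afterwards confirms that the zone is actually being served.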
Web Outage
- NIOS Hidden Primary.
- Third-party DNS provider pulled the zones by zone transfer and hosted them publicly.
- Third-party provider acquired by another company.
- Other company migrates systems.
- Everything works.
- Other company deletes migration systems.
- Turns out, a few customers had their TSIG keys tied up in the migration system.
- TSIG keys no longer existed.
- Zone transfers failed.
- Alerts for zone transfer failure… failed.
- After a week, the secondary servers dropped the zones that they could no longer update.
- Result: outage of a critical website until the customer updated public DNS to “unhide” their NIOS boxes.
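A secondary stops serving a zone once the SOA expire interval has passed without a successful refresh, which matches the one-week delay above. Because the transfer-failure alerts themselves failed, a second, independent signal would have helped. The sketch below is a minimal illustration, assuming an SOA expire value of 604800 seconds (one week; the real value is not given in these notes) and assuming the time of the last successful transfer can be obtained from the secondary's logs or monitoring. The function name and the warn_fraction parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiry_status(last_refresh, soa_expire_seconds, now, warn_fraction=0.5):
    """Classify how close a secondary is to expiring a zone.

    last_refresh: time of the last successful zone transfer or refresh
    soa_expire_seconds: the expire field of the zone's SOA record
    warn_fraction: hypothetical threshold, the share of the expire
                   interval that may elapse before a warning is raised
    Returns (state, deadline), where state is "OK", "WARN" or "EXPIRED".
    """
    deadline = last_refresh + timedelta(seconds=soa_expire_seconds)
    elapsed = (now - last_refresh).total_seconds()
    if elapsed >= soa_expire_seconds:
        state = "EXPIRED"
    elif elapsed >= soa_expire_seconds * warn_fraction:
        state = "WARN"
    else:
        state = "OK"
    return state, deadline


# Example values only: the last good transfer was four days ago.
last_ok = datetime(2024, 1, 1, tzinfo=timezone.utc)
state, deadline = expiry_status(last_ok, 604800, last_ok + timedelta(days=4))
print(state, deadline.isoformat())
```

Running this check from a system other than the one that sends the transfer-failure alerts means a single failure cannot silence both.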
infoblox/when_things_go_wrong.txt · Last modified: by bstafford
