When Things Go Wrong

Email Outage

  • External-facing Linux server running BIND.
  • Zone was updated for the first time in a long while, but Linux file permissions left the zone file locked and non-functional.
  • Zone included MX records, which were then unavailable.
  • This caused an email outage.
  • Linux admin team decided they did not want DNS service maintenance under their role, so DNS was moved to Infoblox.

Web Outage

  • NIOS hidden primary.
  • Third-party DNS provider received the zones by zone transfer and hosted them publicly.
  • Third Party acquired by another company.
  • Other company migrates systems.
  • Everything works.
  • Other company deletes migration systems.
  • Turns out, a few customers had their TSIG keys tied up in the migration system.
  • TSIG keys no longer existed.
  • Zone transfers failed.
  • Alerts for zone transfer failure… failed.
  • After a week, the secondary servers dropped the zones that they could no longer update.
  • Result: outage of a critical website until the customer updated public DNS to “unhide” their NIOS boxes.
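
The one-week delay in the story above is consistent with the SOA EXPIRE field: a secondary that cannot refresh a zone stops serving it once EXPIRE seconds have passed since the last successful refresh, and 604800 seconds (seven days) is a common setting. The following is a minimal sketch of a watchdog calculation that does not depend on transfer-failure alerts. The function names, the warn_fraction threshold, and the assumption that the caller already knows the time of the last successful refresh (for example from secondary logs or SOA serial comparisons) are all hypothetical and are not taken from NIOS or from the provider involved.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_expiry(last_refresh: datetime, expire_seconds: int, now: datetime) -> float:
    """Return the seconds left before a secondary would stop serving the zone.

    last_refresh is the time of the last successful refresh (assumed to be
    supplied by the caller); expire_seconds is the zone's SOA EXPIRE value.
    """
    deadline = last_refresh + timedelta(seconds=expire_seconds)
    return (deadline - now).total_seconds()


def transfer_status(last_refresh: datetime, expire_seconds: int, now: datetime,
                    warn_fraction: float = 0.25) -> str:
    """Classify a zone as OK, WARN, or EXPIRED.

    warn_fraction is a hypothetical threshold: warn once this fraction of
    the expiry window has elapsed without a successful refresh.
    """
    remaining = seconds_until_expiry(last_refresh, expire_seconds, now)
    if remaining <= 0:
        return "EXPIRED"
    elapsed = expire_seconds - remaining
    if elapsed >= warn_fraction * expire_seconds:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Illustrative values only: last good refresh on 1 Jan, checked on 3 Jan.
    last_good = datetime(2024, 1, 1, tzinfo=timezone.utc)
    check_time = datetime(2024, 1, 3, tzinfo=timezone.utc)
    print(transfer_status(last_good, 604800, check_time))
```

Run on a schedule from a system that is independent of both the hidden primary and the provider, a check like this would flag a stalled transfer days before the secondaries drop the zone, even if the transfer-failure alerts themselves are broken.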
infoblox/when_things_go_wrong.txt · Last modified: by bstafford