Differences

This shows you the differences between two versions of the page.

--- infoblox_nios:dns [2025/04/29 22:17] – bstafford
+++ infoblox_nios:dns [2025/12/18 11:00] (current) – [External DNS] bstafford
@@ Line 24: / Line 24: @@
 ===== Concurrent Queries =====
 "Limit number of recursive clients to". The maximum setting is 40,000 regardless of model type. This is very high. If your logs say the limit is being exceeded and your limit is over 15,000, something else is likely wrong unless you are an ISP.
+Docs [[https://docs.infoblox.com/space/nios90/280665882/Enabling+Recursive+Queries#Restricting-Recursive-Client-Queries|here]].
 ===== PTR Reverse Zones =====
@@ Line 73: / Line 75: @@
 ===== General Design =====
-If CHR is 40%+ look for top offenders (e.g. SIEM or Mailserver) and maybe give them their own DNS servers or turn off feature on the offending server (e.g. stop PTR lookups)
+If CHR is 40% or less look for top offenders (e.g. SIEM or Mailserver) and maybe give them their own DNS servers or turn off feature on the offending server (e.g. stop PTR lookups)
+BIND: the default max recursive clients is 1,000. Increase to 5,000. If you have to increase this, never increase it beyond the max QPS. If you have to increase beyond 5,000 (max is 40,000), either the QPS has massively increased or the CHR has fallen. Find the top offenders using the reporting server. The reporting server doesn't need query logging to give top query clients and top query domains.
+If BIND isn't running, NTP isn't available on anycast. NTP just runs on all operational interfaces and route withdrawal is tied to the BIND service. Thus, no BIND, no NTP on the anycast address.
+Global Forwarders: don't use more than 4. Possibly 6 at a push.
+Don't assume that an OS will always use the first DNS server and only try the second or third if the first one fails.
+Three DNS servers can be configured on Linux. Three DNS servers can be configured on Windows using DHCP.
-Bind  - max recursive clients is 1000. If you have to increase this, never increase more than the max QPS.
 ===== Disable Cache=====
 You can't disable cache on NIOS but you can set the TTL to 0 under DNS Grid Properties > General > Advanced > Max Cache TTL (set to 0).
@@ Line 81: / Line 93: @@
 When showing "capacity" of a member, you may see entry "bind_tombstone".
 The zone-maintenance phase of NIOS's OneDB's AZD (augmented zone data) handling creates a timestamped tombstone record when a DNS record in a multi-master zone is locally deleted or is deleted by DB replication on the Grid Manager.
+===== MS DNS =====
+When migrating from Microsoft DNS to NIOS, ensure you migrate the ForestDnsZones and DomainDnsZones zones.
 ===== Anycast =====
 Remember, for Anycast, you need to setup the Anycast IP on the member, then edit the member's DNS properties and configure it to accept queries on the Anycast IP (General > Basic tab). It may then take a minute to apply.
+Common vendor default BGP timers are Keepalive=60 seconds and Hold Timer=180 seconds. RFC 4271 itself suggests a Hold Time of 90 seconds with the Keepalive at one third of the Hold Time, which is where the 3x guideline comes from.
+Consider using 4 seconds for Keepalive and 16 seconds for Hold Timer. For faster convergence (e.g. in data centres and high-performance networks), the industry recommendation is a Keepalive of 3-10 seconds and a Hold Timer of 9-30 seconds (3x the Keepalive).
+Higher values are usually reserved for stable, low-bandwidth or high-latency networks (usually long-distance peerings).
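The timer guidance above can be turned into a quick sanity check. This is a minimal sketch, not a NIOS tool: the function name `check_bgp_timers` is hypothetical, and the ranges are simply the 3-10 second Keepalive, 9-30 second Hold Timer, and 3x rule quoted in these notes.

```python
def check_bgp_timers(keepalive_s: int, hold_s: int) -> list:
    """Return warnings for timers outside the fast-convergence guidance above."""
    warnings = []
    # Hold timer should be at least 3x the keepalive.
    if hold_s < 3 * keepalive_s:
        warnings.append("hold timer is less than 3x the keepalive")
    # Ranges quoted in the notes for fast convergence.
    if not 3 <= keepalive_s <= 10:
        warnings.append("keepalive is outside the 3-10 second range")
    if not 9 <= hold_s <= 30:
        warnings.append("hold timer is outside the 9-30 second range")
    return warnings


# The suggested 4/16 pair produces no warnings; 60/180 is flagged as
# outside the fast-convergence ranges (it is still valid for stable links).
print(check_bgp_timers(4, 16))
print(check_bgp_timers(60, 180))
```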
@@ Line 91: / Line 115: @@
   * The NIOS application does not support a route flap. For example, temporary DNS downtime such as restart, does not stop or re-instate the OSPF advertisement.
   * The OSPF advertisement stops if DNS service is down for more than 40 seconds.
+BGP is preferable to OSPF as it can be more finely manipulated.
+<code>show bgp config</code>
+<code>show ospf config</code>
+<code>show bgp [ route | neighbor | summary | config ]</code>
+<code>show ospf [ interface | neighbor | database | route | config ]</code>
+<code>show bfd</code>
 Query logs will show the Anycast IP the query was aimed at
@@ Line 100: / Line 132: @@
 When using Anycast you should also enable BFD and enable the DNS Health Check Monitor ([[https://docs.infoblox.com/space/nios90/281182312/Enabling+and+Disabling+DNS+Health+Check+Monitor|documentation]] and [[https://docs.infoblox.com/space/nios85/35881713/About+BFD+(Bidirectional+Forwarding+Detection)#Enabling-and-Disabling-DNS-Health-Check-Monitor|documentation]]).
+HA and Anycast are not mutually exclusive. You might want to use HA for DNS Anycast boxes to assist with NIOS upgrades. It also improves geo resiliency. If you have an HA pair in USA, EMEA, and APAC, anycast will keep DNS available, but a failure in one geo means clients in that geo now have higher DNS latency. Of course, if you are doing HA, you could equally deploy two standalone boxes per geo to reduce that risk. However, you should architect so that a single box per geo can cope with the load of that geo (and at least one other).
+If you only have two or three DNS servers, don't bother with anycast (probably). Just specify two or three DNS servers to the clients. Linux and Windows can handle this (use DHCP if required).
+Don't put a second anycast IP on all the same anycast servers. E.g. say you had two DNS servers in AMER, two in EMEA, and two in APAC. You would put one anycast IP on the first server in every geo and a second anycast IP on the second server in every geo. Make sure that the two anycast IP addresses cannot be summarised in the same route internally. Exactly how far apart the two IPs should be depends on how far you summarise routes internally.
+If you have one anycast IP and want to use a non-anycast secondary IP as backup, make sure that the secondary IP is NEVER in the same DC as the endpoints that are given it. For example, with three data centers, if endpoints in DC1 use the DC1 local DNS IP and the anycast IP, then if anything happens to the DNS server in DC1, routing won't update immediately (BFD can help keep route convergence to a few seconds), so both the primary and secondary queries will go to the (faulty) DNS server in DC1, causing an outage for endpoints in DC1. This is why DC1 should have the anycast IP plus a local IP from a different data center.
+Some notes on restart times:
+  * With NIOS 8.6.x, BIND waited 5 seconds for idle tasks to disappear and 20 seconds for active tasks before it forced a restart.
+  * With NIOS 9.0.x, BIND does a graceful restart and doesn’t quit until all references have been released.
+  * This results in a much longer restart of named.
+  * However, starting with NIOS 9.0.3-CHF3, you can change the behavior of BIND to match the NIOS 8.6.x behavior:
+    * SSH to the Grid and log in with your credentials.
+    * This puts you in the NIOS CLI, where you can execute the following command:
+    * ''set named_max_exit_wait 5''
+    * With this change, subsequent named restarts are faster, avoiding the long DNS restart and the long DNS service disruption.
+    * Note: look for the message "all zones loaded" to see when BIND has fully restarted.
 ===== LAN2 =====
 To get a NIOS appliance to receive DNS queries on LAN1 but then send queries (i.e. recursion) on LAN2 (e.g. in a bridged DMZ where LAN1 = internal network and LAN2 = external network), then under the member properties in Grid go to General > Basic and toggle "Send queries from" LAN2 interface. And "Send notify messages and zone transfer requests from" LAN2 interface.
@@ Line 106: / Line 159: @@
 ISPs might implement this to help mitigate (i.e. continue with cache responses in case of massive Authoritative failure) the end user impact of incidents such as the [[https://en.wikipedia.org/wiki/2021_Facebook_outage|Facebook BGP/DNS outage]] in October 2021.
+===== TCP Client Limit =====
+TCP DNS is "more expensive" than UDP DNS, with session stand-up/tear-down, but it is nowhere close to the resources needed for DoT/DoH.
+Max number of TCP DNS clients is 1,000 by default and this is enough for a lot of organizations. 25k is the max you can set it to.
+You may need to change the quota for TCP clients in two parts (assuming NIOS 9):
+  - adjust the named_tcp_clients_limit
+  - ensure that there are enough sockets available. By default (again, NIOS 9), the number of sockets is 21,000 and thus your adjustment will be in range. Unfortunately, the value for sockets is dictated by the recursive client quota: it is the recursive client quota + 1,000, except when the quota is 1,000, where it's + 20,000. If the recursive client quota has been adjusted and there aren't going to be enough sockets, there's also a command to adjust the max sockets that DNS can use: ''set named_tcp named_max_socket N''. Note that you should ONLY adjust the ''named_max_socket'' value if you know the value is too low; check the logs when named starts to see how many sockets are being allocated. If it's less than recursive_client_quota + named_tcp_clients_limit + 1,000, it's too small.
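The socket arithmetic above can be sketched in Python. This is illustrative only: the function names are hypothetical, and the rules encoded are just the ones stated in these notes (default sockets = recursive client quota + 1,000, or + 20,000 when the quota is 1,000; required sockets = recursive client quota + TCP client limit + 1,000). Always confirm against the socket count that named logs at startup.

```python
def default_named_sockets(recursive_client_quota: int) -> int:
    """Sockets allocated by default, per the rule described in these notes (NIOS 9)."""
    if recursive_client_quota == 1000:
        return recursive_client_quota + 20000
    return recursive_client_quota + 1000


def required_named_sockets(recursive_client_quota: int, tcp_clients_limit: int) -> int:
    """Minimum sockets needed: recursive quota + TCP client limit + 1,000."""
    return recursive_client_quota + tcp_clients_limit + 1000


def sockets_sufficient(recursive_client_quota: int, tcp_clients_limit: int) -> bool:
    """True if the default allocation covers the required socket count."""
    return default_named_sockets(recursive_client_quota) >= required_named_sockets(
        recursive_client_quota, tcp_clients_limit
    )


# Default quota with the default TCP limit fits within the default sockets.
print(sockets_sufficient(1000, 1000))
# A raised quota with a large TCP limit does not, so the max sockets would need adjusting.
print(sockets_sufficient(5000, 10000))
```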
 ===== External DNS =====
-To hide private IP of LAN1 interface when NIOS is externally facing,
+To hide private IP of LAN1 interface when NIOS is externally facing (e.g. in Azure),
-Data Management->DNS->Members, edit member, Views.
+Data Management->DNS->Members, edit member, (advanced) > DNS Views > (basic tab).
-Click on "Interface IP Address" for the view, change it to "Other IP Address", then type in the IP you want published for glue in the view for the member.
+In the appropriate View, click on "Interface IP Address" for the view (it doesn't look 'clickable' until you actually click it), change it to "Other IP Address", then type in the IP you want published for glue in the view for the member. In this case it is likely to be the public IP of the DNS server. This will automatically update the SOA record as well as the IP addresses for the associated NS and A records.
 Or you can make the NIOS entries in the Name Server Group to be "Stealth" and then add the external IP addresses as External Secondaries.
+Remember, if you have a third party DNS transferring from your NIOS external DNS servers, if the Grid Primary goes offline, the Grid Secondary will still get updated (via Grid Transfer). Enable Grid secondaries to notify external secondaries.
+Data Management > DNS > Grid DNS Properties > General > Advanced >
+  * [[https://docs.infoblox.com/space/nios90/281182286/Notifying+External+Secondary+Servers|Enable Grid secondaries to notify external secondaries]]: This option is enabled by default.
+  * Notify Delay: Specify the number of seconds that the Grid secondary servers delay sending notification messages to the external secondaries. The default is five seconds.
+"Enable grid secondaries to notify external secondaries: Select this check box to allow secondary name servers in a Grid to send notify messages to secondary name servers outside the Grid. Enabling this option increases the number of notify messages; however, it ensures that an external secondary name server receives notify messages when its master is a secondary name server in a Grid."
 ===== DNS Views =====
 Multiple views on a member, fine. Looping/Forwarding between views is not fine. Possible and, in some cases, necessary, but not fine. It also means that the NIOS member probably cannot use itself as a resolver because it will often match the "wrong" view.
 ===== Alias =====
 Remember, when putting an alias record on an authoritative DNS server (i.e. CNAME for APEX of domain), the DNS server will need to be able to resolve the place it is pointing to. This does not mean you have to enable recursion on the DNS server but the server itself will need to resolve the name. (e.g. management layer)
+===== ACL =====
+Access Control List. If you have response logging enabled and a query comes in that gets rejected because of an ACL, you will get a log of the "REFUSED" response (but no indication it was because of an ACL).
+===== GSS-TSIG =====
+When you have multiple AD domains in a Forest, you need to delegate the underscore zones and then enable GSS-TSIG updates at each delegated zone. This is exactly the same way it works in MSFT DNS. You CAN, but should NOT, enable GSS-TSIG on the "parent" AD zone. MSFT does this by default. It's a best practice to NOT allow that. Instead, you would use ACLs from server networks to allow servers to do updates. All client updates should be done only by the DHCP server or by some form of automation. If clients are doing the update, they can only update their own zone based on the AD domain the system belongs to. It's not possible for one client to update a different AD domain since (1) its credentials won't allow it and (2) it only uses the domain name for which it is a member.
 ===== Forwarders =====
 There is no single answer to the question "How long will NIOS take to fall back to root hints once a global forwarder fails". It depends on how many forwarders are configured. More forwarders means more servers to try before falling back to root. It also depends on the BIND version: more modern BIND uses RTT, which affects the overall time. Finally, there are mechanics at play for EDNS0 back-off, where it will try increasing timeouts (last I read, something like 1.6s, 3.2s, 6.4s, 9s, until it hits the default max, which I recall as 30s total).
@@ Line 155: / Line 231: @@
 If you need to increase, do so 1k at a time.
+[[https://docs.infoblox.com/space/nios90/280665882/Enabling+Recursive+Queries|Documentation page]]
+Recursion client quota as printed in syslog
+<code>
+Recursion client quota: used/max/soft-limit/s-over/hard-limit/h-over/low-pri = 19005/23288/29900/0/30000/0/19005
+/used	/max	/soft-limit	/s-over	/hard-limit	/h-over	/low-pri
+/21415	/24100	/29900		/0	/30000		/0	/21415</code>
 [[https://community.infoblox.com/t5/trending-kb-articles/support-central-kb-118-what-does-quot-no-more-recursive-clients/ba-p/6321|KB Article]] - If you want to increase the number of outstanding recursive queries on the recursive name server, confirm that you have adequate memory available for that number of outstanding recursive queries and for other services that are configured on the same server. Every recursive query can take about 20 kilobytes.
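As a rough worked example of the memory point in the KB article, the sketch below multiplies a quota by the quoted figure of about 20 kilobytes per recursive query. It is an approximation only: the function name is hypothetical, 1 KB is assumed to be 1,024 bytes, and it models the worst case where every quota slot is in use.

```python
KB_PER_RECURSIVE_QUERY = 20  # approximate figure quoted in the KB article


def recursive_quota_memory_mib(recursive_client_quota: int) -> float:
    """Approximate worst-case memory (MiB) if every quota slot is in use."""
    return recursive_client_quota * KB_PER_RECURSIVE_QUERY / 1024


for quota in (1000, 5000, 15000, 40000):
    print(f"{quota:>6} recursive clients ~ {recursive_quota_memory_mib(quota):.0f} MiB")
```

At the 40,000 maximum this comes to roughly 780 MiB for recursion alone, which is why the KB article asks you to confirm available memory before raising the limit.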