NIOS DHCP
For DHCP starvation prevention, the most common recommendation is to control/prevent it at the switch fabric layer.
The “Disable for DHCP” option in Networks is useful when you are in the process of setting up the DHCP server. Clear this after you have configured the server and are ready to have it serve DHCP for this network.
Ports
Server listens on UDP-67
T1: (direct renewal)
- UDP 68 > 67 DHCP Client to DHCP Server (throttled by Azure)
- UDP 67 > 68 DHCP Server to DHCP Client (return traffic) (blocked by Azure)
T2 Timer: (and initial
- UDP 67 > 67 DHCP Relay to DHCP Server (client to relay is broadcast)
- UDP 67 > 67 DHCP Server to DHCP Relay (client to relay is broadcast)
Performance
- Datasheet figures for DHCP LPS are based on full DORA with no ping before offer or DDNS.
- Reporting of actual LPS being served is based on ACKs to cover renews.
Moving Leases
If you update a subnet to use a different DHCP server, the existing leases will not move until they renew. If the server that issued the lease is no longer valid (because you forced it over to member2), then it shouldn't respond at the T1 timer, and the client should renew on the new server when it broadcasts a request at the T2 timer.
If the servers are in a failover pair and you are changing the balancing, the change does not take effect until the MCLT expires.
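As a rough sketch of when the T1 and T2 timers mentioned above fire, the following assumes the RFC 2131 default fractions (T1 at 50% of the lease, T2 at 87.5%). The function name is hypothetical, and a server can override both timers with DHCP options, so treat the numbers as defaults only.

```python
def renewal_timers(lease_seconds: int) -> tuple[float, float]:
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = lease_seconds * 0.5    # client unicasts a renewal to the issuing server
    t2 = lease_seconds * 0.875  # client broadcasts, so any server can answer
    return t1, t2


if __name__ == "__main__":
    t1, t2 = renewal_timers(86400)  # a one-day lease
    print(f"T1 after {t1 / 3600:.1f} h, T2 after {t2 / 3600:.1f} h")
```

With a one-day lease, a client whose original server no longer answers would not reach the new server until roughly 21 hours into the lease.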
DHCP Hub-Spoke
A DHCP server can only be primary for a single DHCP Failover association, so each failover association needs a separate primary. In a hub-spoke design this means every spoke is the primary and the hub is the secondary for each association.
Load Balancing Data
NIOS GUI has a feature where you can set load balancing data. Never touch this. Ever. It is a feature that is in ISC DHCP and is something you should never edit from equal split (50/50).
Realistically, the only other options that make any sense are 100/0 or 0/100. However, if you do this, all you are changing is the algorithm used to decide which member responds to which MAC address. The DHCP range is still shared between the two members 50/50 and that means any re-balancing will move from the 100 member to the 0 member (on a 100/0 or 0/100 configuration). This will cause a major outage.
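To illustrate why the split only changes which member answers a given MAC address, here is a minimal sketch. It assumes the split works like the ISC DHCP `split` value (a bucket from 0 to 255, with buckets below the split going to the primary). The hash here is a stand-in built on SHA-256, not the RFC 3074 hash that ISC DHCP actually uses, and the function names are hypothetical.

```python
import hashlib


def lb_bucket(mac: str) -> int:
    """Map a MAC address to a bucket 0-255.

    Stand-in hash for illustration only; ISC DHCP uses the RFC 3074 hash.
    """
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    return hashlib.sha256(raw).digest()[0]


def responder(mac: str, split: int = 128) -> str:
    """Buckets below `split` are answered by the primary, the rest by the secondary."""
    return "primary" if lb_bucket(mac) < split else "secondary"


if __name__ == "__main__":
    for mac in ("f1:04:0a:7d:be:96", "00:11:22:33:44:55"):
        print(mac, responder(mac))
```

Nothing in this decision touches the address pool itself, which is why a 100/0 split still leaves half of the free addresses owned by the member that never answers.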
DNS Config
To try and spread DNS load more evenly between two members, update the two DHCP members in the FO at a member level. On the first member, set the DNS assignment to “DNS Server 1 & DNS Server 2”. On the second member, set the DNS assignment to “DNS Server 2 & DNS Server 1”. So long as you don't override this at a network or range level, this means that clients using DHCP server 1 will get DNS servers in one order and clients using the DHCP server 2 will get DNS servers in the opposite order.
Failover Associations
Don't try to change the members assigned to a DHCP failover association. Messing with the FO will very likely end up in a recover/recover state, which means no leases are issued.
Never rename an active failover association. It will trigger a recover/recover. Move all the ranges to a single member, then delete and recreate the FO and move all ranges to the new FO.
High Availability
HA should be used on at least one of the two members of a FO. Ideally, use HA on both, because if a member goes down (HA makes this unlikely) then the other can only issue leases for the MCLT.
Active-Active
A DHCP fixed address in an active/active range will only apply on the box that has that part of the active/active range.
So with active/active DHCP, you should define the DHCP fixed address twice, once for each DHCP server's part of the range.
VLAN Sub-Interfaces and DHCP
DHCP only works on the primary interface in NIOS. It won't run on tagged sub-interfaces (those are for DNS/NTP/Network Discovery only).
Option 43
When configuring Option 43 (Vendor encapsulated options - string) for DHCP, you must enter the value with colons between the data bytes, e.g. f1:04:0a:7d:be:96 instead of f1040a7dbe96.
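If the vendor hands you the Option 43 value as one unbroken hex string, a small helper can produce the colon-separated form (f1:04:0a:7d:be:96 rather than f1040a7dbe96). This is a minimal sketch; the function name is hypothetical and it only reformats the string, it does not validate the vendor sub-option structure.

```python
def colonize(hex_string: str) -> str:
    """Insert a colon between each byte (two hex digits) of a hex string."""
    cleaned = hex_string.replace(":", "").replace(" ", "").lower()
    if len(cleaned) % 2 != 0:
        raise ValueError("hex string must contain an even number of digits")
    bytes.fromhex(cleaned)  # raises ValueError if any character is not hex
    return ":".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


if __name__ == "__main__":
    print(colonize("f1040a7dbe96"))
```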
Binding States
- Free: The lease is available for clients to use.
- Active: The lease is currently in use by a DHCP client.
- Static: The lease is a fixed address lease. (enable under Grid DHCP Properties, General, Advanced)
- Expired: The lease was in use, but the DHCP client never renewed it, so it is no longer valid.
- Released: The DHCP client returned the lease to the appliance.
- Abandoned: The appliance cannot lease this IP address because the appliance received a response when pinging the address.
- Backup: Lease belongs to the secondary peer in a DHCP failover relationship.
Fixed Address Lease
The Fixed Address Lease option is under Grid DHCP Properties, General, Advanced. Without this feature enabled, leases assigned to fixed addresses are not logged and cannot be tracked.
Primary VS Secondary
If you have a DHCP Fail-over Association (FOA), one member will be “primary” and the other “secondary”. For users, it makes no difference which is which assuming that the admin is following best-practice and using balanced distribution for the FOA. MAC address hash is used to determine which server responds.
The only difference is which member handles pool rebalancing. Also, when the pair gets into RECOVER states, the Primary initiates the binding updates that create a single authoritative lease table for them to share, so they can move back to NORMAL. So it is important to the state engine but not really to the user/admin.
Partner Down
From here.
The scenario here is where member A is currently in the PARTNER-DOWN state, and member B, which is a replacement (and therefore has no historical knowledge of communications with A) is newly starting up. In that situation, member A should remain in PARTNER-DOWN while member B goes through RECOVER, RECOVER-WAIT, and RECOVER-DONE.
To lay it out in greater detail, after going through STARTUP and seeing that member A is in PARTNER-DOWN state, member B will go into RECOVER state, during which it will not serve clients, and will request information from member A. Once member B gets the final update message from member A (UPDDONE), it will transition to RECOVER-WAIT. Member B will remain in RECOVER-WAIT for the MCLT duration, after which it will transition to RECOVER-DONE. When member B reaches RECOVER-DONE and sees that member A is in PARTNER-DOWN, member B will then transition to the NORMAL state, and member A, seeing its peer has transitioned to NORMAL state will also transition to NORMAL.
So, as an administrator, what is your course of action? The answer is DON’T DO ANYTHING (not even a service restart), let the system sort itself out, you can just monitor the progress. As long as member A remains in PARTNER-DOWN and member B goes through the RECOVERY process (including RECOVER-WAIT and RECOVER-DONE), it is working just as designed. In almost every instance I can recall when things went wrong leading to a total DHCP outage, it was because of an action taken by an administrator. As administrators, the natural response to the system continuing to show a red indication of status is that some action must be taken in order to get back to green, and that is where people get into trouble. The waiting is the hardest part, but it is absolutely the best course of action in this scenario. If for some strange reason things don’t get back to Normal on their own after the MCLT, don’t do anything until you contact Infoblox support, and then follow their guidance exactly.
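The sequence member B walks through can be sketched as a small transition table. This is only an illustration of the states named above, not the ISC DHCP implementation; the event names (`peer_partner_down`, `upddone_received`, `mclt_elapsed`) are hypothetical labels chosen for the sketch.

```python
# (current state, event) -> next state for the replacement member B
TRANSITIONS = {
    ("STARTUP", "peer_partner_down"): "RECOVER",
    ("RECOVER", "upddone_received"): "RECOVER-WAIT",
    ("RECOVER-WAIT", "mclt_elapsed"): "RECOVER-DONE",
    ("RECOVER-DONE", "peer_partner_down"): "NORMAL",
}


def next_state(state: str, event: str) -> str:
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def walk(events: list[str], start: str = "STARTUP") -> list[str]:
    """Return every state visited while applying the events in order."""
    states = [start]
    for event in events:
        states.append(next_state(states[-1], event))
    return states


if __name__ == "__main__":
    path = walk(["peer_partner_down", "upddone_received",
                 "mclt_elapsed", "peer_partner_down"])
    print(" -> ".join(path))
```

Note that the only way out of RECOVER-WAIT in this table is the MCLT elapsing, which matches the advice to wait rather than restart services.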
For context, the question that prompted the answer above was:
Let's say that I had a DHCP failover association between member A and member B, then member B went down for some reason and I put member A in a partner down state.
Next, I manage to get an RMA appliance to replace the failed member B, and the new member B comes online and discovers that its peer “member A” is in partner down state.
My question is: What happens next?
Does the new member B go into a recover state while member A stays in partner down state and continues to grant leases?
Or
Do both go into a recover state, which would constitute a service failure? If this is the case, how do we avoid this?
Known Clients
ISC DHCP has a setting for “Known/Unknown” clients.
Edit Range > IPv4 DHCP Options > Advanced > Allow/Deny Clients.
The problem happens when you tick BOTH boxes. Some clients get allowed.
When importing from Microsoft, it is possible that both are enabled.
DHCP Lease History
NOTE: The existence of DHCP lease history in NIOS (Data Management > DHCP > Leases > Lease History) is legacy. It pre-dates the reporting server. DHCP lease history required all DHCP members to forward lease data to a designated member. Back in the day, the recommendation was to dedicate a member to receive the logs. However, these days it is better to just use the reporting server as there is no good reason to keep doing it the old way.
The only thing the legacy system gives you that reporting won't is that it shows the data in the “Lease History” tab. However, you can get all of the same data from the reporting dashboard for DHCP Lease History, and it is far more useful. Also, the “lease history” tab is limited by count (100k) whereas the reporting server is not limited by a specific count, it depends on the indexing capacity and storage you give it, so it can be much longer.
The DHCP lease history log holds a maximum of 100,000 entries. After that maximum is reached, the appliance begins deleting entries, starting with the oldest. To archive DHCP lease history logs, you can export them and save them as CSV (comma-separated values) files. You do not need to export the entire log. You can selectively export a section of the log, such as the lease events for a single day.
As a conservative approach to archiving DHCP lease data, Infoblox recommends exporting the log on a daily basis, perhaps through API (application programming interface) scripting. By exporting the daily log entries every day over a certain period of time and then opening the exported files with a spreadsheet program, you can see the number of entries for each day. You can then estimate how often you need to export the log to ensure that you save all of the entries before the log fills up (at 100,000 entries). As a result, you might discover that you need to export the log more or less frequently than once a day to archive all the records.
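The estimate described above is simple arithmetic: divide the 100,000-entry cap by the busiest day you have observed. A minimal sketch, with a hypothetical function name and made-up daily counts purely for illustration:

```python
LOG_CAPACITY = 100_000  # maximum entries in the lease history log


def max_days_between_exports(daily_counts: list[int],
                             capacity: int = LOG_CAPACITY) -> float:
    """Days the log can absorb at the busiest observed daily rate."""
    peak = max(daily_counts)
    if peak <= 0:
        raise ValueError("need at least one day with lease events")
    return capacity / peak


if __name__ == "__main__":
    observed = [20_000, 25_000, 40_000]  # illustrative counts, not real data
    print(max_days_between_exports(observed))
```

Using the peak rather than the average keeps the estimate conservative, since an export that runs late on a busy day loses entries.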
DNS Suffix
For RFC3397 (DHCP option 119) that allows for DNS Suffix list, you can specify up to 1023 characters in NIOS.
NOTE: Remember, DNS suffix lists are not great. Ideally, avoid them where possible and use no more than 5 suffixes. They increase the load on the DNS recursive resolver and also slow down endpoint machines, which must try hostname queries repeatedly before concluding that a name cannot be resolved.
e.g.
"example.com", "domain.com", "hello.com"
"example.com","domain.com","hello.com"
In the lab I can get up to 442 characters before the Windows VM stops displaying (and stopped using) the suffix list provided by the NIOS appliance.
DHCP Scavenging
If DHCP scavenging is not enabled, it should be, and it definitely can reduce the lease count. If scavenging is not enabled, leases remain in the database even after they have expired.
In this community article we hear that enabling DHCP scavenging saw CPU load increase from 10% to about 30% for about 3 minutes.
Enabling “Scavenge free and backup leases” only includes Free and Backup. It does not include Expired, Abandoned or Released entries.
While you can clean up Expired/Abandoned/Released with a script, there is a valid reason for those leases to be in that state, and running the script does not fix the root cause. If the root cause has been addressed, then the script can clean up a lot of objects that would otherwise persist forever.
Since NIOS 8.4 there is a hidden CLI command, delete leases. Don't run it without talking to support first. This command should be used carefully because it can generate a LOT of writes to the database. It has a “dryrun” option to show the impacted leases before running it, and DHCP should be shut down while it is being run to ensure all abandoned leases are actually deleted.
DHCP Abandoned
When the DHCP server pings an IP before handing it out and the IP responds, the server marks the lease as abandoned. More precisely, the Abandoned state arises in either of two cases:
- the server pings the address before assignment and gets a response
- the IP doesn't respond to ping, so the server gives the client an IP that is actually in use, at which point virtually every DHCP client will ARP for the address. If the client sees the IP is already in use, it sends a DECLINE to the DHCP server and the DHCP server puts that address into the ABANDONED state.
Leases in state Abandoned do get tried again, but only after the server has run through the addresses a few times looking for Free. The search has an order, where it looks for
- State Free with your MAC (i.e. you had it a long time ago)
- State Free with no MAC (i.e. no one ever had it according to the records)
- State Free with some other MAC (i.e. someone had it a long time ago but it is free now)
Only after that would it go to Abandoned, and it would still ping again before giving the address out, so if the address responds to ping it would just be marked abandoned again.
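The search order above can be sketched as a ranking function. This only illustrates the order described in these notes, not ISC DHCP's actual allocation code; the lease dictionaries and function names are hypothetical, and the ping check that happens before the address is handed out is left out.

```python
from typing import Optional


def preference(lease: dict, client_mac: str) -> Optional[int]:
    """Rank a lease for this client; lower is better, None means unusable."""
    state, mac = lease["state"], lease.get("mac")
    if state == "free":
        if mac == client_mac:
            return 0  # free, and this client held it before
        if mac is None:
            return 1  # free, never held by anyone on record
        return 2      # free, previously held by some other client
    if state == "abandoned":
        return 3      # last resort
    return None       # active, static, etc. are not candidates


def pick_lease(leases: list[dict], client_mac: str) -> Optional[dict]:
    """Return the best candidate lease, or None if nothing is usable."""
    ranked = [(preference(lease, client_mac), lease) for lease in leases]
    candidates = [(rank, lease) for rank, lease in ranked if rank is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    pool = [
        {"ip": "10.0.0.10", "state": "abandoned", "mac": None},
        {"ip": "10.0.0.11", "state": "free", "mac": "aa:aa:aa:aa:aa:aa"},
        {"ip": "10.0.0.12", "state": "free", "mac": None},
    ]
    print(pick_lease(pool, "bb:bb:bb:bb:bb:bb")["ip"])
```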
DHCP lives inside BOOTP and, as far as I know, Infoblox should not have BOOTP giving out abandoned leases. BOOTP doesn't know the concept of abandoned as far as I recall. Things like gateways, and even VRRP/HSRP addresses, should absolutely have an IP reservation. Sadly the old school MS recommendations were to put everything in the range and then exclude.
DHCP Authoritative
Marking the DHCP server as authoritative is an optional configuration, and training says to always tick it. We have the ability to unselect the option because ISC DHCP says it is a configurable option but, for all practical purposes, we should always enable it when NIOS is the main DHCP server. (See note.)
HOWEVER, you might find yourself in a migration event where you want to untick 'is authoritative'.
DNS Update on DHCP Renewal
In NIOS DHCP there is an option to “Update DNS on DHCP renewal”.
In general, leave this disabled because it generates a lot of extra DDNS updates on the DNS Primary and ZRQ transactions (data replication between members) on the GM. DDNS updates are high-priority messages that can disrupt recursion on the DNS Primary (which is why hidden primaries are important), and the ZRQ transactions add load on the GM.
The feature does exist for a few use cases:
- (The main use case) When Infoblox is the DHCP server and Microsoft is the authoritative DNS (because MS uses time of last dynamic update as its sole scavenging parameter).
- During transitions from non-Infoblox DNS systems to Infoblox DNS, to get the devices to register in Infoblox. When needed for this use case, enable it only for a short while during migration (1/2 lease time).
- (Not a good use case) Multi-interface clients that need accurate DDNS registration. If you have clients that have both wired and wireless interfaces, then because the MAC address differs you need to use check-only for your TXT record handling and have update on renew enabled, but ONLY on the subnets where it is needed. Don't turn it on at a Grid level in this scenario, and be sure it is actually necessary. There are cases where the registration of the client name matters, but it is usually more in the heads of the administrators than it is an actual technical requirement. This isn't a fix; it hides the issue by causing another. Quite a debatable point.
Problems with this feature:
- When enabled, some customers can start to see issues with the Grid replication queue (ZRQ transactions). For example, a single device (e.g. a rogue VoIP phone) on the network spewing out DHCP renewals may severely impact the Grid replication queue in some cases.
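To get a feel for the extra load, a back-of-the-envelope estimate helps. This sketch assumes well-behaved clients that renew at the default T1 (half the lease time), renewals spread evenly over time, and one DDNS update per renewal; real clients may trigger several record updates per renewal, so treat the result as a lower bound. The function name is hypothetical.

```python
def renewal_updates_per_second(clients: int, lease_seconds: int) -> float:
    """Steady-state DDNS updates per second caused by renewals alone."""
    renew_interval = lease_seconds * 0.5  # default T1 is half the lease time
    return clients / renew_interval


if __name__ == "__main__":
    # 50,000 clients on 8-hour leases
    print(f"{renewal_updates_per_second(50_000, 8 * 3600):.2f} updates/s")
```

Short lease times multiply the effect, and a misbehaving client that renews continuously sits entirely outside this estimate.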
DHCP Release
When a Windows box is shut down, it sends a DHCP Release message to the DHCP server. This can cause issues if the Windows box is a server and you are rebooting it, because the release may cause the DHCP server to delete the DDNS record for the server.
There is a registry value, ReleaseOnShutdown, to disable this action.
Better guide here
PS c:\> Get-WmiObject -Query "Select Description,GUID from Win32_Networkadapter"|Select-Object Description,GUID
Description GUID
----------- ----
Intel(R) 82579LM Gigabit Network Connection {0694A535-03B4-4F63-A9C0-44565C8A881D}
Then add the following registry value, ReleaseOnShutdown = 0, under that interface's GUID:
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters\Interfaces\{0694A535-03B4-4F63-A9C0-44565C8A881D}" /v ReleaseOnShutdown /t REG_DWORD /d 0 /f
Possible other values are:
- 1 - Always release the lease
- 2 - Release or leave the lease depending on option 2 from the server
DHCP Filters
As of NIOS 8.6.2 there is a CLI command, set dhcp_filter_behavior old/new.
The old behaviour is for DHCP filters to use OR logic.
The new behaviour is for DHCP filters to use AND logic.
For DHCP, you can use Option 77 (User Class Identifier). This allows you to define a “string” ID on the client (Windows or Linux), which is passed to the DHCP server during the request. The DHCP server can then use that in a DHCP Option filter to ensure only clients presenting that ID will get issued a lease. The use case would be where you have a subnet and range that needs to issue IPs normally but a specific sub-set should only be issued to a specific group of servers (e.g. Database servers). You would create a range for that use case and apply the option filter. Why would you want to do this? Possibly for creating known firewall rules.
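The difference between the old (OR) and new (AND) behaviour can be illustrated with a toy matcher. This is only a sketch of the boolean logic; exactly how NIOS combines the filters attached to a range is not covered in these notes, and the client dictionary, filter functions and `db-servers` class value are hypothetical.

```python
from typing import Callable

Filter = Callable[[dict], bool]


def client_matches(client: dict, filters: list[Filter],
                   behavior: str = "new") -> bool:
    """Combine filter results with OR ("old") or AND ("new")."""
    results = [check(client) for check in filters]
    if behavior == "old":
        return any(results)
    if behavior == "new":
        return all(results)
    raise ValueError("behavior must be 'old' or 'new'")


def is_db_server(client: dict) -> bool:
    """Option 77 style check on the user class string."""
    return client.get("user_class") == "db-servers"


def is_known_mac(client: dict) -> bool:
    """MAC filter check against a small allow list."""
    return client.get("mac") in {"f1:04:0a:7d:be:96"}


if __name__ == "__main__":
    client = {"mac": "00:11:22:33:44:55", "user_class": "db-servers"}
    for mode in ("old", "new"):
        print(mode, client_matches(client, [is_db_server, is_known_mac], mode))
```

A client that satisfies only one of two filters is accepted under the old behaviour and rejected under the new one, which is worth checking before changing the setting on a production Grid.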
DHCP Version
NIOS uses ISC DHCP. NIOS-X uses Kea.
The following is from NIOS 9.0.7
For NIOS, the version of ISC DHCP is printed to syslog when the DHCP service is restarted
- daemon
- INFO
- validate_dhcpd
- Internet Systems Consortium DHCP Server 4.3.3-P1
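If you collect syslog centrally, the version can be pulled out of that message with a regular expression. A minimal sketch: the sample line below simply joins the fields listed above with spaces, which is an assumption about the layout, and the function name is hypothetical.

```python
import re
from typing import Optional

VERSION_RE = re.compile(r"Internet Systems Consortium DHCP Server (\S+)")


def isc_dhcp_version(log_line: str) -> Optional[str]:
    """Return the ISC DHCP version found in a syslog line, if any."""
    match = VERSION_RE.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("daemon INFO validate_dhcpd "
              "Internet Systems Consortium DHCP Server 4.3.3-P1")
    print(isc_dhcp_version(sample))
```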
Troubleshooting
If you have DHCP members in the US and EMEA in a FO, and the EMEA site can't route to the US but the load-balancing hash means the US member should issue the lease, the client won't get an IP. The only way to fix this is to fix the network connectivity from the EMEA site to the US.
