--- infoblox_nios:upgrade [2025/04/09 14:33] – [Notes] bstafford
+++ infoblox_nios:upgrade [2026/03/19 23:06] (current) – [Upgrades to NIOS 9.1] bstafford
@@ Line 1: / Line 1: @@
 ====== NIOS Upgrade ======
+**First Rule Of Upgrading NIOS** - Read the release notes. Then read them again. Understand what changes happen with the code and figure out if this affects your deployment of NIOS. We cannot stress this single point enough.
+**First Rule Of Upgrading NIOS** - See the first rule of upgrading NIOS.
+Official upgrade documentation [[https://docs.infoblox.com/space/niosupgrade/1323696202|here]].
 ===== Notes =====
@@ Line 6: / Line 13: @@
 NOTE: After you install a hotfix bundle/collective hotfix (e.g. CHF 8.6.3.2), make sure you perform a product restart (or a full reboot) on the systems to ensure the fix is fully implemented. If you forget and then try to upgrade to another version of NIOS (e.g. 9.0.1), you can (albeit very rarely) run into [[https://support.infoblox.com/s/article/000009442|issues]].
+NOTE: From NIOS 9.0.6 onwards, upgrade status logs are captured in the Grid Master log files. You can view these logs using the ''show log debug follow /UPGRADE_STATUS/'' CLI command.
 You may need to increase the session time out limit for your user account if you are having issues uploading code to the GM prior to an upgrade. If the time out limit is too low, the time out can break the upload.
@@ Line 30: / Line 40: @@
   * Check DHCP FO state (if DHCP used)
   * Check CPU and RAM (RAM usage will 'appear' to increase when going from 8.6 to 9.0 because page files are represented as used RAM).
+  * If you use the DNS Forwarding Proxy (DFP) or you have linked the GM/GMC to the Infoblox Portal, make sure that they are all showing as healthy in the Infoblox Portal. If they are not healthy, there may be a communication problem and that may cause problems after upgrade.
   * Check your account on the Infoblox Support Portal. Make sure that the phone number listed is correct and works internationally. In many cases, support try and contact you on this number but can't get through because the number is listed incorrectly.
   * Check reporting server to see what the current usage trends are (e.g. is DNS traffic distributed equally across all DNS servers, etc)
   * If you are using ILOM, check that it works. (Physical appliances only)
   * Raise a preemptive support ticket. Also upload a small file to show that you can (some customers have traffic inspection security systems that can interfere with the upload mechanism)
-  * Read release notes - SERIOUSLY, read them carefully. This is where you will find details of changes to default behaviour, notes for upgrades, etc.
+  * Read release notes - SERIOUSLY, read them carefully. This is where you will find details of changes to default behaviour, notes for upgrades, etc. Official upgrade documentation is now [[https://docs.infoblox.com/space/niosupgrade/1323696202|here]].
   * Have a set of tests for validating services before and after upgrade (e.g. DNS recursion, DHCP, etc)
-  * Where possible, upload, distribute and test the upgrade BEFORE the actual change window. This reduces the risk of issues impacting the upgrade. (e.g. code refusing to distribute because of a configuration error or the test failing because of a configuration error)
+  * Where possible, upload, distribute and test the upgrade BEFORE the actual change window, ideally two or more weeks before it if you have a lot of change-control process. This gives you time to get support for any issues with those steps and reduces the risk of issues impacting the upgrade (e.g. code refusing to distribute because of a configuration error, or the test failing because of a configuration error). Infoblox users have had change windows run out of time when they encountered issues at the Distribute or Test stage and didn't have enough time to get to the root of the problem and fix it (which meant having to schedule another change window).
   * If you are running any Grid member as a virtual appliance, make sure that you have access to the console of that VM (e.g. VMware, AWS, etc). If you do not have access, make sure you know who does and that they are available during the upgrade window. Scenario: a member goes down for a reboot after upgrade and doesn't come back. You will need console access to see what is wrong and engage support (and/or just reboot the appliance). What if you have to re-deploy the VM? Do you know how?
-  * If you are running any Grid member as a physical appliance, make sure that you know exactly where it is physically located (site, room, rack, U, etc). Make sure you have easy access to it (e.g. ore-request a data center access pass 'just-in-case') or make sure you know what local-hands are available to access the device. e.g. if it doesn't come back after a reboot, physically rebooting may be necessary and using a console cable to read off the console may also be necessary).
+  * If you are running any Grid member as a physical appliance, make sure that you know exactly where it is physically located (site, room, rack, U, etc). Make sure you have easy access to it (e.g. pre-request a data center access pass 'just-in-case') or make sure you know what local-hands are available to access the device (e.g. if it doesn't come back after a reboot, physically rebooting may be necessary and using a console cable to read off the console may also be necessary).
+  * If you have any Grid Member connected to Infoblox cloud (e.g. GM syncing data or a member with DFP installed), you MUST ensure that the servers are showing as healthy in the Infoblox Portal. DFP and NOA (the connection between NIOS and Infoblox Portal) are containers and not part of NIOS itself. This means that when the NIOS upgrade image is pushed to the passive partition, it doesn't contain the containers. When NIOS boots to the new version, it will need to connect to the Infoblox Portal and download the container images. If NIOS can't resolve DNS using the server specified in the Grid/Member setting for "Infoblox Portal Configuration" (also called CSP Configuration in earlier versions of NIOS), then it can't connect to the Infoblox Portal. Even if it can resolve the domains needed, it then needs to access Infoblox Portal on TCP-443 or via a web proxy. The best place to check this is in the Infoblox Portal itself, to see if the NIOS server is showing as healthy. A connection failure means, for example, that DFP would not work after upgrade because the container couldn't be retrieved from Infoblox Portal.
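The DNS and TCP-443 dependency described in the last checklist item, and the before/after service tests mentioned earlier in the list, can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, ''csp.infoblox.com'' is assumed to be the portal hostname for your region, and the script tests the path from wherever you run it (not from the NIOS member itself, and not through a web proxy), so the health status shown in the Infoblox Portal remains the authoritative check.

```python
import socket


def check_portal_reachability(hostname, port=443, timeout=5.0,
                              resolve=socket.getaddrinfo,
                              connect=socket.create_connection):
    """Report whether hostname resolves and accepts a TCP connection on port.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    result = {"hostname": hostname, "resolved": False,
              "tcp_ok": False, "error": None}
    try:
        resolve(hostname, port, type=socket.SOCK_STREAM)
        result["resolved"] = True
        conn = connect((hostname, port), timeout=timeout)
        conn.close()
        result["tcp_ok"] = True
    except OSError as exc:  # covers DNS failures, refusals and timeouts
        result["error"] = str(exc)
    return result


def compare_snapshots(before, after):
    """Return the sorted names of checks whose result differs between runs."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))


if __name__ == "__main__":
    # Assumed portal hostname; replace with the one your Grid actually uses.
    print(check_portal_reachability("csp.infoblox.com"))
```

Run the same set of checks before and after the upgrade, store each run as a ''{name: result}'' dictionary, and pass both to ''compare_snapshots'' to list anything that changed.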
 ===== Downgrades =====
@@ Line 44: / Line 56: @@
 After you complete the downgrade procedure, all data in the database is lost. The downgrade process does not preserve data but does preserve license information and basic network settings.
+===== Upgrades to NIOS 9.1 =====
+SSH into GM and disable TLS 1.0 and TLS 1.1
+<code>set ssl_tls_settings override
+set ssl_tls_protocols disable TLSv1.0
+set ssl_tls_protocols disable TLSv1.1</code>
+You will need to restart the GUI manually. Navigate to the Grid tab -> Grid Manager tab -> Members tab, select the member checkbox, expand the Toolbar, and click Control -> Restart GUI
+You may also see the following error in the GM syslog, triggered by one or more of the Trusted Root CA certificates in your NIOS CA store:
+<code>Upgrade check failed, SKI doesn't exist in CA-certificate subject=</code>
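After disabling TLS 1.0 and TLS 1.1 and restarting the GUI, you may want to confirm from a workstation that the GM no longer accepts those protocol versions. The following is a minimal sketch using Python's standard ''ssl'' module; the function names and the ''gm.example.com'' address are hypothetical. Treat the result as a hint rather than proof: many modern OpenSSL builds refuse TLS 1.0/1.1 on the client side, so a failed handshake can also come from your own workstation. Probing TLS 1.2 as a control helps to rule out plain connectivity problems.

```python
import socket
import ssl


def build_pinned_context(version):
    """Build a client context that only offers a single TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Certificate validation is switched off: only the protocol is probed.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # Lower the OpenSSL security level so the client itself is willing
        # to offer legacy versions (may not be supported on every build).
        ctx.set_ciphers("DEFAULT:@SECLEVEL=0")
    except ssl.SSLError:
        pass
    # Pinning to TLSv1/TLSv1_1 emits a DeprecationWarning on newer Pythons.
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def probe_tls_version(host, version, port=443, timeout=5.0):
    """Return 'accepted', 'handshake_failed' or 'unreachable'."""
    ctx = build_pinned_context(version)
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "accepted"
    except OSError:  # ssl.SSLError is a subclass of OSError
        return "handshake_failed"
    finally:
        sock.close()


if __name__ == "__main__":
    gm = "gm.example.com"  # hypothetical Grid Master address
    for v in (ssl.TLSVersion.TLSv1, ssl.TLSVersion.TLSv1_1,
              ssl.TLSVersion.TLSv1_2):
        print(v.name, probe_tls_version(gm, v))
```

The expected pattern after the change is a failed handshake for TLSv1 and TLSv1_1 and an accepted handshake for TLSv1_2.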
 ===== Upgrades to NIOS 9.0 =====
@@ Line 50: / Line 73: @@
 You should install Hotfix-NIOS-98022 BEFORE upgrading to NIOS 9.0 (but AFTER distribution of NIOS 9.0.x code) to ensure that all OpenVPN connections (Grid communication) are using a correct certificate. Failure to do this can result in members going offline (not connecting to GM) and/or the GM entering a reboot loop. From NIOS 9.0.6 onwards, Upgrade Test and Upgrade will fail if OpenVPN certificates are not correct. More details [[https://support.infoblox.com/s/article/How-to-recover-NIOS-from-old-certificate-related-issues|here]].
+Consider setting ''named_max_exit_wait'' after upgrading to 9.0 to keep DNS restarts from taking longer. The default is to wait until the exit happens; this setting caps the wait at a maximum (e.g. 3 or 5 seconds).
 In NIOS 9.0 and higher, if you use LDAP authentication and you need the LDAP connection to egress the MGMT interface, you must put a static route on the NIOS box to force the traffic to use the MGMT interface.  This is because in NIOS 9.0.0, LDAP requests to the LDAP server and Active Directory server cannot be sent using the MGMT IP address, because OpenLDAP version 2.4.49 (Ubuntu) removed the options of binding the source IP address on the client. Therefore, an LDAP request or an Active Directory authentication request is always sent through the LAN IP address, even though you have enabled the Connect through Management Interface option.
@@ Line 133: / Line 159: @@
 The following commands are available from NIOS 9.0 onwards
+<code>set enable_strict_ca_cert_check</code>
 <code>set disable_strict_ca_cert_check</code>
 <code>show strict_ca_cert_check</code>
@@ Line 197: / Line 224: @@
 Note: Using the command will force all upgrade groups to end the upgrade immediately. All members of incomplete groups will be logged off the Grid to perform an auto-sync of software with the Grid. This operation should only be used in an emergency to end a scheduled upgrade, as it will result in a member service outage until the operation is completed.
+During an upgrade, you have the option to select an upgrade group and click "Upgrade Now". This will tell NIOS to start the upgrade on all members of the group simultaneously and immediately.
+-- Note from Infoblox Community user:
+Probably my bad, but when an upgrade group is set to sequential, it does not mean the nodes will upgrade one after the other (i.e. waiting until one has finished before starting the next upgrade). It means that the node upgrades get kicked off a minute or so apart from each other, so there is a huge overlap.
+This caused downtime because several nodes that are each other's fallback were offline at the same time.
+But worse in my opinion, even HA clusters now have downtime during the failover from the node running the old version to the node running the newly installed version.
+It seems that the node running the old version starts the failover as soon as it detects the other node running a higher version, but does not take into account that this new node is not yet ready to handle traffic. So the old node goes offline while the new one is still slowly starting BIND. This resulted in DNS downtime of 3 to 5 minutes.
+If any Grid member fails to upgrade within 10 minutes, the next one goes.
 ===== Automating Upgrades =====
 Upgrades can be automated via API.