infoblox_nios:upgrade
Differences
This shows you the differences between two versions of the page.
| infoblox_nios:upgrade [2025/04/09 14:33] – [Notes] bstafford | infoblox_nios:upgrade [2026/03/19 23:06] (current) – [Upgrades to NIOS 9.1] bstafford | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== NIOS Upgrade ====== | ====== NIOS Upgrade ====== | ||
| + | |||
| + | **First Rule Of Upgrading NIOS** - Read the release notes. Then read them again. Understand what changes happen with the code and figure out if this affects your deployment of NIOS. We cannot stress this single point enough. | ||
| + | |||
| + | **First Rule Of Upgrading NIOS** - See the first rule of upgrading NIOS. | ||
| + | |||
| + | Official upgrade documentation [[https:// | ||
| + | |||
| ===== Notes ===== | ===== Notes ===== | ||
| Line 6: | Line 13: | ||
| NOTE: When you have installed a hotfix bundle/ | NOTE: When you have installed a hotfix bundle/ | ||
| + | |||
| + | |||
| + | NOTE: From NIOS 9.0.6 onwards, upgrade status logs are captured in the Grid Master log files. You can view these logs using the '' | ||
| You may need to increase the session timeout limit for your user account if you are having issues uploading code to the GM prior to an upgrade. If the timeout limit is too low, the timeout can break the upload. | You may need to increase the session timeout limit for your user account if you are having issues uploading code to the GM prior to an upgrade. If the timeout limit is too low, the timeout can break the upload. | ||
| Line 30: | Line 40: | ||
| * Check DHCP FO state (if DHCP used) | * Check DHCP FO state (if DHCP used) | ||
| * Check CPU and RAM (RAM usage will ' | * Check CPU and RAM (RAM usage will ' | ||
| + | * If you use the DNS Forwarding Proxy (DFP) or you have linked the GM/GMC to the Infoblox Portal, make sure that they are all showing as healthy in the Infoblox Portal. If they are not healthy, there may be a communication problem and that may cause problems after upgrade. | ||
| * Check your account on the Infoblox Support Portal. Make sure that the phone number listed is correct and works internationally. In many cases, support try to contact you on this number but can't get through because the number is listed incorrectly. | * Check your account on the Infoblox Support Portal. Make sure that the phone number listed is correct and works internationally. In many cases, support try to contact you on this number but can't get through because the number is listed incorrectly. | ||
| * Check reporting server to see what the current usage trends are (e.g. is DNS traffic distributed equally across all DNS servers, etc) | * Check reporting server to see what the current usage trends are (e.g. is DNS traffic distributed equally across all DNS servers, etc) | ||
| * If you are using ILOM, check that it works. (Physical appliances only) | * If you are using ILOM, check that it works. (Physical appliances only) | ||
| * Raise a preemptive support ticket. Also upload a small file to show that you can (some customers have traffic inspection security systems that can interfere with the upload mechanism) | * Raise a preemptive support ticket. Also upload a small file to show that you can (some customers have traffic inspection security systems that can interfere with the upload mechanism) | ||
| - | * Read release notes - SERIOUSLY, read them carefully. This is where you will find details of changes to default behaviour, notes for upgrades, etc. | + | * Read release notes - SERIOUSLY, read them carefully. This is where you will find details of changes to default behaviour, notes for upgrades, etc. Official upgrade documentation is now [[https:// |
| * Have a set of tests for validating services before and after upgrade (e.g. DNS recursion, DHCP, etc) | * Have a set of tests for validating services before and after upgrade (e.g. DNS recursion, DHCP, etc) | ||
| - | * Where possible, upload, distribute and test the upgrade BEFORE the actual change window. This reduces the risk of issues impacting the upgrade. (e.g. code refusing to distribute because of a configuration error or the test failing because of a configuration error) | + | * Where possible, upload, distribute and test the upgrade BEFORE the actual change window, ideally two or more weeks before it if you have a lot of process for change control. This gives you time to get support for any issues with those steps and reduces the risk of issues impacting the upgrade (e.g. code refusing to distribute because of a configuration error, or the test failing because of a configuration error). Infoblox users have had change windows run out of time when they encountered issues at the Distribute or Test stage and didn't have enough time to find and fix the root cause (which meant having to schedule another change window). |
| * If you are running any Grid member as a virtual appliance, make sure that you have access to the console of that VM (e.g. VMware, AWS, etc). If you do not have access, make sure you know who does and that they are available during the upgrade window. Scenario: member goes down for a reboot after upgrade and doesn' | * If you are running any Grid member as a virtual appliance, make sure that you have access to the console of that VM (e.g. VMware, AWS, etc). If you do not have access, make sure you know who does and that they are available during the upgrade window. Scenario: member goes down for a reboot after upgrade and doesn' | ||
| - | * If you are running any Grid member as a physical appliance, make sure that you know exactly where it is physically located (site, room, rack, U, etc). Make sure you have easy access to it (e.g. ore-request a data center access pass ' | + | * If you are running any Grid member as a physical appliance, make sure that you know exactly where it is physically located (site, room, rack, U, etc). Make sure you have easy access to it (e.g. pre-request a data center access pass ' |
| + | * If you have any Grid Member connected to Infoblox cloud (e.g. GM syncing data or a member with DFP installed), you MUST ensure that the servers are showing as healthy in the Infoblox portal. DFP and NOA (connection between NIOS and Infoblox Portal) are containers and not part of NIOS itself. This means that when the NIOS upgrade image is pushed to the passive partition, it doesn' | ||
| ===== Downgrades ===== | ===== Downgrades ===== | ||
| Line 44: | Line 56: | ||
| After you complete the downgrade procedure, all data in the database is lost. The downgrade process does not preserve data but does preserve license information and basic network settings. | After you complete the downgrade procedure, all data in the database is lost. The downgrade process does not preserve data but does preserve license information and basic network settings. | ||
| + | |||
| + | ===== Upgrades to NIOS 9.1 ===== | ||
| + | SSH into GM and disable TLS 1.0 and TLS 1.1 | ||
| + | |||
| + | < | ||
| + | set ssl_tls_protocols disable TLSv1.0 | ||
| + | set ssl_tls_protocols disable TLSv1.1</ | ||
| + | You will need to restart the GUI manually. Navigate to the Grid tab -> Grid Manager tab -> Members tab, select the member checkbox, expand the Toolbar, and click Control -> Restart GUI | ||
| + | |||
| + | You may also see the following errors in the GM syslog, relating to one or more of the Trusted Root CA certificates in your NIOS CA store: | ||
| + | < | ||
| ===== Upgrades to NIOS 9.0 ===== | ===== Upgrades to NIOS 9.0 ===== | ||
| Line 50: | Line 73: | ||
| You should install Hotfix-NIOS-98022 BEFORE upgrading to NIOS 9.0 (but AFTER distribution of NIOS 9.0.x code) to ensure that all OpenVPN connections (Grid communication) are using a correct certificate. Failure to do this can result in members going offline (not connecting to GM) and/or GM entering a reboot loop. From NIOS 9.0.6 onwards, Upgrade Test and Upgrade will fail if OpenVPN certificates are not correct. More details [[https:// | You should install Hotfix-NIOS-98022 BEFORE upgrading to NIOS 9.0 (but AFTER distribution of NIOS 9.0.x code) to ensure that all OpenVPN connections (Grid communication) are using a correct certificate. Failure to do this can result in members going offline (not connecting to GM) and/or GM entering a reboot loop. From NIOS 9.0.6 onwards, Upgrade Test and Upgrade will fail if OpenVPN certificates are not correct. More details [[https:// | ||
| + | |||
| + | Consider setting named_max_exit_wait after upgrading to 9.0 to ensure that DNS restarts don't take longer than they did before. By default, the restart waits until the exit happens; this setting puts a maximum on that wait (e.g. 3 or 5 seconds). | ||
| + | |||
| In NIOS 9.0 and higher, if you use LDAP authentication and you need the LDAP connection to egress the MGMT interface, you must put a static route on the NIOS box to force the traffic to use the MGMT interface. | In NIOS 9.0 and higher, if you use LDAP authentication and you need the LDAP connection to egress the MGMT interface, you must put a static route on the NIOS box to force the traffic to use the MGMT interface. | ||
| Line 133: | Line 159: | ||
| The following command is available from NIOS 9.0 onwards | The following command is available from NIOS 9.0 onwards | ||
| + | < | ||
| < | < | ||
| < | < | ||
| Line 197: | Line 224: | ||
| Note: Using the command will force all upgrade groups to end upgrade immediately, | Note: Using the command will force all upgrade groups to end upgrade immediately, | ||
| + | During an upgrade, you have the option to select an upgrade group and click " | ||
| + | |||
| + | |||
| + | -- Note from Infoblox Community user: | ||
| + | Probably my bad, but when an upgrade group is set to sequential, it does not mean the nodes will upgrade one after the other (i.e. waiting until one has finished before starting the next upgrade). It means that the node upgrades get kicked off a minute or so apart from each other, so there is a huge overlap. | ||
| + | |||
| + | This caused downtime because several nodes which are each other's fallback were offline at the same time. | ||
| + | |||
| + | But worse in my opinion, even HA-clusters now have a downtime during the failover from the node running the old version to the node running the newly installed version. | ||
| + | |||
| + | It seems that the node running the old version starts the failover as soon as it detects the other node running a higher version, but does not take into account that this new node is not yet ready to handle traffic. So the old node goes offline and the new one is still in a slow process of starting BIND. This resulted in a downtime for DNS of 3 to 5 minutes. | ||
| + | |||
| + | If any Grid member fails to upgrade within 10 minutes, the next one starts. | ||
| ===== Automating Upgrades ===== | ===== Automating Upgrades ===== | ||
| Upgrades can be automated via API. | Upgrades can be automated via API. | ||
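
One part that is easy to script alongside an upgrade is the before/after service validation recommended in the checklist (e.g. DNS recursion). The following is a minimal sketch only, not an Infoblox tool: it records what a list of names resolves to before the upgrade and reports any names whose answers differ afterwards. The function names (`system_resolver`, `snapshot`, `compare`) are made up for illustration, and it assumes the machine running it uses the Grid member under test as its resolver.

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple

# A resolver takes a hostname and returns a sorted list of addresses.
Resolver = Callable[[str], List[str]]


def system_resolver(name: str) -> List[str]:
    """Resolve a name through the host's configured DNS resolver.

    Returns an empty list if the lookup fails, so a failure shows up
    as a difference rather than as an exception.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # info[4] is the socket address tuple; its first element is the IP.
    return sorted({info[4][0] for info in infos})


def snapshot(names: Iterable[str],
             resolve: Resolver = system_resolver) -> Dict[str, List[str]]:
    """Record the current answers for each name."""
    return {name: resolve(name) for name in names}


def compare(before: Dict[str, List[str]],
            after: Dict[str, List[str]]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return the names whose answers changed, as name -> (before, after)."""
    return {
        name: (answers, after.get(name, []))
        for name, answers in before.items()
        if answers != after.get(name, [])
    }
```

Typical use would be to call `snapshot()` with a list of internal and external names before the change window, save the result (for example as JSON), call it again after the upgrade, and pass both results to `compare()`. An empty result means nothing changed. Names served by round-robin or geo-aware records may legitimately differ between runs, and this does not cover DHCP, so treat it as one check among several.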
infoblox_nios/upgrade.1744209230.txt.gz · Last modified: by bstafford
