====== Troubleshooting NIOS ======

When downloading a Support Bundle from a NIOS HA member, the bundle will include a bundle from each member of the HA pair (assuming both nodes are online).

  show tech-support

===== Can't Join Grid =====

The following error is usually caused by the joining member not having an active NIOS licence.

  Member type mismatch with Grid Master

===== RMA Depots =====

[[https://insights.infoblox.com/resources-datasheets/Infoblox-Global-RMA-Depot-Locations|List]] of Infoblox Depots.

[[https://www.infoblox.com/company/legal/terms-premium-maintenance/|In order to ship the same day]], the RMA must be processed by 3:00 pm local time of the depot processing the RMA for shipment.

===== CLI Modes =====

  set maintenancemode
  set maintenancemode on
  set maintenancemode off

  set expertmode
  set expertmode on
  set expertmode off

===== NIOS Hardware Status =====

Example output of ''show hardware_status''

  hostname > show hardware_status
  CPU_TEMP: 29 C
  SYS_TEMP: 27 C
  POWER: Power #1 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4222
  POWER: Power #2 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4333
  FAN1: 7400
  FAN2: 7600
  FAN3: 7500
  FAN4: 7500
  FAN5: 7600
  FAN6: 7500
  RAID_ARRAY: OPTIMAL
  RAID_DISK1: ONLINE, IB-Type14
  RAID_DISK2: ONLINE, IB-Type14
  RAID_DISK3: ONLINE, IB-Type14
  RAID_DISK4: ONLINE, IB-Type14
  RAID_BATTERY: RAID battery OK

  hostname > show hardware-type
  Member hardware type: IB-2225

  hostname > show version
  Version : 8.5.4-419474
  SN : 2205202223333123
  Hotfix : N/A

===== TCPDUMP =====

From the [[https://community.infoblox.com/t5/nios-dns-dhcp-ipam/testing-infoblox-grid-connectivity-by-hand-before-joining-new/td-p/18626|community site]].

  set expertmode on
  tcpdump -i eth2 (udp port 2114 || udp port 1194) && src && host

  * eth0 = mgmt
  * eth1 = LAN
  * eth2 = HA
  * eth4 = LAN2

To list interfaces

  show interface

To list all interfaces quickly

  show interface_mtu

To capture traffic on a server (192.168.11.153) where the client (192.168.99.74) is accessing TCP-443 on the server.
  tcpdump -i eth1 -n '(src 192.168.99.74 and dst 192.168.11.153 and dst port 443) or (src 192.168.11.153 and dst 192.168.99.74 and src port 443)'

===== Automated Traffic Capture =====

Traffic Capture can be automated on events. [[https://docs.infoblox.com/space/nios90/280760742/Enabling+Automated+Traffic+Capture|docs]]

===== Hardware =====

(Don't try this without support)

  * RAID controller setup screen says “No Configuration Present !”.
  * Remove the power supplies
  * Press F2
  * Import the "Foreign Config"
  * Remove the power supplies again
  * At next bootup, NIOS runs fsck and recovers the DB

===== Downloading Logs =====

To get logs:

  * Download support bundle. The current log file and the 9 previous rolled logfiles will be in there.
  * You can also use the "get_log_files" WAPI call.
  * You can also download from Administration > Logs > Syslog > Download.
  * You can also download from Administration > Logs > Syslog > Export (which will honour any filters applied).

===== Joining Grid =====

Remember, when you try to join a Grid, if the Grid name is wrong, the GM will log an error. If the Grid name is correct but the shared secret is wrong, there will be no log. The new member will just silently fail to join.

Also, if you tell the joining member to use the MGMT port, the member can only join if "Enable VPN on MGMT Port" is already ticked for that member in the GM configuration, under Grid > Grid Manager > Members > [Member] Network > Advanced. Tick it first, then join.

===== Show Port is Open =====

  set maintenancemode
  Maintenance Mode > show network_connectivity proto udp x.x.x.x 1194

  set maintenancemode
  Maintenance Mode > show network_connectivity type 4 proto tcp x.x.x.x 22

  Starting Nmap 7.80 ( https://nmap.org ) at 2025-09-10 09:27 UTC
  Nmap scan report for x.x.x.x
  Host is up (0.00091s latency).
  PORT STATE SERVICE
  22/tcp open ssh

  Nmap done: 1 IP address (1 host up) scanned in 0.03 seconds
  Maintenance Mode >

In addition, the following is a bit of a hacky way of showing connectivity. You can misuse the dig command in expertmode to test the OpenVPN ports.

Grid Member:

  dig -v #1194 @ -p 1194 dummy.domain
  dig -v #2114 @ -p 2114 dummy.domain

This will send a UDP packet from the specified interface IP address and local port to the specified port on the Grid Master VIP.

Grid Master: To verify the incoming packet, you will need to start a traffic capture or tcpdump on the CLI.

===== Traceroute =====

  traceroute -U -s GMCip memberIP -p 2114 -n -w 0.75 -f 30 -q 1
  traceroute -U -s GMCip memberIP -p 1194 -n -w 0.75 -f 30 -q 1

===== SCP Support Bundle =====

SSH into the appliance and transfer the bundle off to a local SCP/FTP server.

  set transfer_supportbundle [ftp|scp] [dest ] [core_files] [current_logs] [rotated_logs]

[[https://support.infoblox.com/s/article/1287|details]]

===== IBAP debug log =====

Turn this off after use to prevent excessive logging in Grid Master.
  Grid Master CLI> set debug ibap on
  Grid Master CLI> set debug ibap off

===== Disk Issues =====

[[misc#disk|Info on disk size]]

  show disk_usage_sorted config

  Expert Mode > set maintenancemode
  Maintenance Mode > show cores
  Maintenance Mode > show logfiles
  Maintenance Mode > show backup [ grid | dtc ]

then

  Maintenance Mode > delete
  Synopsis:
  delete [ backup | cores | file ]
  Description:
  delete backup : Delete the specified backup file(s)
  delete cores : Delete the specified core file(s)
  delete cores all : Delete all the core dumps
  delete file : Delete the selected file(s)
  Maintenance Mode >

===== Force HA DB Sync =====

Force a sync from both nodes of an HA pair if the active node isn't syncing to the passive node.

  * Log in to the active node and run ''set maintenancemode'' and then ''set debug_tools db_sync''
  * Log in to the passive node and run the same commands

If the issue persists, enable db_dump and db_queue on both nodes and the Grid Master, then collect the support bundles from both nodes and the Grid Master. The CLI commands to enable db_dump and db_queue are:

  * ''set maintenancemode''
  * ''set debug_tools db_xml_dump''
  * ''set debug_tools db_queue_dump''

Once the above information has been collected, remove the db_dump and db_queue from both nodes and the Grid Master using the CLI commands below.

  * ''set maintenancemode''
  * ''set debug_tools db_remove_dump db_xml_dump''
  * ''set debug_tools db_remove_dump db_queue_dump''

===== Clean Old Auto-Generated Records =====

  * [[https://support.infoblox.com/s/article/122|Cannot Delete Auto-Created Resource Records]]
  * [[https://support.infoblox.com/s/article/1756|Cannot Delete Auto-Created PTR Record]]

===== Test MTU with Ping =====

  ping from packetsize 1450

[[https://support.infoblox.com/s/article/493|Support Article]]

First ping with a packet size of 1450 and if it fails, keep reducing the packet size until the ping is successful.
The packet size used when the ping is successful is the effective MTU of the network between the master and member. If the MTU is very low, check the network (routers/firewalls) settings to find out why the MTU is low. Infoblox Grid does not support a network with an effective MTU less than 600.

This can be the resolution when a member shows “Connecting” status in the Grid and the logs from the member and master show errors about OpenVPN failing to establish (e.g. Connection refused (code=111)).

===== ping =====

  ping [dst IP] from [local host IP]

===== Restart WebUI on GM =====

Restart Web UI on Grid Manager.

  Infoblox > set maintenancemode on
  Maintenance Mode > debug webui restart
  Infoblox > set maintenancemode off

===== Huge Page Files =====

To check for huge page files, get a support bundle from the specific node. Extract it and move to the var/log folder where you'll see the 'ptop' logs. grep out the "MEM" lines.

To calculate the memory used by huge pages, look at the MEM lines in the ptop files. The number of huge pages is the number following the h. Each huge page is 2MB, so the number of huge pages times 2MB gives the amount of system memory consumed by the huge pages.

To calculate the amount of memory used by RPZ zones, get the RPZ zones configured in the named.conf file and then look in the CSP to determine the number of records in each of those zones. The rule of thumb is 1KB of memory per RPZ record, so multiplying the number of RPZ records by 1KB gives the amount of memory used by the records.

===== CPU Monitor =====

  show cpu 5 10
  show connections numerical
  set maintenancemode
  show process all
  set maintenancemode off

===== DB Queue Dump Data =====

Below are the steps to get DB Queue Dump Data on each appliance. Do not run this unless told to do so by support.
  - Access CLI
  - Execute "set maintenancemode on"
  - Execute "set txn_trace on"
  - Wait for 10 minutes
  - Execute "set debug_tools db_queue_dump"
  - Wait until the command is complete; it may take a couple of minutes until you see the cursor again.
  - Execute "set txn_trace off"
  - Wait until the command is complete; it may take a couple of minutes (or longer) until you see the cursor again.
  - Execute "set maintenancemode off"
  - Download support bundle.

Enabling the CLI command will only generate additional logs and is not expected to impact your environment.

===== EA Bug in NIOS <9.0.7 =====

To verify if any object data is missing, you may do an XML database dump on the GM and GMC, download the bundles, and compare the files for mismatched object values. To perform this test, take the following steps:

  - Log in to the CLI of the GM and GMC (active node if in an HA pair)
  - Run "set maintenancemode" and then run "set debug_tools db_xml_dump" on both GM and GMC
  - Once complete on both the GM and GMC, exit the CLI and download a Support Bundle for each
  - Uncompress the Support Bundle file and locate the onedb.xml file inside the /storage/debug_db_xml directory
  - Compare the entries in the two DB files and note the objects containing ".com.infoblox.one.extensible_attributes_value" (see OS-specific examples below) or ".com.infoblox.one.hier_rule"

===== Core Dump Files =====

  Expert Mode > set maintenancemode
  Maintenance Mode > show cores
  Maintenance Mode > show file
  Maintenance Mode > show backup [ grid | dtc ]

then

  Maintenance Mode > delete
  Synopsis:
  delete [ backup | cores | file ]
  Description:
  delete backup : Delete the specified backup file(s)
  delete cores : Delete the specified core file(s)
  delete file : Delete the selected file(s)
  Maintenance Mode >

===== Syslog Log Severity =====

Standard syslog error levels explained (from
[[https://signoz.io/guides/syslog-levels/|here]])

^ Level ^ Severity ^ Keyword ^ Description ^
| 0 | Emergency | emerg | System is unusable |
| 1 | Alert | alert | Action must be taken immediately |
| 2 | Critical | crit | Critical conditions |
| 3 | Error | err | Error conditions |
| 4 | Warning | warning | Warning conditions |
| 5 | Notice | notice | Normal but significant condition |
| 6 | Informational | info | Informational messages |
| 7 | Debug | debug | Debug-level messages |

===== Syslog Types =====

^ Filter Name ^ Server Name ^ Facility ^
| CDISCOVERY | | |
| Cisco ISE | | |
| Cloud API | | |
| Cloud DNS | | |
| DHCP | dhcpd | daemon |
| Discovery | | |
| DNS | named | daemon |
| DNS Traffic Control | idns_healthd | kern |
| File Distribution | httpd/in.tftpd | daemon |
| FTP | | |
| HTTP | httpd | daemon |
| MS Server | msconnectd | daemon |
| NTP | ntpd/ntpdate | daemon |
| Outbound API | | |
| Subscriber Services | | |
| TFTP | in.tftpd | daemon |
| Threat Insight | | |
| Threat Protection | threat-protect-log | |

facility/server

  * kern/idns_healthd (message contains "monitor")
  * user/gunicorn (message contains "net_autodiscovery")
  * user/monitor (message contains "State:")
  * daemon/pidof (message contains "can't read from")
  * daemon/systemd (message contains "dpkg" and "Rotate")
  * user/debug_umount (message contains "umount")
  * daemon/dbus-daemon (message contains "dbus")
  * daemon/dpkg-db-backup (message contains "dpkg-db-backup")
  * kern/kernel (message contains "mounted filesystem")
  * user/controld (message contains "Distribution Complete"/"Distribution Started")
  * authpriv/su (message contains "rabbitmq")
  * auth/su (message contains "rabbitmq")
  * daemon/ntpd (message contains "NTP service")
  * daemon/ntpdate (message contains "ntpdate")
  * daemon/openvpn-member (message contains "Peer Connection")
  * auth/sshd (message contains "Server listening")
  * authpriv/chpasswd (message contains "pam_unix")
  * daemon/in.tftpd (message contains "connection refused")

===== Show Logs in CLI =====

  show log
  show log syslog
  show log syslog /termtosearchfor/
  show log audit
  show log syslog follow
  show log audit follow
  show log syslog tail 5
  show log audit tail 5
  show log debug /onedb/

===== Show BIND Restart =====

From Support Bundle

  egrep -i 'shutting|starting BIND|running' allmsg
  2024-01-30T10:04:19+00:00 daemon dnsmember.fqdn.example named[11680]: info shutting down
  2024-01-30T10:04:21+00:00 user dnsmember.fqdn.example monitor[9931]: err Type: DNS, State: Yellow, Event: DNS is still running even though DNS Traffic Control is not functioning properly state change from 32 to 106
  2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice starting BIND 9.11.3-S3 (Supported Preview Version)
  2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice running on Linux x86_64 4.9.58 #1 SMP Mon Jan 31 20:10:08 PST 2022
  2024-01-30T10:06:17+00:00 daemon dnsmember.fqdn.example named[11807]: notice running

  * Note: look for the message "all zones loaded" to see when BIND has fully restarted.

===== Show BIND and ISC Version =====

Show the DNS and DHCP software versions. Restart the service and look in syslog.

  * Note: look for the message "all zones loaded" to see when BIND has fully restarted.

NIOS 9.0.6

  * Facility = daemon
  * Level = NOTICE
  * Server = named[xxx]
  * Message = ''starting BIND 9.16.23-S1 (Supported Preview Version) ''

NIOS 9.0.6

  * Facility = daemon
  * Level = INFO
  * Server = dhcpd[xxx]
  * Message = ''Internet Systems Consortium DHCP Server 4.3.3-P1''

===== Show Hotfix Action =====

Introduced in NIOS 8.6.3

  show action_to_activate_hotfix

  Infoblox > show action_to_activate_hotfix
  testlabappliance.infoblox.local
  Hotfix generic name : CHF-8.6.3.2-J96205-apply-1701438624
  Hotfix time : 01-12-23 13:50:24 UTC
  Suggested best action to activate : Appliance reboot required
  Member status : ONLINE
  Note: This action is to be performed after applying the hotfix, if already done please ignore.
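
Going back to the Show BIND Restart check: as a rough sketch, the Python below (not part of NIOS; it is meant to be run on a workstation against the ''allmsg'' file from an extracted Support Bundle) pulls out the named shutdown, start and running timestamps so the restart time can be worked out. The line layout (timestamp, facility, host, process, level, message) is assumed from the excerpt in that section, and ''bind_restart_times'' is a hypothetical helper name, not an Infoblox tool.

```python
import re
from datetime import datetime

# Lines copied from the allmsg excerpt in the Show BIND Restart section.
SAMPLE = """\
2024-01-30T10:04:19+00:00 daemon dnsmember.fqdn.example named[11680]: info shutting down
2024-01-30T10:04:21+00:00 user dnsmember.fqdn.example monitor[9931]: err Type: DNS, State: Yellow, Event: DNS is still running even though DNS Traffic Control is not functioning properly state change from 32 to 106
2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice starting BIND 9.11.3-S3 (Supported Preview Version)
2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice running on Linux x86_64 4.9.58 #1 SMP Mon Jan 31 20:10:08 PST 2022
2024-01-30T10:06:17+00:00 daemon dnsmember.fqdn.example named[11807]: notice running
"""

# Assumed layout: <timestamp> <facility> <host> named[<pid>]: <level> <message>
NAMED_LINE = re.compile(r"^(\S+)\s+\S+\s+\S+\s+named\[\d+\]:\s+\S+\s+(.*)$")


def bind_restart_times(lines):
    """Return the last seen timestamp of each named restart event."""
    events = {}
    for line in lines:
        match = NAMED_LINE.match(line.strip())
        if not match:
            continue  # not a named message (e.g. the monitor line)
        stamp = datetime.fromisoformat(match.group(1))
        message = match.group(2).strip()
        if message == "shutting down":
            events["shutdown"] = stamp
        elif message.startswith("starting BIND"):
            events["starting"] = stamp
        elif message == "running":
            # Exact match, so "running on Linux ..." is ignored.
            events["running"] = stamp
    return events


if __name__ == "__main__":
    events = bind_restart_times(SAMPLE.splitlines())
    if {"shutdown", "running"} <= events.keys():
        downtime = (events["running"] - events["shutdown"]).total_seconds()
        print(f"named was down for about {downtime:.0f} seconds")
```

To use it on a real bundle, pass the file's lines instead of the sample, for example ''bind_restart_times(open("allmsg"))''. If the file contains several restarts, only the most recent event of each kind is kept.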