Troubleshooting NIOS

When you download a support bundle from a NIOS HA member, it includes a bundle from each node of the HA pair (assuming both nodes are online).

show tech-support

Can't Join Grid

The following error is usually caused by the joining member not having an active NIOS licence:

Member type mismatch with Grid Master

RMA Depots

List of Infoblox Depots.

In order to ship the same day, the RMA must be processed by 3:00 pm local time of the depot processing the RMA for shipment.

CLI Modes

set maintenancemode

set maintenancemode on

set maintenancemode off

set expertmode

set expertmode on

set expertmode off

NIOS Hardware Status

Example output of show hardware_status

hostname > show hardware_status 
CPU_TEMP:  29 C
SYS_TEMP: 27 C
POWER:  Power #1 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4222
POWER:  Power #2 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4333
FAN1:   7400
FAN2:   7600
FAN3:   7500
FAN4:   7500
FAN5:   7600
FAN6:   7500

RAID_ARRAY: OPTIMAL
RAID_DISK1: ONLINE, IB-Type14
RAID_DISK2: ONLINE, IB-Type14
RAID_DISK3: ONLINE, IB-Type14
RAID_DISK4: ONLINE, IB-Type14
RAID_BATTERY: RAID battery OK

hostname > show hardware-type

Member hardware type: IB-2225

hostname  > show version

Version : 8.5.4-419474

SN   : 2205202223333123

Hotfix : N/A
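
If you need to check this output across many appliances, the text is regular enough to parse. The following is a minimal sketch, assuming the "healthy" strings are the ones shown in the sample output above (OK, OPTIMAL, ONLINE); the function names are hypothetical and other NIOS versions or hardware models may print different values.

```python
def parse_hardware_status(text):
    """Turn 'KEY: value' lines into a dict of lists (POWER appears more than once)."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips blank lines and the CLI prompt line
        key, value = line.split(":", 1)
        status.setdefault(key.strip(), []).append(value.strip())
    return status


def find_problems(status):
    """Return entries that do not match the healthy strings seen in the sample output."""
    problems = []
    for key, values in status.items():
        for value in values:
            if key == "POWER":
                healthy = " OK " in f" {value} "
            elif key == "RAID_ARRAY":
                healthy = value == "OPTIMAL"
            elif key.startswith("RAID_DISK"):
                healthy = value.startswith("ONLINE")
            elif key == "RAID_BATTERY":
                healthy = value.endswith("OK")
            else:
                healthy = True  # temperatures and fan speeds are not judged here
            if not healthy:
                problems.append(f"{key}: {value}")
    return problems


# Shortened copy of the sample output above
sample = """hostname > show hardware_status
CPU_TEMP:  29 C
POWER:  Power #1 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4222
POWER:  Power #2 OK TYPE:AC FRU-ID:PWS-606P-1R SN:P606PCL03VV4333
FAN1:   7400
RAID_ARRAY: OPTIMAL
RAID_DISK1: ONLINE, IB-Type14
RAID_BATTERY: RAID battery OK
"""

print(find_problems(parse_hardware_status(sample)))
```

Temperature and fan thresholds are deliberately left out because the page does not state what the acceptable ranges are.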

TCPDUMP

From the community site.

set expertmode on

tcpdump -i eth2 '(udp port 2114 || udp port 1194) && src <lan1-ip.of.new.device> && host <ip of local box>'

eth0 = mgmt
eth1 = LAN
eth2 = HA
eth4 = LAN2

To list interfaces

show interface

To list all interfaces quickly

show interface_mtu

To capture traffic on a server (192.168.11.153) where the client (192.168.99.74) is accessing TCP port 443 on the server:

tcpdump -i eth1 -n '(src 192.168.99.74 and dst 192.168.11.153 and dst port 443) or (src 192.168.11.153 and dst 192.168.99.74 and src port 443)'
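
The two-way filter above is easy to mistype, so it can help to generate it. This is a small sketch (the function name is hypothetical) that builds the same filter expression from a client address, a server address and a server port, and rejects malformed IP addresses before they reach tcpdump.

```python
import ipaddress


def bidirectional_filter(client_ip, server_ip, server_port):
    """Build a capture filter matching both directions of a client/server flow."""
    # ip_address() raises ValueError for malformed addresses
    client = ipaddress.ip_address(client_ip)
    server = ipaddress.ip_address(server_ip)
    to_server = f"src {client} and dst {server} and dst port {server_port}"
    to_client = f"src {server} and dst {client} and src port {server_port}"
    return f"({to_server}) or ({to_client})"


print(bidirectional_filter("192.168.99.74", "192.168.11.153", 443))
```

Paste the printed expression inside single quotes after tcpdump -i eth1 -n.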

Hardware

(Don't try this without support)

If the RAID controller setup screen says “No Configuration Present !”:
Remove the power supplies
Press F2
Import the “Foreign Config”
Remove the power supplies again
At the next bootup, NIOS runs fsck and recovers the DB

Downloading Logs

To get logs:

Download a support bundle. The current log file and the 9 previous rolled log files will be in there.
You can also use the “get_log_files” WAPI call.
You can also download from Administration > Logs > Syslog > Download.
You can also download from Administration > Logs > Syslog > Export (which will honour any filters applied).
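
As a rough illustration of the “get_log_files” WAPI option, the sketch below only builds the HTTP request and does not send it. It assumes the call is exposed as a fileop function taking log_type, member and node_type fields; the WAPI version, host name and member name are placeholders, so check the WAPI guide for your NIOS release before relying on any of these values. Authentication, sending the request and fetching the file from the URL the Grid Master returns are left out.

```python
import json
import urllib.request


def build_get_log_files_request(gm_host, member, wapi_version="v2.12"):
    """Build (but do not send) a POST request asking the GM for a member's syslog.

    Assumptions: the fileop/get_log_files endpoint and the payload field names;
    wapi_version is a placeholder and must match what your Grid supports.
    """
    url = f"https://{gm_host}/wapi/{wapi_version}/fileop?_function=get_log_files"
    payload = {"log_type": "SYSLOG", "member": member, "node_type": "ACTIVE"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and member names
request = build_get_log_files_request("gm.example.com", "dnsmember.fqdn.example")
print(request.get_method(), request.full_url)
```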

Joining Grid

Remember, when you try and join a grid, if the Grid name is wrong, the GM will log an error. If the Grid name is correct but the shared secret is wrong, there will be no log. The new member will just silently fail to join.

Also, if you try and join a Grid and tell the joining member to use the MGMT port, if the Grid configuration hasn't got Grid > Grid Manager > Members > [Member] Network > Advanced > “Enable VPN on MGMT Port” ticked, then the member won't be able to join. If you want to use the MGMT port to connect, you have to tick “Enable VPN on MGMT Port” first in the GM configuration for that member.

Show Port is Open

set maintenancemode
Maintenance Mode > show network_connectivity proto udp x.x.x.x 1194

Traceroute

traceroute -U -s GMCip memberIP -p 2114 -n -w 0.75 -f 30 -q 1
traceroute -U -s GMCip memberIP -p 1194 -n -w 0.75 -f 30 -q 1

SCP Support Bundle

SSH into the appliance and transfer the bundle to a local SCP/FTP server.

set transfer_supportbundle [ftp|scp] <server-ip> <user-name> <user-password> [dest <file_name>] [core_files] [current_logs] [rotated_logs]

details

IBAP debug log

Turn this off after use to prevent excessive logging in Grid Master.

Grid Master CLI> set debug ibap on

Grid Master CLI> set debug ibap off

Disk Issues

show disk_usage_sorted config

Expert Mode > set maintenancemode
Maintenance Mode > show cores
Maintenance Mode > show file
Maintenance Mode > show backup  [ grid | dtc ]

then

Maintenance Mode > delete

Synopsis:

  delete [ backup | cores | file ]

Description:

     delete backup          : Delete the specified backup file(s)
     delete cores           : Delete the specified core file(s)
     delete cores all       : Delete all the core dumps
     delete file            : Delete the selected file(s)

Maintenance Mode >

Force HA DB Sync

Force a sync from both nodes of an HA pair if the active node isn't syncing to the passive node.

Log in to the active node and run set maintenancemode and then set debug_tools db_sync
Log in to the passive node and run the same commands

If the issue still persists, enable db_dump and db_queue on both nodes and the Grid Master, then collect support bundles from both nodes and the Grid Master.

Below are the CLI commands to enable db_dump and db_queue.

set maintenancemode
set debug_tools db_xml_dump
set debug_tools db_queue_dump

Once we collect the above information, we need to remove the db_dump and db_queue from both the nodes and Grid Master using the below CLI commands.

set maintenancemode
set debug_tools db_remove_dump db_xml_dump
set debug_tools db_remove_dump db_queue_dump

Clean Old Auto-Generated Records

Test MTU with Ping

ping <ip address of the member/master> from <ip address of the interface you need to ping from> packetsize 1450

Support Article: first ping with a packet size of 1450. If it fails, keep reducing the packet size until the ping succeeds. The packet size at which the ping succeeds is the effective MTU of the network between the master and member. If the MTU is very low, check the network (router/firewall) settings to find out why. Infoblox Grid does not support a network with an effective MTU of less than 600.

This can be the resolution when a member shows “Connecting” status in the Grid and the logs from the member and master show errors about OpenVPN failing to establish (e.g. Connection refused (code=111)).
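
The reduce-until-it-works procedure can be sketched as a simple loop. In this sketch ping_ok is a hypothetical callable that you would supply (for example, something that runs the ping command with the given packet size and reports success); the step of 50 is an arbitrary choice, and the floor of 600 comes from the minimum stated above.

```python
def find_working_packet_size(ping_ok, start=1450, step=50, floor=600):
    """Return the first packet size that pings successfully, or None.

    ping_ok(size) is a caller-supplied function returning True when a ping
    with that packet size succeeds. Sizes below `floor` are not tried.
    """
    size = start
    while size >= floor:
        if ping_ok(size):
            return size
        size -= step
    return None


# Simulated path that only passes packets of 1372 bytes or less
print(find_working_packet_size(lambda size: size <= 1372))  # 1350
```

A smaller step gives a more precise answer at the cost of more pings. A result of None means nothing at or above 600 got through, which points to a network problem rather than a tuning issue.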

ping

ping [dst IP] from [local host IP]

Restart WebUI on GM

Restart Web UI on Grid Manager.

Infoblox > set maintenancemode on
Maintenance Mode > debug webui restart
Maintenance Mode > set maintenancemode off

Huge Page Files

To check for huge page files, get a support bundle from the specific node. Extract it and move to the var/log folder, where you'll see the 'ptop' logs. grep out the “MEM” lines.

To calculate the memory used by huge pages, look at the MEM lines in the ptop files. The number of huge pages is the number following the h. Each huge page is 2MB, so the number of huge pages times 2MB gives the amount of system memory consumed by huge pages.

The best way to calculate the amount of memory used by RPZ zones is to get the specific RPZ zones configured in the named.conf file and then look in the CSP to determine the number of records in each of those zones. The rule of thumb is 1KB of memory per RPZ record, so multiplying the number of RPZ records by 1KB gives the amount of memory used by the records.
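
The two calculations above are simple multiplications. Here is a worked sketch using made-up counts (2048 huge pages and 1,048,576 RPZ records are illustrative numbers, not values from a real appliance).

```python
HUGE_PAGE_MB = 2    # size of each huge page, as stated above
RPZ_RECORD_KB = 1   # rule-of-thumb memory per RPZ record, as stated above


def huge_page_memory_mb(huge_pages):
    """Memory consumed by huge pages, in MB."""
    return huge_pages * HUGE_PAGE_MB


def rpz_memory_mb(record_count):
    """Approximate memory used by RPZ records, in MB."""
    return record_count * RPZ_RECORD_KB / 1024


print(huge_page_memory_mb(2048))   # 4096
print(rpz_memory_mb(1_048_576))    # 1024.0
```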

CPU Monitor

show cpu 5 10

show connections numerical

set maintenancemode

show process all

set maintenancemode off

DB Queue Dump Data

Below are the steps to get DB Queue Dump Data on each appliance:

Access CLI
Execute “set maintenancemode”
Execute “set debug_tools db_queue_dump”
Wait until the command completes. It may take a couple of minutes (or longer) before the prompt returns.
Execute “set maintenancemode off”
Download support bundle.

Note: running this command only generates additional logs and is not expected to impact the environment.

Core Dump Files

Expert Mode > set maintenancemode
Maintenance Mode > show cores
Maintenance Mode > show file
Maintenance Mode > show backup  [ grid | dtc ]

then

Maintenance Mode > delete

Synopsis:

  delete [ backup | cores | file ]

Description:

     delete backup          : Delete the specified backup file(s)
     delete cores           : Delete the specified core file(s)
     delete file            : Delete the selected file(s)

Maintenance Mode>

Syslog Log Severity

Standard syslog severity levels explained:

Level	Severity	Keyword	Description
0	Emergency	emerg	System is unusable
1	Alert	alert	Action must be taken immediately
2	Critical	crit	Critical conditions
3	Error	err	Error conditions
4	Warning	warning	Warning conditions
5	Notice	notice	Normal but significant condition
6	Informational	info	Informational messages
7	Debug	debug	Debug-level messages
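
When filtering exported logs by severity, remember that a lower number means a more severe message. The sketch below encodes the table above as a lookup and compares keywords; the helper name is hypothetical.

```python
# Level number -> keyword, copied from the table above
SEVERITIES = {
    0: "emerg",
    1: "alert",
    2: "crit",
    3: "err",
    4: "warning",
    5: "notice",
    6: "info",
    7: "debug",
}
LEVEL_BY_KEYWORD = {keyword: level for level, keyword in SEVERITIES.items()}


def at_least(keyword, threshold):
    """True if `keyword` is as severe as, or more severe than, `threshold`."""
    return LEVEL_BY_KEYWORD[keyword] <= LEVEL_BY_KEYWORD[threshold]


print(at_least("err", "warning"))   # True
print(at_least("info", "warning"))  # False
```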

Syslog Types

Filter Name	Server Name	Facility
CDISCOVERY
Cisco ISE
Cloud API
Cloud DNS
DHCP	dhcpd	daemon
Discovery
DNS	named	daemon
DNS Traffic Control	idns_healthd	kern
File Distribution	httpd/in.tftpd	daemon
FTP
HTTP	httpd	daemon
MS Server	msconnectd	daemon
NTP	ntpd/ntpdate	daemon
Outbound API
Subscriber Services
TFTP	in.tftpd	daemon
Threat Insight
Threat Protection	threat-protect-log

Show Logs in CLI

show log
show log syslog
show log syslog /termtosearchfor/
show log audit
show log syslog follow
show log audit follow
show log syslog tail 5
show log audit tail 5
show log debug /onedb/

Show BIND Restart

From Support Bundle

egrep -i 'shutting|starting BIND|running' allmsg

2024-01-30T10:04:19+00:00 daemon dnsmember.fqdn.example named[11680]: info shutting down
2024-01-30T10:04:21+00:00 user dnsmember.fqdn.example monitor[9931]: err Type: DNS, State: Yellow, Event: DNS is still running even though DNS Traffic Control is not functioning properly state change from 32 to 106
2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice starting BIND 9.11.3-S3 (Supported Preview Version) <id:20aa1bc>
2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice running on Linux x86_64 4.9.58 #1 SMP Mon Jan 31 20:10:08 PST 2022
2024-01-30T10:06:17+00:00 daemon dnsmember.fqdn.example named[11807]: notice running

Note: look for the message “all zones loaded” to see when BIND has fully restarted.
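
To measure how long a restart took, the timestamps of the matching lines can be subtracted. This is a minimal sketch that assumes the line layout shown in the sample above (ISO timestamp, facility, host, process, then level and message) and treats the bare "running" message from named as the end of the restart; matching "all zones loaded" instead would be a one-line change. The function name is hypothetical.

```python
from datetime import datetime


def bind_restart_seconds(lines):
    """Seconds between named 'shutting down' and the next named 'running' line."""
    stopped = started = None
    for line in lines:
        parts = line.strip().split(None, 4)
        if len(parts) < 5 or not parts[3].startswith("named["):
            continue  # not a named message
        timestamp = datetime.fromisoformat(parts[0])
        # parts[4] is '<level> <message>'; drop the level word
        message = parts[4].split(" ", 1)[1] if " " in parts[4] else ""
        if message == "shutting down":
            stopped = timestamp
        elif message == "running" and stopped is not None:
            started = timestamp
            break
    if stopped is None or started is None:
        return None
    return (started - stopped).total_seconds()


# Lines taken from the sample output above
sample = [
    "2024-01-30T10:04:19+00:00 daemon dnsmember.fqdn.example named[11680]: info shutting down",
    "2024-01-30T10:04:57+00:00 daemon dnsmember.fqdn.example named[11807]: notice running on Linux x86_64 4.9.58 #1 SMP Mon Jan 31 20:10:08 PST 2022",
    "2024-01-30T10:06:17+00:00 daemon dnsmember.fqdn.example named[11807]: notice running",
]
print(bind_restart_seconds(sample))  # 118.0
```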

Show BIND and ISC Version

To find the DNS and DHCP software versions, restart the service and look in syslog.

Note: look for the message “all zones loaded” to see when BIND has fully restarted.

NIOS 9.0.6

Facility daemon
Level = NOTICE
Server = named[xxx]
Message = starting BIND 9.16.23-S1 (Supported Preview Version) <id:70b08b2>