Troubleshooting Nodes
This guide provides a structured approach to diagnosing and resolving common issues with n2x-node
agents after installation.
Checking n2x-node Status
To verify that the n2x-node
agent is running, check the service status using the appropriate command for your operating system.
On Linux, use this command to check the service status:
sudo systemctl status n2x-node
On Windows, check the service status using PowerShell:
Get-Service n2x-node
On macOS, use this command to check the service status:
launchctl print system/io.n2x.n2x-node
Restarting the n2x-node Service
If the n2x-node
service is not running correctly, restarting it may resolve the issue.
On Linux, restart the service with the following command:
sudo systemctl restart n2x-node
On Windows:
- Restart the service using PowerShell:
  Restart-Service n2x-node
- Alternatively, use the Services application:
  - Open Run (Win + R), type services.msc, and press Enter.
  - Find n2x-node, right-click it, and select Restart.
On macOS, restart the service with the following command:
sudo launchctl kickstart -k system/io.n2x.n2x-node
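The Linux and macOS status and restart commands shown above can be wrapped in a small shell helper, for example in a runbook script. This is a minimal sketch and not part of the n2x tooling: the function name n2x_service_cmd is hypothetical, and it only prints the command so you can review it before running it. Windows is left out because it uses PowerShell cmdlets.

```shell
# Print the n2x-node service command for an action on a given OS.
# Usage: n2x_service_cmd <status|restart> [os]
# The OS defaults to the output of `uname -s` (Linux or Darwin).
# n2x_service_cmd is a hypothetical helper name, not an n2x command.
n2x_service_cmd() {
  action=$1
  os=${2:-$(uname -s)}
  case "$os:$action" in
    Linux:status)   echo "sudo systemctl status n2x-node" ;;
    Linux:restart)  echo "sudo systemctl restart n2x-node" ;;
    Darwin:status)  echo "launchctl print system/io.n2x.n2x-node" ;;
    Darwin:restart) echo "sudo launchctl kickstart -k system/io.n2x.n2x-node" ;;
    *)
      echo "unsupported OS or action: $os $action" >&2
      return 1
      ;;
  esac
}
```

For example, `sh -c "$(n2x_service_cmd status)"` would run the status check for the current operating system.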
Checking Node Connectivity to the Control Plane
All nodes must be connected to the control plane. You can verify this through the WebUI or CLI.
In the WebUI:
- Navigate to the Nodes section in the left-hand menu.
- Find your node and check its online status.
In the CLI, run the following command to check node details via n2xctl:
n2xctl node show
You'll be prompted to select the Tenant and Node to display details. Example output:
$ n2xctl node show
n2xctl v0.0.3-20240725171430+88c4863--go1.22.5
n2xctl is a CLI to control the n2x SASE platform.
Find more information at https://n2x.io/docs
» Tenant: [demo] Demo Tenant
» Node: [client-b] Client B
───── Node Details
────────────────
Node Information
────────────────
Tenant ID xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Network ID net-10-254
Subnet ID subnet-10-254-1
Node ID xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Node Name client-b
Description Client B
Status [online]
...
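If you want to check the status field without reading the whole listing, the value can be extracted from saved output. The sketch below assumes the output has the "Status [online]" layout shown in the example above and contains no terminal color codes. The function name n2x_node_status and the file name node-details.txt are hypothetical. Because n2xctl node show prompts interactively, the sketch reads previously captured text from standard input and does not call n2xctl itself.

```shell
# Extract the node status from saved `n2xctl node show` output on stdin.
# Assumes a line of the form "Status [online]", as in the example output.
# n2x_node_status is a hypothetical helper name.
n2x_node_status() {
  awk '$1 == "Status" {
    gsub(/\[|\]/, "", $2)   # strip the surrounding square brackets
    print $2
    exit
  }'
}
```

With the output saved to a file, `n2x_node_status < node-details.txt` would print the status word alone.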
Reviewing n2x-node Logs
Logs help diagnose connection and performance issues. Use the following commands to view logs based on your environment.
On Linux, use this command to access logs:
sudo journalctl -u n2x-node -o cat -f
On macOS:
- Logs are stored in:
  /usr/local/var/log/io.n2x.n2x-node.out.log
- Use this command to view the last 40 lines and follow new entries:
  tail -n 40 -f /usr/local/var/log/io.n2x.n2x-node.out.log
On Windows:
- The logs are located in:
  C:\Program Files\n2x\n2x-node.log
- To view logs, open the file or use Event Viewer with a custom filter.
- To start n2x-node in the foreground for real-time debugging, run:
  cd "C:\Program Files\n2x"
  .\n2x-node.exe start
  This starts the service in the foreground, showing live logs directly in the terminal.
On Kubernetes, view logs for the sidecar container:
kubectl -n <NAMESPACE> logs -f <POD> -c <WORKLOAD_NAME>-n2x-node
Info
Replace <NAMESPACE>, <POD>, and <WORKLOAD_NAME> with your specific values.
View logs for the subnet Kubernetes gateway pod:
kubectl -n n2x logs -f <XGW_SUBNET_POD>
Info
Replace <XGW_SUBNET_POD> with the correct value.
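To make the placeholder substitution for the sidecar container concrete, the following sketch builds the kubectl command from three arguments. The function name n2x_sidecar_logs_cmd and the sample values in the usage note are hypothetical. The only convention taken from this guide is that the sidecar container is named <WORKLOAD_NAME>-n2x-node.

```shell
# Print the kubectl command that follows the n2x-node sidecar logs.
# Usage: n2x_sidecar_logs_cmd <namespace> <pod> <workload-name>
# n2x_sidecar_logs_cmd is a hypothetical helper name.
n2x_sidecar_logs_cmd() {
  ns=$1
  pod=$2
  workload=$3
  if [ -z "$ns" ] || [ -z "$pod" ] || [ -z "$workload" ]; then
    echo "usage: n2x_sidecar_logs_cmd <namespace> <pod> <workload-name>" >&2
    return 1
  fi
  # The sidecar container name is the workload name plus "-n2x-node".
  printf 'kubectl -n %s logs -f %s -c %s-n2x-node\n' "$ns" "$pod" "$workload"
}
```

For a workload named web running in pod web-7d9f in the prod namespace (all made-up values), `n2x_sidecar_logs_cmd prod web-7d9f web` prints the command to run.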
Identifying Direct vs. Indirect Connections in Logs
To determine whether a node is using a direct or indirect connection, check the n2x-node logs for Connection Type indicators.
Direct Connection
A direct connection means that nodes communicate directly over TCP/UDP port 57775 without needing a relay.
Look for Connection Type: DIRECT in the logs:
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.017 Peer <peer.ID Qm*wtZ6Yc> CONNECTED (1 conns)
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.019 ----------------------------------------------
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 New INBOUND Connection: QmPkUUgGjy-4
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 Connection Type: DIRECT <----
...
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 ----------------------------------------------
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.022 Tunnel connected to 10.254.1.99
This indicates that a direct tunnel has been established between the nodes.
Indirect Connection
If a direct connection is not possible due to firewall restrictions or NAT issues, n2x-node automatically falls back to an indirect connection via a relay.
Look for Connection Type: INDIRECT in the logs:
Dec 10 12:52:01 aws-n2x-node-01 n2x-node[2174]: [ warn] 2024-12-10 12:52:01.117 Unable to connect to peer internally, trying via default routers...
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Peer <peer.ID Qm*wtZ6Yc> CONNECTED (1 conns)
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 ----------------------------------------------
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 New OUTBOUND Connection: QmPkUUgGjy-4
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Connection Type: INDIRECT <----
...
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 ----------------------------------------------
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Tunnel connected to 10.254.1.99
This means the node is relaying traffic through an intermediary to reach the destination.
Tip
If you see Connection Type: INDIRECT, check firewall rules and NAT settings to allow direct connections where possible.
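On a busy node it can be easier to count the connection types than to scan the log by eye. The sketch below assumes the log lines contain the exact strings "Connection Type: DIRECT" and "Connection Type: INDIRECT", as in the samples above. The function name n2x_connection_summary is hypothetical.

```shell
# Count direct and indirect connections in n2x-node log text read from stdin.
# Relies on the "Connection Type: ..." markers shown in the sample logs.
# n2x_connection_summary is a hypothetical helper name.
n2x_connection_summary() {
  awk '
    /Connection Type: INDIRECT/ { indirect++; next }
    /Connection Type: DIRECT/   { direct++ }
    END { printf "direct=%d indirect=%d\n", direct, indirect }
  '
}
```

On Linux, one possible invocation is `sudo journalctl -u n2x-node -o cat --no-pager | n2x_connection_summary`. On macOS, the log file can be redirected into the function instead.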
Changing n2x-node Logging Level
To adjust the verbosity of n2x-node logs, follow these steps:
- Locate the configuration file:
  - Linux/macOS: /etc/n2x/n2x-node.yml
  - Windows: C:\Program Files\n2x\n2x-node.yml
- Modify the logging level. Open the file and change the loglevel value to one of the following:
  - WARN: Minimal logging
  - INFO: General system activity
  - DEBUG: Detailed debugging information
  - TRACE: Extensive low-level debugging
- Apply the changes by restarting the n2x-node service.
Note
Higher logging levels generate more detailed logs but may impact performance. Choose the level based on your troubleshooting needs.
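On Linux or macOS, the edit can also be scripted. The sketch below assumes the configuration file stores the setting as a single "loglevel: VALUE" line. That layout is an assumption and is not confirmed by this guide, so check your file before relying on it. The function name n2x_set_loglevel is hypothetical. The function rewrites the file through a temporary copy so that it works with both GNU and BSD sed.

```shell
# Set the loglevel value in an n2x-node YAML configuration file.
# Usage: n2x_set_loglevel <config-file> <WARN|INFO|DEBUG|TRACE>
# Assumes a line of the form "loglevel: VALUE" (unverified layout).
# n2x_set_loglevel is a hypothetical helper name.
n2x_set_loglevel() {
  file=$1
  level=$2
  case $level in
    WARN|INFO|DEBUG|TRACE) ;;
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Replace everything after "loglevel:" and keep any leading indentation.
  if sed "s/^\([[:space:]]*loglevel:\).*/\1 $level/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place to keep ownership and mode
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return $status
}
```

Run from a root shell, `n2x_set_loglevel /etc/n2x/n2x-node.yml DEBUG` would update the file. The n2x-node service still has to be restarted afterwards for the change to take effect.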
Resolving Database Corruption Issues on Windows
Issue Analysis
The following error in the log indicates that the node's metrics database is corrupted, likely due to concurrent access from another process:
panic: runtime error: index out of range [3] with length 0 [recovered]
panic:
== Recovering from initIndex crash ==
File Info: [ID: 116, Size: 388415, Zeros: 388415]
isEncrypted: false checksumLen: 0 checksum: indexLen: 0 index: []
== Recovered ==
goroutine 243 [running]:
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1.1()
github.com/dgraph-io/badger/[email protected]/table/table.go:353 +0x9c
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1()
github.com/dgraph-io/badger/[email protected]/table/table.go:399 +0xab
panic({0x4fc4f20?, 0xc0016b4ed0?})
runtime/panic.go:770 +0x132
This error suggests that the badger database used for storing node metrics encountered corruption.
It has been observed that some Endpoint Detection and Response (EDR) or Extended Detection and Response (XDR) solutions may interfere with the database by performing real-time scans, leading to corruption.
Some security solutions that perform real-time file scanning and may need exclusions include:
- Microsoft Defender for Endpoint (MDE)
- CrowdStrike Falcon
- SentinelOne
- Sophos
- Trend Micro Apex One
- McAfee Endpoint Security (ENS/XDR)
- Carbon Black CB Defense
- Symantec Endpoint Protection (SEP)
Solution
Follow these steps to resolve the issue:
- Exclude the following directories from real-time scanning in your EDR/XDR solution:
  C:\Program Files\n2x\db
  C:\Program Files\n2x\cache
- Stop the n2x-node service on the Windows device:
  Stop-Service n2x-node
- Delete the corrupted directories:
  Remove-Item -Recurse -Force "C:\Program Files\n2x\db"
  Remove-Item -Recurse -Force "C:\Program Files\n2x\cache"
- Start the n2x-node service:
  Start-Service n2x-node
After completing these steps, the node should start normally without database corruption errors. If the issue persists, check the logs for further details and ensure the exclusions are correctly applied in your security solution.
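To confirm that the crash matches this signature before or after applying the fix, you can search the log for the markers from the panic trace shown earlier. The sketch below is written in POSIX shell, so it assumes the Windows log is read from WSL or Git Bash, or has been copied to another machine. The function name n2x_log_has_badger_corruption is hypothetical.

```shell
# Exit with status 0 if log text on stdin contains the Badger table
# corruption markers seen in the panic trace, and non-zero otherwise.
# n2x_log_has_badger_corruption is a hypothetical helper name.
n2x_log_has_badger_corruption() {
  grep -qF \
    -e 'Recovering from initIndex crash' \
    -e 'initBiggestAndSmallest'
}
```

For example, `n2x_log_has_badger_corruption < n2x-node.log && echo "corruption signature found"` prints a message only when one of the markers is present.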