Troubleshooting Nodes

This guide provides a structured approach to diagnosing and resolving common issues with n2x-node agents after installation.

Checking n2x-node Status

To verify that the n2x-node agent is running, check the service status using the appropriate command for your operating system.

On Linux, use this command to check the service status:

sudo systemctl status n2x-node

On Windows, check the service status using PowerShell:

get-service n2x-node

On macOS, use this command to check the service status:

launchctl print system/io.n2x.n2x-node
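When the same runbook is used on both Linux and macOS hosts, a small helper can pick the right status command. The sketch below is illustrative only: the function name `n2x_status_cmd` is hypothetical, and it covers just the two Unix-like commands listed above (Windows uses PowerShell instead).

```shell
# Print the status command from this guide for a given OS name,
# where the name is what `uname -s` reports (Linux or Darwin).
# n2x_status_cmd is a hypothetical helper, not part of n2x.
n2x_status_cmd() {
  case "$1" in
    Linux)  echo "sudo systemctl status n2x-node" ;;
    Darwin) echo "launchctl print system/io.n2x.n2x-node" ;;
    *)      echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

# Example (not run here): show the command for the current machine.
#   n2x_status_cmd "$(uname -s)"
```

The helper only prints the command, so you can review it before running it with elevated privileges.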

Restarting the n2x-node Service

If the n2x-node service is not running correctly, restarting it may resolve the issue.

On Linux, restart the service with the following command:

sudo systemctl restart n2x-node
  • On Windows, restart the service using PowerShell:

    restart-service n2x-node
    
  • Alternatively, use the Services application:

    1. Open Run (Win + R), type services.msc, and press Enter.
    2. Find n2x-node, right-click it, and select Restart.

On macOS, restart the service with the following command:

sudo launchctl kickstart -k system/io.n2x.n2x-node

Checking Node Connectivity to the Control Plane

All nodes must be connected to the control plane. You can verify this through the WebUI or the CLI. In the WebUI:

  1. Navigate to the Nodes section in the left-hand menu.
  2. Find your node and check its online status.

Run the following command to check node details via n2xctl:

n2xctl node show

You'll be prompted to select the Tenant and Node to display details. Example output:

$ n2xctl node show 
n2xctl v0.0.3-20240725171430+88c4863--go1.22.5

n2xctl is a CLI to control the n2x SASE platform.

Find more information at https://n2x.io/docs

» Tenant: [demo] Demo Tenant
» Node: [client-b] Client B
                                                          ───── Node Details ≑
════════════════
Node Information
════════════════

Tenant ID   xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    
Network ID  net-10-254                              
Subnet ID   subnet-10-254-1                         
Node ID     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    
Node Name   client-b                                
Description Client B                                
Status      [online]                                
...                       
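If you save the node details to a file, the status can be pulled out with a short filter. This is a sketch under two assumptions: the output keeps the `Status      [online]` row layout shown in the example above, and `node_status` is a hypothetical helper name, not an n2xctl feature.

```shell
# Extract the Status value from saved `n2xctl node show` output on stdin.
# Assumes a row of the form "Status      [online]" as in the example output.
node_status() {
  awk '$1 == "Status" { gsub(/\[|\]/, "", $2); print $2; exit }'
}

# Example (not run here), with the output saved to a file of your choosing:
#   node_status < node-details.txt
```

Because `n2xctl node show` prompts interactively for the Tenant and Node, the filter works on saved output rather than on the live command.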

Reviewing n2x-node Logs

Logs help diagnose connection and performance issues. Use the following commands to view logs based on your environment.

On Linux, use this command to access logs:

sudo journalctl -u n2x-node -o cat -f
  • On macOS, logs are stored in: /usr/local/var/log/io.n2x.n2x-node.out.log

  • Use this command to show the last 40 lines and follow new entries:

    tail -n 40 -f /usr/local/var/log/io.n2x.n2x-node.out.log
    
  • On Windows, the logs are located in: C:\Program Files\n2x\n2x-node.log

  • To view logs, open the file or use Event Viewer with a custom filter.

  • On Windows, you can also start n2x-node in the foreground for real-time debugging:

    cd "C:\Program Files\n2x"
    .\n2x-node.exe start
    

    This starts the service in the foreground, showing live logs directly in the terminal.

On Kubernetes, view logs for the sidecar container:

kubectl -n <NAMESPACE> logs -f <POD> -c <WORKLOAD_NAME>-n2x-node

Info

Replace <NAMESPACE>, <POD>, and <WORKLOAD_NAME> with your specific values.

View logs for the subnet Kubernetes gateway pod:

kubectl -n n2x logs -f <XGW_SUBNET_POD>

Info

Replace <XGW_SUBNET_POD> with the name of your subnet gateway pod.

Identifying Direct vs. Indirect Connections in Logs

To determine whether a node is using a direct or indirect connection, check the n2x-node logs for Connection Type indicators.

Direct Connection

A direct connection means that nodes communicate directly over TCP/UDP port 57775 without needing a relay.

Look for Connection Type: DIRECT in the logs:

Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.017 Peer <peer.ID Qm*wtZ6Yc> CONNECTED (1 conns)
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.019 ----------------------------------------------
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 New INBOUND Connection: QmPkUUgGjy-4
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 Connection Type: DIRECT                  <----
...
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.020 ----------------------------------------------
Dec 10 12:45:38 aws-n2x-node-01 n2x-node[2054]: [ info] 2024-12-10 12:45:38.022 Tunnel connected to 10.254.1.99

This indicates that a direct tunnel has been established between the nodes.

Indirect Connection

If a direct connection is not possible due to firewall restrictions or NAT issues, n2x-node automatically falls back to an indirect connection via a relay.

Look for Connection Type: INDIRECT in the logs:

Dec 10 12:52:01 aws-n2x-node-01 n2x-node[2174]: [ warn] 2024-12-10 12:52:01.117 Unable to connect to peer internally, trying via default routers...
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Peer <peer.ID Qm*wtZ6Yc> CONNECTED (1 conns)
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 ----------------------------------------------
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 New OUTBOUND Connection: QmPkUUgGjy-4
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Connection Type: INDIRECT                <----
....
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 ----------------------------------------------
Dec 10 12:52:02 aws-n2x-node-01 n2x-node[2174]: [ info] 2024-12-10 12:52:02.007 Tunnel connected to 10.254.1.99

This means the node is relaying traffic through an intermediary to reach the destination.

Tip

If you see Connection Type: INDIRECT, check firewall rules and NAT settings to allow direct connections where possible.
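To see at a glance how often a node falls back to a relay, you can count the two Connection Type markers in the logs. The following is a minimal sketch: `connection_summary` is a hypothetical helper name, and it relies only on the `Connection Type: DIRECT` and `Connection Type: INDIRECT` strings shown in the excerpts above.

```shell
# Count DIRECT and INDIRECT connection entries in n2x-node logs read from stdin.
# connection_summary is a hypothetical helper, not part of n2x.
connection_summary() {
  awk '
    /Connection Type: DIRECT/   { direct++ }
    /Connection Type: INDIRECT/ { indirect++ }
    END { printf "direct=%d indirect=%d\n", direct, indirect }
  '
}

# Example on Linux (not run here):
#   sudo journalctl -u n2x-node -o cat --no-pager | connection_summary
```

A high indirect count relative to direct suggests that the firewall or NAT path for port 57775 is worth reviewing.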

Changing n2x-node Logging Level

To adjust the verbosity of n2x-node logs, follow these steps:

  1. Locate the Configuration File:

    • Linux/macOS: /etc/n2x/n2x-node.yml
    • Windows: C:\Windows\Program Files\n2x\n2x-node.yml
  2. Modify the Logging Level:

    • Open the file and change the loglevel value to one of the following:
      • WARN → Minimal logging
      • INFO → General system activity
      • DEBUG → Detailed debugging information
      • TRACE → Extensive low-level debugging
  3. Apply the changes by restarting the n2x-node service.

Note

Higher logging levels generate more detailed logs but may impact performance. Choose the level based on your troubleshooting needs.
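On Linux or macOS, the edit in step 2 can be scripted. The sketch below assumes the setting appears in the file on its own line as `loglevel: <VALUE>`; the guide names the key but not its exact layout, so check your file first. `set_loglevel` is a hypothetical helper name.

```shell
# Set the loglevel value in a config file to WARN, INFO, DEBUG, or TRACE.
# Assumes a line of the form "loglevel: VALUE" (layout not confirmed by the guide).
set_loglevel() {
  file="$1"
  level="$2"
  case "$level" in
    WARN|INFO|DEBUG|TRACE) ;;
    *) echo "invalid level: $level" >&2; return 1 ;;
  esac
  tmp="$(mktemp)" || return 1
  # Rewrite through a temp file so the same code works with GNU and BSD sed;
  # `cat` back into place keeps the original file's owner and permissions.
  sed "s/^\([[:space:]]*loglevel:\).*/\1 $level/" "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example (not run here; needs write access to the file):
#   set_loglevel /etc/n2x/n2x-node.yml DEBUG
```

Restart the n2x-node service afterwards so the new level takes effect.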

Resolving Database Corruption Issues on Windows

Issue Analysis

The following error in the log indicates that the node’s metrics database is corrupted, likely due to concurrent access from another process:

panic: runtime error: index out of range [3] with length 0 [recovered]

        panic:

== Recovering from initIndex crash ==

File Info: [ID: 116, Size: 388415, Zeros: 388415]

isEncrypted: false checksumLen: 0 checksum:  indexLen: 0 index: []

== Recovered ==

goroutine 243 [running]:

github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1.1()
        github.com/dgraph-io/badger/[email protected]/table/table.go:353 +0x9c
github.com/dgraph-io/badger/v4/table.(*Table).initBiggestAndSmallest.func1()
        github.com/dgraph-io/badger/[email protected]/table/table.go:399 +0xab
panic({0x4fc4f20?, 0xc0016b4ed0?})
        runtime/panic.go:770 +0x132

This error suggests that the badger database used for storing node metrics encountered corruption.
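To check a log for this crash without reading the whole trace, you can search for two strings that appear in the output above. This is a sketch only: `has_badger_corruption` is a hypothetical helper name, and because it is a shell function it applies to a copy of the Windows log inspected from a Unix-like shell (for example WSL or Git Bash).

```shell
# Succeed if the log on stdin contains the badger index-crash markers
# shown in the trace above. has_badger_corruption is a hypothetical helper.
has_badger_corruption() {
  grep -q -e 'Recovering from initIndex crash' -e 'initBiggestAndSmallest'
}

# Example (not run here), on a copy of the node log:
#   has_badger_corruption < n2x-node.log && echo "corruption markers found"
```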

Some Endpoint Detection and Response (EDR) and Extended Detection and Response (XDR) solutions have been observed to interfere with the database by scanning its files in real time, which can lead to corruption.

Some security solutions that perform real-time file scanning and may need exclusions include:

  • Microsoft Defender for Endpoint (MDE)
  • CrowdStrike Falcon
  • SentinelOne
  • Sophos
  • Trend Micro Apex One
  • McAfee Endpoint Security (ENS/XDR)
  • Carbon Black CB Defense
  • Symantec Endpoint Protection (SEP)

Solution

Follow these steps to resolve the issue:

  1. Exclude the following directories from real-time scanning in your EDR/XDR solution:

    • C:\Program Files\n2x\db
    • C:\Program Files\n2x\cache
  2. Stop the n2x-node service on the Windows device:

    stop-service n2x-node
    
  3. Delete the corrupted directories:

    Remove-Item -Recurse -Force "C:\Program Files\n2x\db"
    Remove-Item -Recurse -Force "C:\Program Files\n2x\cache"
    
  4. Start the n2x-node service:

    start-service n2x-node
    

After completing these steps, the node should start normally without database corruption errors. If the issue persists, check the logs for further details and ensure the exclusions are correctly applied in your security solution.