WireGuard outbound does not recover after a transient network outage (stale UDP socket; endpoint never re-resolved) #6257

@almatv54

Integrity requirements

  • I have read all the comments in the issue template and ensured that this issue meets the requirements.
  • I confirm that I have read the documentation, understand the meaning of all the configuration items I wrote, and did not pile up seemingly useful options or default values.
  • I provided the complete config and logs, rather than just providing the truncated parts based on my own judgment.
  • I searched issues and did not find any similar issues.
  • The problem can be successfully reproduced in the latest Release

Description

A WireGuard outbound whose peer endpoint is given as a domain name stops working after a short loss of upstream network connectivity on the host, and never recovers on its own — only restarting the Xray process brings it back.

Scenario: the host had a brief network interruption (the machine stayed up, power was fine, only the uplink was gone for several minutes). After connectivity was restored, Xray itself was clearly still running and healthy (other inbounds kept accepting connections and responding normally), but everything routed through the WireGuard outbound stayed dead indefinitely.

Reading the source, two things in proxy/wireguard seem to combine to prevent recovery:

  1. Re-dial only triggers on a read error. In bind.go, netBindClient.Send re-dials only when endpoint.conn == nil, and conn is reset to nil only inside the receive goroutine (in connectTo) when ReadFrom returns an error. On a write error, Send returns the error but does not reset endpoint.conn, so no re-dial happens. A connected UDP socket typically does not produce a read error when the peer silently disappears during an outage (no ICMP), so the receive goroutine just blocks, conn stays non-nil, writes keep going into a dead socket, and the outbound never re-establishes.

  2. The endpoint is resolved once and cached. createIPCRequest in client.go resolves the endpoint domain to an IP a single time and never re-resolves it for the lifetime of the device, so even a re-dial would reuse a possibly-stale address.

Suggested directions: on a write error in Send, also reset/close endpoint.conn so the next Send re-dials; and retain the endpoint domain to re-resolve it on re-dial.
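
To make the suggested directions concrete, here is a minimal, self-contained sketch of the pattern. It is not the Xray code: the endpoint type, the injected resolve/dial functions, flakyConn and runDemo are all hypothetical stand-ins so that the example runs without a network. It only illustrates the idea of clearing the socket on a write error and re-resolving the retained domain on the next dial.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"sync"
)

// endpoint is a hypothetical stand-in for the per-peer state kept by the
// bind; it is NOT Xray's netEndpoint.
type endpoint struct {
	mu   sync.Mutex
	host string         // original domain name, retained for re-resolution
	port string
	conn io.WriteCloser // nil means "resolve and dial on the next Send"

	resolve func(host string) (string, error)         // injected resolver (assumption)
	dial    func(addr string) (io.WriteCloser, error) // injected dialer (assumption)
}

// Send writes one packet, resolving and dialing first if there is no socket.
// On a write error it closes and clears the socket, so the next Send
// re-resolves the domain and dials again instead of reusing a dead socket.
func (e *endpoint) Send(pkt []byte) error {
	e.mu.Lock()
	defer e.mu.Unlock()

	if e.conn == nil {
		ip, err := e.resolve(e.host)
		if err != nil {
			return fmt.Errorf("resolve %s: %w", e.host, err)
		}
		c, err := e.dial(net.JoinHostPort(ip, e.port))
		if err != nil {
			return fmt.Errorf("dial: %w", err)
		}
		e.conn = c
	}
	if _, err := e.conn.Write(pkt); err != nil {
		e.conn.Close()
		e.conn = nil // force a re-dial on the next Send
		return fmt.Errorf("write: %w", err)
	}
	return nil
}

// flakyConn is a test double: writes fail while *down is true.
type flakyConn struct{ down *bool }

func (c flakyConn) Write(p []byte) (int, error) {
	if *c.down {
		return 0, errors.New("network is unreachable")
	}
	return len(p), nil
}

func (c flakyConn) Close() error { return nil }

// runDemo simulates: send while up, send during an outage, send after
// recovery. The fake resolver returns a different IP on the second lookup.
func runDemo() (errs []error, dialed []string) {
	down := false
	ips := []string{"192.0.2.10", "192.0.2.20"}
	e := &endpoint{
		host: "wg.example.com",
		port: "51820",
		resolve: func(string) (string, error) {
			return ips[len(dialed)%len(ips)], nil
		},
	}
	e.dial = func(addr string) (io.WriteCloser, error) {
		dialed = append(dialed, addr)
		return flakyConn{&down}, nil
	}

	errs = append(errs, e.Send([]byte("a"))) // network up
	down = true
	errs = append(errs, e.Send([]byte("b"))) // write fails, socket cleared
	down = false
	errs = append(errs, e.Send([]byte("c"))) // re-resolves and re-dials
	return errs, dialed
}

func main() {
	errs, dialed := runDemo()
	fmt.Println(errs)
	fmt.Println(dialed)
}
```

In this sketch the third Send dials the second, freshly resolved address rather than reusing the first one. Two caveats apply if the idea is carried over to bind.go: the receive goroutine also holds the socket, so closing it has to be coordinated with that goroutine; and this path only helps when writes actually fail during the outage, which may not hold for every kind of connectivity loss.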

Reproduction Method

  1. Start Xray with the config below (a local SOCKS inbound routed to a WireGuard outbound whose peer endpoint is a domain name).
  2. Confirm traffic works: curl -x socks5h://127.0.0.1:1080 https://api.ipify.org.
  3. Cut the host's network for ~3-5 minutes (e.g. drop all egress with the firewall, or bring the uplink down), then restore it.
  4. Run the same curl again. It hangs/fails and never recovers; only restarting Xray restores the WireGuard outbound.

Client config

No separate Xray client is needed to reproduce. Traffic is generated against the local SOCKS inbound of the single instance below, e.g.:

curl -x socks5h://127.0.0.1:1080 https://api.ipify.org

Server config

{
  "log": {
    "loglevel": "debug",
    "dnsLog": true
  },
  "inbounds": [
    {
      "tag": "socks-in",
      "protocol": "socks",
      "listen": "127.0.0.1",
      "port": 1080,
      "settings": {
        "udp": true
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "wireguard",
      "tag": "wg-out",
      "settings": {
        "secretKey": "SECRETKEY",
        "address": [
          "10.0.0.2/32"
        ],
        "peers": [
          {
            "publicKey": "PUBLICKEY",
            "endpoint": "wg.example.com:51820",
            "allowedIPs": [
              "0.0.0.0/0",
              "::/0"
            ]
          }
        ],
        "domainStrategy": "ForceIPv4"
      }
    }
  ],
  "routing": {
    "rules": [
      {
        "inboundTag": [
          "socks-in"
        ],
        "outboundTag": "wg-out"
      }
    ]
  }
}

Client log

Not applicable — reproduction uses only the local SOCKS inbound of the single instance above; there is no separate Xray client.

Server log

Debug logs were not retained from the original incident.
