This was a post on the forums. I turned it into a document in hopes it will help someone someday.
This is a new implementation.
So I currently have two UAGs deployed to production, running versions 3.1 and 3.2.
They are behind a NetScaler load balancer.
So after a few days the UAGs stop accepting connections on port 443. I have to reboot them every night, or the problem happens 100% of the time. At the moment I'm keeping one disabled on standby in case the other breaks during the workday. When they break, port 4172 remains open, so existing connections stay up. Only new connection attempts fail.
I have an open case with VMware, but they've turned us over to Citrix support. I wish they actually wanted to know what is causing this, since something is obviously breaking their UAG. This is a passive-aggressive remark in case you're reading, VMware.
We have 50 users, yet we see hundreds of stale connections on the UAG. We are not being DoS'ed; our network team confirmed this.
Citrix NetScaler Load Balancer: 192.24.16.172
UAG: 192.24.17.184
The Citrix NetScaler load balancer is configured to health-check the UAG using the method VMware recommends: GET /favicon.ico.
On the UAG:
netstat shows hundreds of connections in the CLOSE_WAIT state, like these:
tcp 1 0 192.24.17.184:6443 192.24.16.172:46864 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:29408 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:65027 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:16839 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:45761 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:44743 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:9926 CLOSE_WAIT
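To quantify the stale connections, a small script can tally CLOSE_WAIT sockets by local address and remote IP. CLOSE_WAIT means the remote end (here the load balancer at 192.24.16.172) has already closed its side of the connection while the process on the UAG has not yet closed its socket, so a count that keeps growing points to sockets the UAG never releases. This is a hypothetical helper, not a UAG tool: `close_wait_by_peer` is an illustrative name, and the parser assumes the six-column layout shown above (as printed by, for example, `netstat -tan` on Linux).

```python
from collections import Counter

# The netstat lines shown above (Proto, Recv-Q, Send-Q, Local Address,
# Foreign Address, State). Replace with real `netstat -tan` output as needed.
NETSTAT_OUTPUT = """\
tcp 1 0 192.24.17.184:6443 192.24.16.172:46864 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:29408 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:65027 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:16839 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:45761 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:44743 CLOSE_WAIT
tcp 1 0 192.24.17.184:6443 192.24.16.172:9926 CLOSE_WAIT
"""


def close_wait_by_peer(netstat_text: str) -> Counter:
    """Count CLOSE_WAIT sockets, keyed by (local address, remote IP)."""
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with all six columns.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # Drop the ephemeral source port so all probes from one host group together.
        remote_ip = foreign.rsplit(":", 1)[0]
        counts[(local, remote_ip)] += 1
    return counts


for (local, remote_ip), n in close_wait_by_peer(NETSTAT_OUTPUT).most_common():
    print(f"{local} <- {remote_ip}: {n} sockets in CLOSE_WAIT")
```

Running it periodically (for example, once an hour after the nightly reboot) would show whether the count from the load balancer climbs steadily until new connections on 443 start failing.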
On the UAG:
Hundreds of entries like these appear in /opt/vmware/gateway/logs/SecurityGateway_blah_blah_
2018-01-14T04:42:45.017+00:00> LVL:error : [C: 192.24.16.172:58952] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:47.187+00:00> LVL:error : [C: 192.24.16.172:24632] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:50.017+00:00> LVL:error : [C: 192.24.16.172:39938] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:52.187+00:00> LVL:error : [C: 192.24.16.172:3371] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:55.017+00:00> LVL:error : [C: 192.24.16.172:42301] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:57.188+00:00> LVL:error : [C: 192.24.16.172:47881] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:43:00.017+00:00> LVL:error : [C: 192.24.16.172:28791] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
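The handshake errors can be summarized the same way. An "End of file" during the SSL handshake typically means the client closed the TCP connection before the TLS handshake finished. The hypothetical helper below (illustrative names, not part of the UAG) groups matching log lines by client IP and computes the average gap between failures. The regular expression assumes the exact line layout shown above. For the seven sample lines it reports 7 failures from 192.24.16.172, one every 2.50 s on average (two interleaved series, each 5 s apart), or about 34,560 per day if that rate holds.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the exact line layout shown in the log excerpt above.
LOG_LINE = re.compile(
    r"^(?P<ts>[^>]+)> LVL:error : \[C: (?P<ip>[\d.]+):\d+\] "
    r"\*\*\* SSIGServer::SSL handshake failure"
)

SAMPLE_LOG = """\
2018-01-14T04:42:45.017+00:00> LVL:error : [C: 192.24.16.172:58952] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:47.187+00:00> LVL:error : [C: 192.24.16.172:24632] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:50.017+00:00> LVL:error : [C: 192.24.16.172:39938] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:52.187+00:00> LVL:error : [C: 192.24.16.172:3371] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:55.017+00:00> LVL:error : [C: 192.24.16.172:42301] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:42:57.188+00:00> LVL:error : [C: 192.24.16.172:47881] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
2018-01-14T04:43:00.017+00:00> LVL:error : [C: 192.24.16.172:28791] *** SSIGServer::SSL handshake failure: End of file (2) error:00000002:lib(0):func(0):system lib
"""


def handshake_failure_rate(lines):
    """Return {client_ip: (failure_count, mean_seconds_between_failures or None)}."""
    stamps_by_ip = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            # fromisoformat accepts "2018-01-14T04:42:45.017+00:00" (Python 3.7+).
            stamps_by_ip[m["ip"]].append(datetime.fromisoformat(m["ts"]))
    result = {}
    for ip, stamps in stamps_by_ip.items():
        stamps.sort()
        span = (stamps[-1] - stamps[0]).total_seconds()
        mean_gap = span / (len(stamps) - 1) if len(stamps) > 1 else None
        result[ip] = (len(stamps), mean_gap)
    return result


for ip, (n, gap) in handshake_failure_rate(SAMPLE_LOG.splitlines()).items():
    if gap:
        print(f"{ip}: {n} failures, one every {gap:.2f} s on average "
              f"(about {86400 / gap:,.0f} per day at this rate)")
    else:
        print(f"{ip}: {n} failure(s)")
```

A steady, machine-regular cadence from a single source IP like this is consistent with health-check probes rather than user traffic.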
SOLUTION:
I greatly dislike when I find a forum post with no answer so I will answer what the final solution to this was.
I had done some digging in the UAG console and noticed the messages below. From what I gathered, the UAG has a built-in mechanism that protects it from DDoS-type attacks, and our Citrix NetScaler load balancer's health check was triggering it. Essentially, the UAG thought the load balancer was attacking it: the DosPreventionHandler kicked in and the UAG shut itself down. Port 4172 (PCoIP) remained open and existing users stayed connected, but port 443 stopped accepting new connections. When I spoke to VMware support, they confirmed my suspicion.
The workaround is to set the settings below to 0 on the UAG.
This document was generated from the following discussion: UAG breaks after a few days. They break 100% of the time.