Symptom: virtual desktop end users complain about a performance issue. They can access their AD home directory quickly the first time, but after a while they have to wait more than 30 seconds before they can reach it again.
Method: run a packet capture on one end user's desktop and capture the traffic while the user is experiencing the issue.
Finding in the packet analysis:
Root Cause: On most stateful firewalls, the default idle timeout for an entry in the session table is 30 minutes. If no packet for that session passes through the firewall within that time, the firewall times the session out and removes it from the session table.
In our case, a new TCP session entry is created in the firewall session table when a virtual desktop user accesses the home directory for the first time. The user then often does not touch the home directory again for a while. After half an hour, the idle session entry is removed from the firewall. From the end user application's point of view, however, the session is still alive, so the application tries to reuse it the next time the user accesses the home directory. (Remember that the user desktop will not perform a three-way handshake to establish a new TCP session, because the application layer still thinks the existing TCP session is alive.) When this traffic hits the firewall, the firewall drops the packet because it is not a TCP SYN packet and no matching session entry exists. (Unfortunately, the Juniper SRX firewall drops these packets silently, with no logging or alert.) The end user desktop therefore falls back on the standard TCP retransmission mechanism and retransmits the packet. In our case it takes 12 seconds before the end user device gives up and tries to initiate a new TCP session.
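The sequence described above can be illustrated with a small toy model of a session table. This is only a sketch to make the timing visible, not how the SRX is implemented: the `SessionTable` class, the addresses, and the port are hypothetical, and the 30-minute timeout and 12-second gap are the values quoted in the text.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT_S = 30 * 60  # 30-minute idle timeout, as quoted in the text


@dataclass
class SessionTable:
    """Toy model of a stateful firewall session table (hypothetical)."""
    idle_timeout: float = IDLE_TIMEOUT_S
    last_seen: dict = field(default_factory=dict)  # flow tuple -> time of last packet

    def handle(self, flow, is_syn, now):
        """Return 'pass' or 'drop' for one packet arriving at time `now` (seconds)."""
        seen = self.last_seen.get(flow)
        # Age out the entry if it has been idle for longer than the timeout.
        if seen is not None and now - seen > self.idle_timeout:
            del self.last_seen[flow]
            seen = None
        # No session entry and not a SYN: the packet is dropped silently.
        if seen is None and not is_syn:
            return "drop"
        # Otherwise create or refresh the entry and let the packet through.
        self.last_seen[flow] = now
        return "pass"


fw = SessionTable()
# Hypothetical client and file server addresses, for illustration only.
flow = ("10.0.0.5", 50000, "10.0.1.10", 445)

results = [
    fw.handle(flow, is_syn=True, now=0),              # first access: SYN creates the entry
    fw.handle(flow, is_syn=False, now=60),            # data shortly afterwards
    fw.handle(flow, is_syn=False, now=60 + 31 * 60),  # data after 31 idle minutes
    fw.handle(flow, is_syn=True, now=60 + 31 * 60 + 12),  # client gives up, sends a new SYN
]
print(results)  # ['pass', 'pass', 'drop', 'pass']
```

The third packet is the one the user notices: the client believes the session is alive, the firewall has already forgotten it, and nothing tells the client why its packet went unanswered.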
Solution:
There are two ways to fix the issue:
1. From the infrastructure point of view:
Raise the session timeout on the firewall from its default so that it is slightly longer than the application-layer session timeout;
2. From the application point of view:
Have the application send a TCP keep-alive packet periodically (e.g. every 20 minutes), so that the session entry is refreshed before the firewall removes it from the session table;
Both fixes add some overhead on the firewall, especially in terms of session table size, because idle sessions stay in the table longer.
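For the application-side fix, the following is a minimal sketch of enabling TCP keep-alive on a socket. It assumes an application that owns its own socket, which may not be the case for the home directory client in this scenario. The helper name `enable_keepalive` and the interval and probe counts are illustrative assumptions; only the 20-minute idle value comes from the example above. The per-socket timer options are platform-specific, so the sketch sets them only where Python exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=20 * 60, interval_s=30, probes=5):
    """Enable TCP keep-alive on `sock` (hypothetical helper, illustrative values)."""
    # Turn keep-alive on; this option is available on all common platforms.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe: 20 minutes, below the 30-minute firewall timeout.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Interval between unanswered probes (assumed value).
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Number of unanswered probes before the connection is considered dead (assumed value).
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Where the timer options are not available, the operating system's global keep-alive settings apply instead, and their default idle time may be longer than the firewall timeout.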