Perform Packet Capture on VMware ESXi Host for NSX Troubleshooting

VMware provides a powerful tool, pktcap-uw, for capturing packets on an ESXi host.

pktcap-uw offers many capture options, which are documented in this VMware KB article:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2051814

Here are the options I use most often in my daily work, for your reference. I normally capture packets based on a vSwitch port ID or a DV filter (NSX DFW).

To do that, I first need to find the vSwitch port ID and DV filter ID on the ESXi host so that I can reference them in my packet capture. I normally use the “summarize-dvfilter” command to find this information. In the output below, “(vSwitch PortID)” and “(DV Filter ID)” are my own annotations marking the two values.

[root@esx4005:/tmp] summarize-dvfilter | grep -C 10 1314
slowPathID: none
filter source: Dynamic Filter Creation
vNic slot 1
name: nic-18417802-eth0-dvfilter-generic-vmware-swsec.1
agentName: dvfilter-generic-vmware-swsec
state: IOChain Attached
vmState: Detached
failurePolicy: failClosed
slowPathID: none
filter source: Alternate Opaque Channel
world 18444553 vmm0:auslslnxsd1314-113585a5-f6ed-4eb3-abd2-12083901e942 vcUuid:'11 35 85 a5 f6 ed 4e b3-ab d2 12 08 39 01 e9 42'
port 33554558 (vSwitch PortID) auslslnxsd1314-113585a5-f6ed-4eb3-abd2-12083901e942.eth0
vNic slot 2
name: nic-18444553-eth0-vmware-sfw.2 (DV Filter ID)
agentName: vmware-sfw
state: IOChain Attached
vmState: Detached
failurePolicy: failClosed
slowPathID: none
filter source: Dynamic Filter Creation
vNic slot 1
name: nic-18444553-eth0-dvfilter-generic-vmware-swsec.1
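The lookup above can also be scripted against saved output. The following is a minimal Python sketch, not part of any VMware tooling, that pulls the vSwitch port ID and the DFW (vmware-sfw) filter name for one VM out of summarize-dvfilter text. The function name find_capture_targets and the trimmed SAMPLE text are illustrative assumptions; the parsing relies only on the “world”, “port” and “name:” lines visible in the output above.

```python
import re

# Trimmed sample modelled on the output above (annotations removed).
SAMPLE = """\
 vNic slot 1
   name: nic-18417802-eth0-dvfilter-generic-vmware-swsec.1
world 18444553 vmm0:auslslnxsd1314-113585a5-f6ed-4eb3-abd2-12083901e942
 port 33554558 auslslnxsd1314-113585a5-f6ed-4eb3-abd2-12083901e942.eth0
  vNic slot 2
   name: nic-18444553-eth0-vmware-sfw.2
   agentName: vmware-sfw
  vNic slot 1
   name: nic-18444553-eth0-dvfilter-generic-vmware-swsec.1
"""


def find_capture_targets(summary_text, vm_name):
    """Return (port_id, dfw_filter_name) for vm_name, or None if not found."""
    in_vm = False
    port_id = None
    for raw in summary_text.splitlines():
        line = raw.strip()
        if line.startswith("world "):
            # Each "world" line opens the block of one VM.
            in_vm = vm_name in line
            port_id = None
            continue
        if not in_vm:
            continue
        port = re.match(r"port (\d+)\b", line)
        if port:
            port_id = port.group(1)
            continue
        # Only the DFW filter (vmware-sfw) is of interest, not swsec.
        dfw = re.match(r"name: (\S*vmware-sfw\S*)", line)
        if dfw and port_id is not None:
            return port_id, dfw.group(1)
    return None


print(find_capture_targets(SAMPLE, "auslslnxsd1314"))
```

The two returned values are the ones that go into the --switchport and --dvfilter options of pktcap-uw.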

 

After I have the vSwitch port ID and DV filter ID, I can start my packet capture.

  • Packet capture of traffic from a VM (incoming to the vSwitch) based on vSwitch PortID

pktcap-uw --switchport 33554558 --dir 0 -o /tmp/from1314.pcap

  • Packet capture of traffic to a VM (outgoing from the vSwitch) based on vSwitch PortID

pktcap-uw --switchport 33554558 --dir 1 -o /tmp/to1314.pcap

  • Packet capture from a VM based on DV filter

pktcap-uw --capture PreDVFilter --dvfilter nic-18444553-eth0-vmware-sfw.2 -o /tmp/1314v3.pcap

Below is a brief explanation of the parameters used above.

-o (output): save the capture to a packet capture file;

--dir (direction): relative to the virtual switch, 0 for incoming traffic (sent by the VM) and 1 for outgoing traffic (delivered to the VM);

--capture PreDVFilter: capture packets before DFW rules are applied;

--capture PostDVFilter: capture packets after DFW rules are applied;

In addition, you can add a filter to your capture:

pktcap-uw --switchport 33554558 --tcpport 9000 --dir 1 -o /tmp/from1314.pcap

I list all available filter options here for your reference:

--srcmac: The Ethernet source MAC address.
--dstmac: The Ethernet destination MAC address.
--mac: The Ethernet MAC address (src or dst).
--ethtype: The Ethernet type, in hex format.
--vlan: The Ethernet VLAN ID.
--srcip: The source IP address.
--dstip: The destination IP address.
--ip: The IP address (src or dst).
--proto: The IP protocol, as a hex value with a 0x prefix.
--srcport: The TCP source port.
--dstport: The TCP destination port.
--tcpport: The TCP port (src or dst).
--vxlan: The VXLAN ID of the flow.
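If you build these commands from a script, a small helper can catch a mistyped filter name before anything runs on the host. This is only a sketch in Python: build_pktcap_command is a hypothetical helper name, it only assembles the command string (it does not run pktcap-uw), and it accepts only the options and the two --dir values described in this post.

```python
import shlex

# Filter options taken from the list above; values are passed through as-is.
FILTER_OPTIONS = {
    "srcmac", "dstmac", "mac", "ethtype", "vlan", "srcip", "dstip",
    "ip", "proto", "srcport", "dstport", "tcpport", "vxlan",
}


def build_pktcap_command(switchport, direction, outfile, **filters):
    """Build a pktcap-uw command line for a switch-port capture."""
    if direction not in (0, 1):
        raise ValueError("direction must be 0 or 1")
    args = ["pktcap-uw", "--switchport", str(switchport)]
    for name, value in filters.items():
        if name not in FILTER_OPTIONS:
            raise ValueError(f"unknown filter option: {name}")
        args += [f"--{name}", str(value)]
    args += ["--dir", str(direction), "-o", outfile]
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(args)


print(build_pktcap_command(33554558, 1, "/tmp/from1314.pcap", tcpport=9000))
```

Called this way, the helper reproduces the filtered capture command shown earlier.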

Update:

Start two captures at the same time:

pktcap-uw --switchport 50331665 -o /tmp/50331665.pcap & pktcap-uw --uplink vmnic2 -o /tmp/vmnic2.pcap &

Stop all packet captures:

kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)

Of course, you can also perform basic packet captures from NSX Manager via the Central CLI. If you are interested, please refer to my other blog post:

https://davidwzhang.com/2017/01/07/limitation-of-nsx-central-cli-packet-capture/

Using TShark Filter for Packet Capture on Vyatta 5600

Vyatta 5600 provides TShark as its packet capture tool. To capture the traffic you are interested in and exclude unnecessary, noisy traffic, use a capture filter when you perform the packet capture. Here are a few real-world examples of tshark capture filters, which I hope will save you a bit of time.

  • Capture packets based on source or destination IP

tshark -f "host 10.42.131.120" -i dp0p224p1 -w /tmp/capture.pcap

  • Capture packets based on Protocol/Port

tshark -f "tcp port 1401" -i dp0p224p1 -w /tmp/capture.pcap

tshark -f "udp port 53" -i dp0p224p1 -w /tmp/capture.pcap

  • Capture packets based on IP and Protocol/Port

tshark -f "tcp port 1401 and host 10.15.72.34" -i dp0p224p1 -w /tmp/capture.pcap

  • Capture packets based on multiple IPs and Protocol/Port (the parentheses keep the port condition applied to both hosts)

tshark -f "tcp port 1401 and (host 10.15.72.34 or host 10.15.72.36)" -i dp0p224p1 -w /tmp/capture.pcap
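A note on operator precedence when combining several hosts with a port: in pcap capture-filter syntax, “and” and “or” have equal precedence and are evaluated left to right, so “tcp port 1401 and host 10.15.72.34 or host 10.15.72.36” is read as “(tcp port 1401 and host 10.15.72.34) or host 10.15.72.36” and captures all traffic of the second host. Writing “tcp port 1401 and (host 10.15.72.34 or host 10.15.72.36)” avoids that. The Python sketch below only models this boolean logic (it does not invoke tshark), and the packet dictionary is hypothetical.

```python
HOST_A = "10.15.72.34"
HOST_B = "10.15.72.36"


def is_tcp_1401(pkt):
    return pkt["proto"] == "tcp" and 1401 in (pkt["sport"], pkt["dport"])


def involves(pkt, host):
    return host in (pkt["src"], pkt["dst"])


def ungrouped(pkt):
    # tcp port 1401 and host A or host B  ->  (port and A) or B
    return (is_tcp_1401(pkt) and involves(pkt, HOST_A)) or involves(pkt, HOST_B)


def grouped(pkt):
    # tcp port 1401 and (host A or host B)
    return is_tcp_1401(pkt) and (involves(pkt, HOST_A) or involves(pkt, HOST_B))


# Hypothetical DNS packet from host B: not TCP port 1401 at all.
dns_from_b = {"proto": "udp", "sport": 40000, "dport": 53,
              "src": HOST_B, "dst": "10.15.72.1"}

print(ungrouped(dns_from_b))  # True: the ungrouped filter captures it
print(grouped(dns_from_b))    # False: the grouped filter does not
```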

You can use tshark to read your packet capture:

  • tshark -r capture.pcap

Note1: dp0p224p1 is the interface on which we capture the traffic.

Note2: In some cases (GRE tunnel traffic, VXLAN traffic), the filters above may not work as expected, because the filter is applied only to the source/destination IP addresses of the tunnel.

Another way to control the size of the capture file is to stop the capture once a specific number of packets has been captured.

  • Capture 50000 Packets and save them to a trace file called 1000test.pcap

tshark -c 50000 -i dp0p192p1 -w /tmp/1000test.pcap

or

tshark -f "host 10.42.131.120" -c 50000 -i dp0p192p1 -w /tmp/1000test.pcap
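When these captures are started from a script, the filter string has to be quoted correctly for the shell. The sketch below is a hypothetical Python helper (build_tshark_command is not part of tshark or Vyatta) that assembles the command from the four options used in this post (-f, -c, -i, -w) and leaves the quoting to shlex. It only builds the string and does not start a capture.

```python
import shlex


def build_tshark_command(interface, outfile, capture_filter=None, count=None):
    """Assemble a tshark capture command from the options used in this post."""
    args = ["tshark"]
    if capture_filter:
        args += ["-f", capture_filter]
    if count is not None:
        if count <= 0:
            raise ValueError("count must be a positive integer")
        args += ["-c", str(count)]
    args += ["-i", interface, "-w", outfile]
    # shlex.join wraps the filter in quotes because it contains spaces.
    return shlex.join(args)


print(build_tshark_command("dp0p192p1", "/tmp/1000test.pcap",
                           capture_filter="host 10.42.131.120", count=50000))
```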

NSX Edge Packet Capture on Multi-vNics simultaneously

In NSX 6.1.4, I tried to perform a packet capture to analyze end-to-end connectivity restoration during an Edge HA failover, but I could only capture packets on a single vNIC at a time. Some may say this can be worked around by running another capture for the other vNIC on the ESXi host with “pktcap-uw”. However, “pktcap-uw” can only capture unidirectional traffic on ESXi hosts, which makes the packet analysis harder.

Luckily, in the newer NSX 6.2.4, it looks like we can capture on different vNICs at the same time by running “debug packet capture interface vNIC” multiple times, as shown below:

debug packet capture interface vNIC_2
debug packet capture interface vNIC_3

[Screenshot: nsx-edge]

You can see that I successfully captured packets on vNic_2 and vNic_3.
Then you can upload the capture files to your SFTP server for further analysis from the CLI:

debug copy scp user@url:path file-name/all


When you perform the packet capture, you can use a filter to capture only the traffic you are interested in.

debug packet display interface vNic_0 host_192.168.11.3_and_host_192.168.11.41
debug packet capture interface vNic_0 host_192.168.1.2_and_host_192.168.2.2_and_port_80
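As the two commands above show, the Edge CLI takes the filter expression with underscores in place of spaces. A minimal Python sketch of that conversion follows; the helper names are hypothetical, and the sketch only formats the command text for pasting into the Edge CLI.

```python
def edge_capture_filter(expression):
    """Join the words of a capture-filter expression with underscores."""
    return "_".join(expression.split())


def edge_capture_command(vnic, expression):
    """Format a 'debug packet capture' command for one vNIC."""
    return f"debug packet capture interface {vnic} {edge_capture_filter(expression)}"


print(edge_capture_command(
    "vNic_0", "host 192.168.1.2 and host 192.168.2.2 and port 80"))
```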

Packet Analysis for Troubleshooting-SSH server slow response

Symptom: the customer complains about slow responses from an SSH server running on a CentOS box

Method: perform packet capture on the SSH server.

Finding: DNS queries fail while the SSH session is being established

Following the TCP stream of the SSH login in the packet capture, we see the following:

[Screenshot: ssh1]

Between packets 17 and 24, there is a gap of about 10 seconds.
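Spotting such a gap by eye works for a short stream; for a longer one, the same check can be done on exported packet timestamps. This is a minimal sketch in Python, assuming the timestamps have already been exported as a list of seconds in capture order. The function name largest_gap and the sample values are hypothetical and are not taken from this capture.

```python
def largest_gap(timestamps):
    """Return (index_before, index_after, gap_seconds) for the largest gap.

    timestamps: capture times in seconds, in capture order (at least two).
    Indexes are 0-based positions in the list.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [(later - earlier, i)
            for i, (earlier, later) in enumerate(zip(timestamps, timestamps[1:]))]
    gap, i = max(gaps)
    return i, i + 1, gap


# Hypothetical timestamps (seconds) for one TCP stream.
times = [0.00, 0.01, 0.02, 0.05, 10.06, 10.07]
before, after, gap = largest_gap(times)
print(before, after, round(gap, 2))
```

The pair of indexes tells you where in the full capture to look for what happened during the silence.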

Going back to the whole packet capture, we find the following between packets 17 and 24: multiple DNS queries but no responses.

[Screenshot: ssh2]

After checking the Linux/CentOS documentation, we found that the SSH server by default performs a DNS lookup on the source IP of the SSH client before the SSH session is established. The DNS query failures introduce the 10-second delay before the SSH server responds to the client.
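To confirm from the server side that reverse DNS is the slow step, you can time the lookup directly. The following is a rough Python sketch, not what sshd itself does internally: time_reverse_lookup is a hypothetical helper around the standard socket.gethostbyaddr call, and the resolver argument exists only so the demo can run without a real DNS server.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, elapsed_seconds) for a reverse DNS lookup."""
    start = time.monotonic()
    try:
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed


def failing_resolver(ip):
    # Stand-in for a lookup that fails (hypothetical, for the demo only).
    raise socket.herror(1, "Unknown host")


name, seconds = time_reverse_lookup("192.0.2.10", resolver=failing_resolver)
print(name, seconds)
```

On the real server, call time_reverse_lookup with the SSH client's IP and no resolver argument; an elapsed time of several seconds would point at the same DNS problem seen in the capture.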

Temp fix: disable the DNS lookup in sshd_config.

UseDNS no

Long term fix: fix the DNS query issue.

Packet Analysis for Troubleshooting-Slow response of AD home directory

Symptom: virtual desktop end users complain about a performance issue: they can access their AD home directory quickly the first time, but after a while they have to wait over 30 seconds before they can reach it.

Method: perform a packet capture on one end user’s desktop and capture the traffic while the user is experiencing the issue.

Finding in the packet analysis:

[Screenshot: TCP_retransmission]

Root Cause: By default, most stateful firewalls time out an entry in the firewall session table after 30 minutes. If no packets pass through the firewall for that session, the session times out and the firewall removes it from the session table.

In our case, a new TCP session entry is created in the firewall session table when a virtual desktop user accesses the home directory for the first time. The user then often does not use the home directory for a while. After half an hour, the idle session entry is removed from the firewall. From the end-user application’s point of view, however, the session is still alive, and the application tries to use it to access the home directory again. (Remember that the user desktop won’t perform a three-way handshake to establish a new TCP session, because the application layer still thinks the TCP session is alive.) When the application traffic hits the firewall, the firewall drops the packet because it is not a TCP SYN packet and no longer matches any session entry. (Unfortunately, the Juniper SRX firewall drops the packets silently, with no logging or alert.) The end-user desktop therefore has to fall back on the standard TCP retransmission mechanism to retransmit the packet. In our case, it takes 12 seconds before the end-user device gives up and tries to initiate a new TCP session.

Solution:

We have two ways to fix the issue:

1. Infrastructure point of view:

Change the default session timeout on the firewall so that it is slightly longer than the application-layer session timeout;

2. Application point of view:

Make the application periodically (e.g. every 20 minutes) send a TCP keep-alive packet so that the session entry is refreshed before the firewall removes it from the session table;

Both fixes add some overhead on the firewall, especially in terms of session table size.
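For the application-side option, the keep-alive can be switched on per socket. Below is a minimal Python sketch of that idea, assuming the application owns the socket and runs on a platform that exposes the Linux-style tuning options; enable_keepalive is a hypothetical helper, and the 20-minute idle value simply mirrors the example above.

```python
import socket


def enable_keepalive(sock, idle_seconds=1200, interval_seconds=60, probes=5):
    """Turn on TCP keep-alive so an idle connection still produces traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning constants below exist on Linux; other platforms
    # may lack them, so each one is applied only if the socket module has it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # First probe after 20 idle minutes, i.e. before a 30-minute firewall timeout.
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

The idle value has to stay below the firewall session timeout; otherwise the first probe arrives after the entry is already gone.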

Read the Citrix nstrace packet capture by wireshark

The NetScaler has two separate mechanisms for capturing network traffic through the appliance: nstrace.sh and nstcpdump.sh. nstrace records the packet trace in the native NetScaler trace format, which provides specific NIC device information, including the device number and whether the packet was transmitted or received. However, the current stable version of Wireshark can’t read this capture format.

After a bit of research, I found that the development version of Wireshark can open an nstrace packet capture properly. The screenshot below shows the Wireshark development version that I use to open a standard nstrace packet capture.

[Screenshot: Nstrace1]

In addition, the nstrace CLI provides an option to perform a standard tcpdump-format packet capture. The captured packets can be read by the Wireshark stable release.

nstrace -filter "ip==10.1.1.98 || ip==10.1.1.218" -size 0 -tcpdump enabled