Troubleshooting network performance issues
Lately, I've come across network performance issues in some data centers. They are usually a headache for network engineers: when the bandwidth is sufficient but the throughput achieved isn't what you expected, something is wrong. This is when solid networking knowledge is needed for troubleshooting, and concepts like checksums, Frame Check Sequence (FCS) and overruns are required to analyse network performance issues and fix them.
Obviously, we can also have performance issues because applications and services aren't configured properly or were poorly developed, but in this post I would like to highlight what we can check on the networking side.
We should look at the network interfaces and check the following counters:
- Errors: This is the first counter to check. It increments when a frame arrives with a CRC (checksum) mismatch, or when a frame is too short or too long.
- Dropped: It increments when the interface receives frames it isn't meant to handle, for example frames with unintended VLAN tags, or IPv6 frames when the interface isn't configured for IPv6.
- Overruns: This is another important counter. It increments when the FIFO buffer gets full and the kernel isn't able to empty it in time. For example, if the network interface has a buffer of X bytes and more data arrives than fits before the buffer can be emptied, we have overruns.
- Frame: It increments only for misaligned frames, that is, frames whose length in bits is not divisible by 8. Such a frame doesn't end on a byte boundary, so it isn't valid and is discarded.
- Carrier: It increments when the link pulse is lost. This can sometimes be reproduced by unplugging and replugging the Ethernet cable. Therefore, if this counter is high, the link is flapping (going up and down), the Ethernet chip is having issues, or the device at the other end of the cable is having issues.
- Collisions: This is another typical cause of poor performance. Collisions may increment when an interface is running as half duplex and the other end is running as full duplex. The half duplex interface then detects transmission and reception at the same time, treats it as a collision and aborts its own transmission. As a result, the duplex mismatch produces collisions and very bad throughput. It is important to remember that links in switched environments normally run as full duplex, where collision detection is disabled.
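As a rough illustration of how these counters can be collected in one pass on Linux, here is a minimal Python sketch that parses /proc/net/dev and reports the non-zero problem counters. It assumes the usual 16-column layout of that file, in which the "fifo" column is what ifconfig reports as overruns and "colls" is the collision counter. The function names and the sample text are made up for the example.

```python
# Column names follow the header line that Linux prints in /proc/net/dev.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls",
             "carrier", "compressed"]

# Counters discussed in the list above ("fifo" is assumed to map to overruns).
WATCHED = {
    "rx": ["errs", "drop", "fifo", "frame"],
    "tx": ["errs", "drop", "fifo", "colls", "carrier"],
}


def parse_proc_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        if len(numbers) != len(RX_FIELDS) + len(TX_FIELDS):
            continue  # unexpected layout, skip rather than guess
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:len(RX_FIELDS)])),
            "tx": dict(zip(TX_FIELDS, numbers[len(RX_FIELDS):])),
        }
    return stats


def suspicious_counters(stats):
    """List (interface, direction, counter, value) for non-zero watched counters."""
    found = []
    for iface, directions in stats.items():
        for direction, names in WATCHED.items():
            for counter in names:
                value = directions[direction][counter]
                if value:
                    found.append((iface, direction, counter, value))
    return found


# Invented sample so the sketch can be tried on any machine.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  987654    4321   12    0    3     2          0         5   123456    2100    0    0    0    40       1          0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/dev") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the invented sample
    for row in suspicious_counters(parse_proc_net_dev(text)):
        print(*row)
```

A counter that is non-zero isn't necessarily a problem on its own. What matters is whether it keeps increasing, so it is worth running the check twice and comparing the values.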
Next, we can see a duplex mismatch lab where Fa 0/1 of ASW1 is working as full duplex and has FCS-Errors, which means “Frames with valid size with Frame Check Sequence (FCS) errors but no framing errors”. Consequently, throughput between PC1 and SRV1 is very poor.
We can also see that Fa 0/1 of CSW1 is working as half duplex and has Late-Collision, which means “Number of times that a collision is detected on a particular port late in the transmission process”. This is a big clue that we have a duplex mismatch, which should be fixed to get good network performance.
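The reasoning in this lab can be written down as a small heuristic: one end is half duplex and counts late collisions, while the other end is full duplex and counts FCS errors. The Python sketch below is only an illustration of that rule. The function name, the dictionary keys and the counter values are invented for the example and don't come from the lab output.

```python
def likely_duplex_mismatch(side_a, side_b):
    """Heuristic check for the symptom pattern of a duplex mismatch.

    Each side is a dict with the hypothetical keys "duplex" ("half" or
    "full"), "late_collisions" and "fcs_errors".
    """
    ends = (side_a, side_b)
    half = [end for end in ends if end.get("duplex") == "half"]
    full = [end for end in ends if end.get("duplex") == "full"]
    if len(half) != 1 or len(full) != 1:
        return False  # both ends agree on duplex (or it is unknown)
    # Half duplex end sees late collisions, full duplex end sees FCS errors.
    return (half[0].get("late_collisions", 0) > 0
            and full[0].get("fcs_errors", 0) > 0)


if __name__ == "__main__":
    # Illustrative values only, loosely modelled on the lab described above.
    asw1_fa01 = {"duplex": "full", "fcs_errors": 57, "late_collisions": 0}
    csw1_fa01 = {"duplex": "half", "fcs_errors": 0, "late_collisions": 31}
    print(likely_duplex_mismatch(asw1_fa01, csw1_fa01))
```

This only flags the pattern. The fix is still to configure both ends with the same speed and duplex settings, or to let both ends autonegotiate.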
This post is getting too long, I'm sorry, but I would also like to leave some Linux commands for troubleshooting network performance, such as ethtool -S eth0, netstat -s and netstat -i.
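The output of ethtool -S is a long list of "name: value" lines, so a small script helps to pick out the interesting ones. The Python sketch below is one possible way to do it. The counter names depend on the NIC driver, so the keyword filter and the sample output are assumptions for illustration, not a fixed list.

```python
import subprocess

# Substrings that often appear in problem counters. Names are driver-specific,
# so this list is an assumption and will need adjusting per NIC.
KEYWORDS = ("err", "drop", "fifo", "collision", "carrier", "crc", "miss")


def parse_ethtool_stats(text):
    """Turn 'name: value' lines from `ethtool -S` into a {name: int} dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():  # skips the "NIC statistics:" header
            counters[name.strip()] = int(value)
    return counters


def nonzero_problem_counters(counters):
    """Keep only non-zero counters whose name matches one of KEYWORDS."""
    return {
        name: value
        for name, value in counters.items()
        if value and any(word in name.lower() for word in KEYWORDS)
    }


def read_ethtool_stats(interface):
    """Run `ethtool -S <interface>` and return its text output."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Invented sample output, so the parser can be tried without ethtool.
SAMPLE = """NIC statistics:
     rx_packets: 4321
     tx_packets: 2100
     rx_crc_errors: 12
     rx_missed_errors: 0
     tx_late_collisions: 40
"""

if __name__ == "__main__":
    print(nonzero_problem_counters(parse_ethtool_stats(SAMPLE)))
```

On a real host, `nonzero_problem_counters(parse_ethtool_stats(read_ethtool_stats("eth0")))` would apply the same filter to live counters, assuming ethtool is installed and the interface is called eth0.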
Regards, my friends, and remember: sometimes we have to go down to the physical layer to fix network performance issues.
ifconfig and netstat are obsolete. Please use ip and ss instead. Thanks!