
Resolved Intel e1000e driver bug on 82574L Ethernet controller causing network blipping

April 1st, 2012


Earlier I posted a question about CentOS 6.2 losing its internet connection intermittently. I have finally found the right way to fix it.

First, this is a known bug in the Intel e1000e driver on Linux. It is a driver problem with the Intel 82574L (an MSI/MSI-X interrupt issue). The network connection drops now and then, and nothing is logged when it happens, which makes troubleshooting very hard.
You can read more bug reports about this at https://bugzilla.redhat.com/show_bug.cgi?id=632650

Fortunately, we can resolve this by installing the kmod-e1000e package from ELRepo.org. To solve this, do the following (ignore lines with strikeouts):

  • Install kmod-e1000e offered by ELRepo

Import the public key:
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org

To install ELRepo for RHEL-5, SL-5 or CentOS-5:
rpm -Uvh http://elrepo.org/elrepo-release-5-3.el5.elrepo.noarch.rpm

To install ELRepo for RHEL-6, SL-6 or CentOS-6:
rpm -Uvh http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm

Before installing the new driver, let’s see our old one:
[root@doxer sites]# lspci |grep -i ethernet
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

[root@doxer modprobe.d]# lsmod|grep e100
e1000e 219500 0

[root@doxer modprobe.d]# modinfo e1000e
filename: /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/drivers/net/e1000e/e1000e.ko
version: 1.4.4-k
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <[email protected]>
srcversion: 6BD7BCA22E0864D9C8B756A

Now let’s install the new kmod-e1000e offered by elrepo:
[root@doxer yum.repos.d]# yum list|grep -i e1000
kmod-e1000.x86_64 8.0.35-1.el6.elrepo elrepo
kmod-e1000e.x86_64 1.9.5-1.el6.elrepo elrepo

[root@doxer yum.repos.d]# yum -y install kmod-e1000e.x86_64

After installation, reboot your machine, and you’ll find the driver updated:
[root@doxer ~]# modinfo e1000e
filename: /lib/modules/2.6.32-220.7.1.el6.x86_64/weak-updates/e1000e/e1000e.ko
version: 1.9.5-NAPI
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <[email protected]>
srcversion: 16A9E37B9207620F5453F5E

[root@doxer ~]# lsmod|grep e100
e1000e 229197 0
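
To confirm after the reboot that the ELRepo build is the one actually in use, the version field can be pulled out of the modinfo output and compared with the package version. A minimal sketch: the function name modinfo_version is hypothetical, and it only parses modinfo-style text on stdin.

```shell
# Hypothetical helper: print the value of the "version:" field from
# modinfo-style output supplied on stdin. The exact match on "version:"
# keeps the "srcversion:" line from being picked up.
modinfo_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Typical use on the host (needs the modinfo command):
#   modinfo e1000e | modinfo_version
```

On the machine above this should print 1.9.5-NAPI once the kmod package is active. modinfo -F version e1000e prints the same field directly; the helper is only handy when you already have the full output saved.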

  • Change the kernel parameters
Append the following parameters to the kernel line in grub.conf:
pcie_aspm=off e1000e.IntMode=1,1 e1000e.InterruptThrottleRate=10000,10000 acpi=off
  • Change the NIC parameters (add these lines to /etc/rc.local)

#disable pause autonegotiation
/sbin/ethtool -A eth0 autoneg off
#disable link autonegotiation
/sbin/ethtool -s eth0 autoneg off
#change tx ring buffer (4096 may be too large; consider 512)
/sbin/ethtool -G eth0 tx 4096
#to increase the interrupt rate: ethtool -C eth0 rx-usecs 10 <10000 interrupts per second>
#change rx ring buffer
/sbin/ethtool -G eth0 rx 128
#disable Wake-on-LAN
/sbin/ethtool -s eth0 wol d
#turn off offload
/sbin/ethtool -K eth0 tx off rx off sg off tso off gso off gro off
#enable TX pause
/sbin/ethtool -A eth0 tx on
#disable ASPM
/sbin/setpci -s 02:00.0 CAP_EXP+10.b=40
/sbin/setpci -s 00:19.0 CAP_EXP+10.b=40
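
The two setpci lines write 0x40 to the byte at offset 0x10 of the PCI Express capability, which is the low byte of the Link Control register. Bits 1:0 of that byte select ASPM (00 disabled, 01 L0s, 10 L1, 11 both), so 0x40 leaves ASPM off while keeping bit 6 (common clock configuration) set. The sketch below decodes a value read back from the device; the function name aspm_state is hypothetical.

```shell
# Hypothetical helper: decode the ASPM control bits (bits 1:0) of a PCIe
# Link Control low byte given in hex, for example the value printed by
#   setpci -s 02:00.0 CAP_EXP+10.b
aspm_state() {
    bits=$(( 0x$1 & 3 ))
    case $bits in
        0) echo "disabled" ;;
        1) echo "L0s" ;;
        2) echo "L1" ;;
        3) echo "L0s+L1" ;;
    esac
}
```

For example, aspm_state 40 reports disabled, while a device that still reads back 42 has L1 enabled and the setpci write did not stick.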

PS:

  1. ASPM (the pcie_aspm parameter) stands for Active-State Power Management, a PCI Express power-saving mechanism; you can get more info here.
  2. ACPI stands for Advanced Configuration and Power Interface; you can refer to here.
  3. APIC stands for Advanced Programmable Interrupt Controller and is related to IRQs (interrupt requests). The APIC is one of many kinds of PIC and is found on Intel and other x86 platforms. You can read more info about this here.

Now reboot your machine, and you should have much steadier networking!

PS2:

The reason there are so many strikeouts in this article is that I struggled a lot with this problem. At first I thought it was caused by a bug in the e1000e kernel driver, and after some searching I installed the kmod-e1000e driver and modified the kernel parameters. Things got better for a short time. Later I found the issue was still there, so I tried compiling the latest e1000e driver from Intel, but that did not work either.

Later, I ran a script that monitored the network around the times the NIC went down. After the NIC had failed several times, I found that TX traffic was very high each time it failed (TX bytes went up by about 5Gb in a very short time). Based on this, I realized there might be a DoS attack on the server. Using ntop and tcpdump, I found that DNS traffic was very heavy, even though my host was not providing DNS services at all!
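
The monitoring script itself is not shown in this post. As a rough sketch of the idea, the TX byte counter that the kernel exposes under /sys/class/net can be sampled at a fixed interval and turned into a rate. The function name tx_rate, the interface eth0 and the 10-second interval are all assumptions.

```shell
# Hypothetical helper: average TX rate in bytes per second between two
# readings of an interface's tx_bytes counter taken $3 seconds apart.
tx_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Hypothetical sampling loop (interface name and interval are assumptions):
#   prev=$(cat /sys/class/net/eth0/statistics/tx_bytes)
#   while sleep 10; do
#       cur=$(cat /sys/class/net/eth0/statistics/tx_bytes)
#       echo "$(date '+%F %T') tx_bytes_per_sec=$(tx_rate "$prev" "$cur" 10)"
#       prev=$cur
#   done
```

Appending the loop output to a log file gives a timeline to compare against the moments the NIC dropped out.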

Then I wrote some iptables rules to disallow DNS queries etc., and since then the host has been steady again! Traffic went back to normal, and everything is on track. I’m so happy and excited about this, as it is the first time I’ve stopped a DoS attack!
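
The exact iptables rules are not given here. One plausible shape, assuming the host really offers no DNS service and the queries arrive on eth0, is to drop inbound traffic to port 53 over both UDP and TCP. The helper below only prints the commands so they can be reviewed before being run as root; its name dns_drop_rules is hypothetical.

```shell
# Hypothetical helper: print (not run) iptables commands that drop
# inbound DNS queries arriving on the given interface.
dns_drop_rules() {
    for proto in udp tcp; do
        echo "iptables -A INPUT -i $1 -p $proto --dport 53 -j DROP"
    done
}

# Review the output first, then apply it as root, for example:
#   dns_drop_rules eth0 | sh
```

Replies to the host's own DNS lookups come back from source port 53 to an ephemeral destination port, so these rules should not interfere with normal name resolution.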

This problem is due to a bug in the Intel NICs’ MSI and/or MSI-X interrupts. To solve it, download the latest Intel 82574L driver here. After downloading the source tarball to your server, follow these steps from the driver’s README file:

  1. unzip: tar zxf e1000e-x.x.x.tar.gz
  2. cd e1000e-x.x.x/src/
  3. make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install #this step is critical
  4. rmmod e1000e; modprobe e1000e
  5. add e1000e to /etc/modprobe.conf
  6. reboot server
After that, when you check the Intel e1000e driver module, you should see:

[root@doxer ~]# modinfo e1000e
filename: /lib/modules/2.6.32-220.7.1.el6.x86_64/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version: 1.10.6-NAPI
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <[email protected]>

…..blablabla…..

vermagic:       2.6.32-220.7.1.el6.x86_64 SMP mod_unload modversions

parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm: InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm: IntMode:Interrupt Mode (array of int)
parm: SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm: KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm: CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm: EEE:Enable/disable on parts that support the feature (array of int)
parm: Node:[ROUTING] Node to allocate memory on, default -1 (array of int)
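
modinfo shows which driver build is installed, but not which interrupt mode the NIC ended up with. /proc/interrupts does: each line names the interrupt type and the device using it. A minimal sketch, assuming the interface name appears in the last column of its line; the function name irq_mode is hypothetical.

```shell
# Hypothetical helper: read /proc/interrupts-style text on stdin and report
# whether the first line for the given interface uses MSI/MSI-X or a
# legacy (IO-APIC) interrupt.
irq_mode() {
    awk -v ifc="$1" '
        index($NF, ifc) == 1 {     # last column starts with the interface name
            print (($0 ~ /MSI/) ? "msi" : "legacy")
            exit
        }'
}

# Typical use on the host:
#   irq_mode eth0 < /proc/interrupts
```

With a driver built using -DDISABLE_PCI_MSI the expectation is legacy; if the helper prints nothing, the interface shares its line with another device or is named differently, and the file is best read by eye.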

You may also need to add pcie_aspm=off to the kernel command line in /boot/grub/menu.lst to disable Active-State Power Management, which may cause problems.
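
A typo in the boot loader file is easy to miss, so after rebooting it is worth checking that the parameter actually reached the running kernel. The sketch below compares a command line against a list of wanted parameters and prints whatever is absent; the function name missing_boot_params is hypothetical.

```shell
# Hypothetical helper: print each wanted parameter that is absent from
# the given kernel command line (whole-word match).
missing_boot_params() {
    cmdline=$1
    shift
    for want in "$@"; do
        case " $cmdline " in
            *" $want "*) ;;              # present, nothing to report
            *) printf '%s\n' "$want" ;;  # missing
        esac
    done
}

# Typical use on the host after rebooting:
#   missing_boot_params "$(cat /proc/cmdline)" pcie_aspm=off
```

No output means every listed parameter is active. More parameters can be appended to the argument list to check them in the same call.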

Those are all the steps to fix the Intel e1000e driver bug on the 82574L Ethernet controller.

NOTE: Please do not rely on the ELRepo-only procedure from the first version of this post (import the ELRepo key, install the elrepo-release package, run yum -y install kmod-e1000e.x86_64, reboot, and add pcie_aspm=off to the kernel command line in /boot/grub/menu.lst). The commands are the same as the ones shown in the kmod-e1000e step near the top of this post, and on their own they proved unable to solve this 82574L driver bug!

You should get better networking on Linux now. Enjoy!

PS:

Actually, there is a lot of discussion on the internet about this problem, so I know I’m not the only one who was annoyed by it!

 

http://www.google.com.hk/search?hl=en&newwindow=1&safe=strict&q=Intel+e1000e+driver+bug&oq=Intel+e1000e+driver+bug&aq=f&aqi=&aql=&gs_l=serp.3…9108l9252l0l9707l2l2l0l0l0l0l0l0ll0l0.frgbld.



  • khapota

    Thanks for this post. I get the same bug with the 82574L driver. One question: do I need to do both steps, “change kernel parameter” and “change NIC parameters(you should add these lines to /etc/rc.local)”?

    • doxerorg

      Hi,
      Yep, I would recommend doing both. I set both of them and it’s running as expected.

  • Mark

    Thanks a LOT for taking the time to document this! We ran into this problem on our new squid proxy servers, but the problem didn’t show up until they were under load (we had run them for months under a light load with no problems). Fortunately, one of the NICs failed yesterday (under no load), this time with some errors in /var/log/messages (unlike before). After searching a bit on the error messages, I started running across comments by others about similar problems and eventually found this link with detailed instructions on how to fix the bug. I went ahead and made the changes and will now decide how much user traffic to send through to see if the problem re-occurs. Thanks again!!

  • http://www.facebook.com/hmadureira Henrique Madureira

    Hello all, this may be related to something I’m experiencing. I have a Debian Lenny install and the same thing happens to me all the time. My server runs Clonezilla-SE and e1000 always crashes randomly, but never while sending the multicast over the network. Also, nothing appears in /var/log/messages except “e1000 PCI INT A disabled”. I’m going to try re-installing the NIC driver and see what I get.

  • http://www.facebook.com/ilya.dorfman.1 Ilya Dorfman

    Given that this fix has been posted on April 1 – could someone confirm that it actually is the right way to fix the problem and that it actually worked. I am having an issue like this too and would like a confirmation from someone…
