Today we ran into an issue with a Sun ZFS Storage 7320. NFS shares provisioned from the ZFS appliance were not responding to requests; even a "df -h" would hang for a very long time. When we checked from the ZFS storage side, we found the following statistics:
While we were still tracing the source of the traffic, the ZFS appliance returned to normal by itself:
As we had configured LACP on this ZFS appliance just the day before, we suspected the issue was caused by an incorrect network configuration. Here's the network config:
For "Policy", the setting should match the switch setup so that incoming and outgoing traffic is balanced evenly across the links; otherwise the load may end up unevenly distributed. Our switch was set to L3, so L3 is fine on the appliance. If the switch supports it, L3+4 gives better load spreading. With L3, the member link is chosen from the IP addresses only, so all connections from any one IP use a single member of the aggregation. With L3+4, the UDP or TCP port is taken into account as well, so different connections from the same IP can be spread across members.
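To make the difference between the two policies concrete, here is a minimal sketch of hash-based member selection. It is a toy model, not the appliance's or the switch's actual algorithm: the `pick_member` function, the CRC32 hash, and the example addresses and ports are all assumptions chosen for illustration.

```python
import zlib


def pick_member(policy, src_ip, dst_ip, src_port, dst_port, n_links):
    """Return the index of the aggregation member a flow would use (toy model)."""
    if policy == "L3":
        # Only the IP addresses feed the hash.
        key = f"{src_ip}>{dst_ip}"
    elif policy == "L3+4":
        # The TCP/UDP ports feed the hash as well.
        key = f"{src_ip}>{dst_ip}|{src_port}>{dst_port}"
    else:
        raise ValueError(f"unknown policy: {policy}")
    return zlib.crc32(key.encode()) % n_links


# Eight hypothetical NFS connections from one client to one appliance address.
flows = [("10.0.0.5", "10.0.0.9", 40000 + i, 2049) for i in range(8)]

l3_members = {pick_member("L3", *flow, n_links=2) for flow in flows}
l34_members = {pick_member("L3+4", *flow, n_links=2) for flow in flows}

print("L3 uses members:", sorted(l3_members))
print("L3+4 uses members:", sorted(l34_members))
```

Under L3, every flow between the same two addresses hashes to the same key, so the set of members used always has exactly one entry. Under L3+4, the source port changes the key, so the flows can land on different members, although how evenly they spread depends on the real hash function.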
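For "Mode", the setting has to be chosen together with the switch. If the switch is in "passive" mode, the server/storage side needs to be in "active" mode, because two passive ends never initiate LACP negotiation. The LACP standard itself also permits "active" on both ends; in our setup, however, we paired the active switch with a passive appliance.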
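For "Timer", this controls how often LACP control packets (LACPDUs) are exchanged to check the status of the link: "short" means every second, "long" means every 30 seconds.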
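As a rough illustration of what the timer choice means in practice, the sketch below uses the standard LACP values: a partner is considered gone after three missed LACPDUs. The names `LACP_INTERVAL_S` and `detection_window_s` are made up for this example, and it assumes the appliance's "short"/"long" options map to the standard fast and slow rates.

```python
# Standard LACP transmit intervals, in seconds.
LACP_INTERVAL_S = {"short": 1, "long": 30}

# A partner is timed out after this many intervals without an LACPDU.
MISSED_PDUS_BEFORE_TIMEOUT = 3


def detection_window_s(timer):
    """Return how many seconds pass before a silent partner is timed out."""
    return LACP_INTERVAL_S[timer] * MISSED_PDUS_BEFORE_TIMEOUT


for timer in ("short", "long"):
    print(f"{timer}: LACPDU every {LACP_INTERVAL_S[timer]}s, "
          f"partner timed out after {detection_window_s(timer)}s")
```

With "short", a dead link is noticed within about 3 seconds; with "long", it can take up to 90 seconds, at the cost of fewer control packets.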
After checking the switch settings, we found that the switch was in "Active" mode. The ZFS appliance was also in "Active" mode, and we took that to be the culprit, so we changed the setting to the following: