Performance

On a recent amd64 system (current Xeon) doing pure forwarding, no benefit can be measured, not even between the software engine and offloading to hardware.

Analyzing the profiling data shows why: current Xeons have so much raw computing power and such big, fast caches that other factors become the limiting ones, above all latency, and last but not least latency to the devices. The effects are completely hidden in these microbenchmarks.

It is a different story when doing more than pure forwarding.

On an older system (Pentium M, em(4)), there is an increase of about 7%.