sloppy performance lessons

But why is full-blown on one side and sloppy on the other even slower?
Theory: instruction cache thrashing.
Lesson: pedro-sized code helps.
Needless to say, my autosloppy code went directly to /dev/null.