He stayed until dawn. He wrote a small program—just 200 lines of C—that did nothing but shuffle data through the cache hierarchy. L1 to L2 to L3 to RAM and back. He watched it in the Memory Access analysis of VTune.
And then he saw it.
A cache line that was being evicted for no reason. A ghost. The hardware prefetcher was guessing wrong. The Intel Compiler had missed an alignment hint.
He added __attribute__((aligned(64))) and #pragma vector aligned. Recompiled. The evictions stopped. Performance jumped another 4%.
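The story does not show Aris's code, so what follows is only a minimal sketch of what such a fix might look like. The buffer names, the array size, and the `scale` and `is_cache_aligned` helpers are all hypothetical. The 64-byte figure assumes a typical x86 cache line. `#pragma vector aligned` is an Intel compiler directive; other compilers will generally ignore it, possibly with a warning.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define N 1024          /* hypothetical element count */

/* Hypothetical buffers, each forced to start on a cache-line boundary. */
static float src[N] __attribute__((aligned(CACHE_LINE)));
static float dst[N] __attribute__((aligned(CACHE_LINE)));

/* Returns 1 if the pointer sits on a cache-line boundary, 0 otherwise. */
int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Multiplies every element of src by factor and stores the result in dst. */
void scale(float factor) {
    /* Promise to the Intel compiler that all accesses in the next loop
       are aligned. If the promise is false, the generated code may fault,
       so it is only safe alongside the aligned declarations above. */
#pragma vector aligned
    for (size_t i = 0; i < N; i++) {
        dst[i] = src[i] * factor;
    }
}
```

The attribute and the pragma do different jobs: the attribute changes where the data actually lives, while the pragma only tells the compiler it may rely on that placement when it vectorizes the loop. Checking addresses with something like `is_cache_aligned` before trusting the pragma is a cheap safeguard.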
That 4% didn't matter to the defense contract. But it mattered to Aris. Because somewhere, in the deep stack of the 2017 toolchain, a human engineer at Intel had written a heuristic that said: "When you see this pattern, assume alignment." That heuristic was wrong for his specific case. But the tool let him see the error.
Parallel Studio XE 2017 was not a silver bullet. It was a mirror. It reflected the gap between what you thought your code was doing and what the silicon was actually doing. And that gap, Aris realized, was where all the great optimizations lived.
Writing fast code is one thing; finding why code is slow is another. The suite includes two legendary tools:
Is updating your Makefile to use icc instead of gcc worth it? In 2017, the answer was a resounding "yes."