Prime95 was designed to take advantage of the then-new Pentium-based PC systems. The Pentium class of processors was a marked improvement over the 486 class. Its major performance enhancements were the following:
- Superscalar architecture - The Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many cases. The main pipe (U) can handle any instruction, while the other (V) can handle only the most common simple instructions. Some RISC proponents had argued that the "complicated" x86 instruction set would probably never be implemented by a tightly pipelined microarchitecture, much less by a dual-pipeline design. The 486 and the Pentium demonstrated that this was indeed feasible.
- 64-bit external data bus - This doubles the amount of information that can be read or written on each memory access. It therefore allows the Pentium to load its code cache faster than the 80486, and it also speeds up loading and storing 64-bit and 80-bit x87 FPU data. Moving data into and out of the CPU is a major consideration when processing large numbers.
- Much faster FPU - Some instructions showed an enormous improvement, most notably FMUL, with up to 15 times higher throughput than in the 80486 FPU. The Pentium is also able to execute an FXCH ST(x) instruction in parallel with an ordinary (arithmetic or load/store) FPU instruction. Since Prime95 uses the FPU very heavily, these improvements were important.
- Faster integer multiplier - A fully hardware-based multiplier makes instructions such as MUL and IMUL several times faster (and more predictable) than in the 80486; for 32-bit operands, the execution time is reduced from 13 to 42 clock cycles down to 10 to 11.
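
The effect of the wider external data bus on FPU operands can be illustrated with a little arithmetic. The sketch below is only a simplified model, not a description of the actual bus protocol: it assumes aligned operands and ignores caching and burst transfers. The helper name `bus_accesses` is hypothetical and chosen for illustration.

```python
import math


def bus_accesses(operand_bits: int, bus_bits: int) -> int:
    """Minimum number of bus transfers needed to move one operand.

    Simplified model (assumption): the operand is aligned, and caches
    and burst cycles are ignored.
    """
    return math.ceil(operand_bits / bus_bits)


# Compare a 32-bit external data bus (80486) with a 64-bit one (Pentium)
# for 32-bit integers, 64-bit doubles and 80-bit x87 extended values.
for operand_bits in (32, 64, 80):
    on_486 = bus_accesses(operand_bits, 32)
    on_pentium = bus_accesses(operand_bits, 64)
    print(f"{operand_bits}-bit operand: {on_486} transfer(s) on a 32-bit bus, "
          f"{on_pentium} on a 64-bit bus")
```

Under these assumptions, a 64-bit floating-point value needs two transfers on a 32-bit bus but only one on a 64-bit bus, and an 80-bit x87 value needs three transfers instead of two.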