Target PC :: Intel Pentium 4 1700MHz (1.7GHz) Review



Advanced Dynamic Execution

Intel describes Advanced Dynamic Execution as an out-of-order, speculative execution engine. Its job is to keep the execution units busy, which it does by providing a large window of instructions from which the execution units can choose. The large out-of-order instruction window lets the processor avoid stalls that would otherwise occur while instructions wait for their dependencies to resolve. Intel’s previous P6 architecture had a small window of 42 instructions, whereas the NetBurst architecture can have up to 126 instructions in this window (in flight).
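To get a feel for why a bigger window helps, here is a toy model in Python. It is only a sketch under loud assumptions: the issue width of 3, the 100-cycle "load" latency, the in-order retirement rule, and the function name `run` are all made up for illustration and are not Intel's figures or design. Only the window sizes (42 and 126) come from the text above.

```python
def run(program, window, width=3):
    """Toy out-of-order core. Returns the cycle count needed to finish.

    program: list of (latency, deps); deps are indices of EARLIER instructions.
    window:  how many instructions past the oldest unretired one are visible.
    width:   instructions issued per cycle (hypothetical value).
    """
    n = len(program)
    finish = [None] * n   # cycle at which each result becomes available
    head = 0              # oldest instruction that has not retired yet
    cycle = 0
    while head < n:
        # Retire strictly in program order; this is what slides the window.
        while head < n and finish[head] is not None and finish[head] <= cycle:
            head += 1
        issued = 0
        for i in range(head, min(head + window, n)):
            if issued == width:
                break
            if finish[i] is not None:
                continue  # already issued
            latency, deps = program[i]
            # An instruction may issue once all of its inputs are available.
            if all(finish[d] is not None and finish[d] <= cycle for d in deps):
                finish[i] = cycle + latency
                issued += 1
        cycle += 1
    return cycle


def load_use_pairs(pairs, load_latency=100):
    """Slow independent loads, each followed by one instruction that needs it."""
    program = []
    for _ in range(pairs):
        load_index = len(program)
        program.append((load_latency, []))   # long-latency load
        program.append((1, [load_index]))    # dependent use
    return program


program = load_use_pairs(100)
for size in (42, 126):
    print(f"window={size:3d}: {run(program, size)} cycles")
```

In this model the smaller window fills up with instructions waiting on slow loads, so later independent loads cannot start until the oldest one retires. The larger window keeps more loads in flight at once and finishes the same program in fewer cycles.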

This technology also features improved branch prediction. The Pentium 4 is estimated to reduce branch mispredictions by around 33% compared to the P6 architecture. This is achieved by implementing a 4K branch target buffer that stores more detail on the history of past branches, as well as a more advanced branch prediction algorithm.

Rapid Execution Engine

The new architecture allows the Pentium 4 to run its Arithmetic Logic Units (ALUs) at twice the frequency of the processor core itself. This means that the ALUs on a Pentium 4 running at 1.5GHz operate at 3GHz, with a latency of half a core clock cycle. This translates directly into higher throughput and reduced execution latency.
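The clock arithmetic above is easy to check. The short Python sketch below uses hypothetical helper names (`alu_clock_ghz`, `cycle_time_ns`) and simply doubles the core clock, as the text describes.

```python
def alu_clock_ghz(core_ghz, multiplier=2):
    """ALU clock when the ALUs run at `multiplier` times the core clock."""
    return core_ghz * multiplier


def cycle_time_ns(freq_ghz):
    """Duration of one clock cycle in nanoseconds (1 GHz -> 1 ns)."""
    return 1.0 / freq_ghz


core = 1.5                      # GHz, the example used in the text
alu = alu_clock_ghz(core)       # 3.0 GHz
print(f"core: {core} GHz, cycle {cycle_time_ns(core):.3f} ns")
print(f"ALU:  {alu} GHz, cycle {cycle_time_ns(alu):.3f} ns")
```

A simple ALU operation that completes in one ALU cycle therefore takes half of a core cycle.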

400MHz Front Side Bus

One of the most talked-about features of the Pentium 4 is its 400MHz bus. The Pentium III’s 133MHz bus, which is 64 bits wide, is capable of delivering 1.06GB/s of data. The Pentium 4’s architecture is somewhat different. Its bus is clocked at only 100MHz and is also 64 bits wide; the difference is that the 100MHz clock is quad-pumped (four transfers per clock), giving a whopping 3.2GB/s peak.
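Both bandwidth figures follow from the same formula: clock rate × transfers per clock × bus width in bytes. A minimal Python sketch (the function name is ours, and GB here means 10^9 bytes):

```python
def peak_bandwidth_gb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Theoretical peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = width_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


pentium3 = peak_bandwidth_gb_s(133, 64)                         # single-pumped
pentium4 = peak_bandwidth_gb_s(100, 64, transfers_per_clock=4)  # quad-pumped

print(f"Pentium III: {pentium3:.3f} GB/s")
print(f"Pentium 4:   {pentium4:.3f} GB/s")
```

This gives 1.064GB/s for the Pentium III and 3.2GB/s for the Pentium 4. These are peak numbers; sustained throughput in real applications will be lower.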

Advanced Transfer Cache

Intel’s Pentium 4 features 8KB of L1 data cache. This is half the size of what the Pentium III features. This may seem a bit confusing at first, but smaller caches have lower latencies. Intel made this choice to decrease the latency of the L1 cache, which should result in an improved transfer rate, but at the same time the small size (8K) might not be enough for some specific tasks.

This is where the L2 cache comes in. The Pentium 4, like the Pentium III (Coppermine), sports 256K of on-die cache on a 256-bit bus. However, there is a difference between the two. The new architecture of the Pentium 4 transfers data on every clock, whereas the Pentium III (Coppermine) transfers data on every other cycle.
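To see what transferring on every clock buys, the sketch below computes the peak L2 bandwidth for a 256-bit bus. It assumes the L2 cache runs at the core clock, and it compares both designs at the same hypothetical 1GHz clock so that only the transfer rate differs. The function name is ours.

```python
def l2_peak_gb_s(core_ghz, bus_bits=256, clocks_per_transfer=1):
    """Peak L2 bandwidth in GB/s, assuming the cache runs at the core clock."""
    bytes_per_transfer = bus_bits // 8              # 256 bits -> 32 bytes
    transfers_per_ns = core_ghz / clocks_per_transfer
    return bytes_per_transfer * transfers_per_ns    # bytes/ns == GB/s


CLOCK_GHZ = 1.0  # hypothetical common clock, for comparison only

every_clock = l2_peak_gb_s(CLOCK_GHZ, clocks_per_transfer=1)   # Pentium 4 style
every_other = l2_peak_gb_s(CLOCK_GHZ, clocks_per_transfer=2)   # Coppermine style

print(f"every clock:       {every_clock:.1f} GB/s")
print(f"every other clock: {every_other:.1f} GB/s")
print(f"at 1.7 GHz, every clock: {l2_peak_gb_s(1.7):.1f} GB/s")
```

At equal clock speed, the every-clock design has twice the peak bandwidth: 32GB/s versus 16GB/s in this example.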

Execution Trace Cache

This technology caches decoded x86 instructions (micro-ops), thus removing the latency associated with the instruction decoder from the execution loop. The Execution Trace Cache stores the micro-ops in the path of program execution flow, where the results of branches in the code are integrated into the same cache line.

The Execution Trace Cache is another handy technique Intel implemented in its new architecture to ease the penalty of mispredicted branch instructions. On older Intel processors, if a branch was mispredicted, the processor needed to start the process from the beginning. The NetBurst architecture can instead retrieve the micro-ops directly from the Execution Trace Cache and send them through the execution pipeline without having to restart the process from the first phase.

Streaming SIMD Extensions 2 (SSE2)

Intel’s Pentium 4 architecture features 144 new instructions capable of 128-bit SIMD integer arithmetic and 128-bit SIMD double-precision floating-point operations. To benefit from these new features, current software and games will need to be recompiled; otherwise, there will be no benefit. As with SSE, game and software developers should incorporate SSE2 features into their code without delay. These instructions can reduce the overall number of instructions required to execute a specific task and, as a result, can increase performance.
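The instruction-count savings come from packing several values into one 128-bit register. The Python sketch below only does the counting. It does not execute any SSE2 instructions, the helper names are ours, and it ignores loads, stores, and loop overhead.

```python
REGISTER_BITS = 128  # width of an SSE2 (XMM) register


def lanes(element_bits):
    """How many elements of the given size fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


def ops_needed(n_elements, element_bits, simd=True):
    """Arithmetic operations needed to process n_elements once each."""
    per_op = lanes(element_bits) if simd else 1
    return -(-n_elements // per_op)  # ceiling division


for name, bits in (("double (64-bit)", 64), ("int32", 32), ("int16", 16), ("int8", 8)):
    scalar = ops_needed(1000, bits, simd=False)
    packed = ops_needed(1000, bits)
    print(f"{name:16s} lanes={lanes(bits):2d}  scalar ops={scalar}  SIMD ops={packed}")
```

For 1000 double-precision values, for example, this means 500 packed operations instead of 1000 scalar ones. The real speedup depends on how well the code vectorizes and on memory bandwidth.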
