Wednesday, July 27, 2005

Paper Summary: Architecture-Level Power Optimization: What Are the Limits?

A paper on power optimization that points out the various sources of architectural power waste and the possibilities for optimization.

Dean M. Tullsen and John S. Seng

Following is the summary of the paper:

An idealized study of an aggressive, wide-issue superscalar processor
(not a comparison with statically scheduled processor architectures).

The categories of power waste:
1. Program waste: instructions that are fetched and executed but are
not necessary for correct execution, i.e. instructions that produce
dead or redundant values (e.g. silent stores).
2. Architectural waste: static sizing of cache and memory structures.
3. Speculation waste: instructions that are speculatively fetched and
executed but are never committed.

Simulations were done on the SMTSIM simulator in single-thread mode.
The processor model is 8-fetch, 8-stage, out-of-order, with 6 integer
FUs, 3 FP and 4 load/store.

Program waste: run the trace, identify the redundant instructions and
mark them. Then rerun the trace without charging the processor the
power cost of these instructions (how is this quantified??). (But do
these instructions still affect the issue width or the scheduling of
the actual instructions?) The power required for additional resources,
like the reuse buffer, is not accounted for.
First, only the dead instructions themselves are considered, and not
the instructions that act as producers for them. In this case little
energy is wasted in the benchmark runs (both FP and Int benchmarks) on
conditional MOVE instructions.
When the producer instructions are considered as well, the power
savings increase significantly. For the Int SPEC benchmarks, the
producers leading to silent integer operations and silent loads
contribute the most to the waste.
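The mark-then-discount idea above can be illustrated with a small sketch. This is not the paper's methodology or SMTSIM code: the trace record `Instr`, its fields, and the per-instruction `energy` numbers are all hypothetical, and only silent stores (stores that write the value already in memory) are detected.

```python
from collections import namedtuple

# Hypothetical trace record; the field names and energy values are assumptions.
Instr = namedtuple("Instr", "op dest value energy")

def mark_silent_stores(trace):
    """Return the indices of stores that rewrite the value already in memory."""
    memory = {}
    silent = set()
    for i, ins in enumerate(trace):
        if ins.op == "store":
            if ins.dest in memory and memory[ins.dest] == ins.value:
                silent.add(i)
            memory[ins.dest] = ins.value
    return silent

def energy_split(trace, wasted):
    """Return (total energy, energy charged to the marked instructions)."""
    total = sum(ins.energy for ins in trace)
    waste = sum(trace[i].energy for i in wasted)
    return total, waste

trace = [
    Instr("store", 0x10, 5, 2.0),
    Instr("alu", "r1", 7, 1.0),
    Instr("store", 0x10, 5, 2.0),  # silent: memory already holds 5
    Instr("store", 0x10, 6, 2.0),
]
silent = mark_silent_stores(trace)
total, waste = energy_split(trace, silent)
```

A flat per-instruction energy is the simplest possible answer to the "how is this quantified" question; a real power model would charge individual structures per access.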

For FP, silent FP operations and silent loads are the largest sources
of power waste. Producers of silent store instructions also contribute
heavily to the waste.
=> in FP there are long sequences of instructions executed before the
value is stored, and if that store is silent, the loss is greater.
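One way to picture how producers get pulled into the waste is a backward pass over the dependence graph. The sketch below is only an illustration under assumptions of mine, not the paper's algorithm: `sources[i]` is a hypothetical list of the earlier instructions that produce the inputs of instruction `i`, and a producer counts as wasted only when every one of its consumers is already wasted.

```python
def mark_wasted_producers(sources, wasted):
    """Extend a set of wasted instructions with producers that feed only waste.

    sources[i] lists the indices (all smaller than i) of the instructions
    that produce the inputs of instruction i.
    """
    n = len(sources)
    consumers = [[] for _ in range(n)]
    for i, srcs in enumerate(sources):
        for p in srcs:
            consumers[p].append(i)

    result = set(wasted)
    # Walk backwards so each consumer is classified before its producers.
    for i in range(n - 1, -1, -1):
        if i in result:
            continue
        # Assumption: an instruction with no consumers in the trace is useful.
        if consumers[i] and all(c in result for c in consumers[i]):
            result.add(i)
    return result

# A three-instruction chain ending in a silent store (index 2).
chain = mark_wasted_producers([[], [0], [1]], {2})
# Instruction 0 also feeds a useful instruction (index 3), so it stays useful.
shared = mark_wasted_producers([[], [0], [1], [0], []], {2})
```

The chain case shows why a long computation feeding one silent store is expensive: the whole chain ends up marked.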

Predictable instruction redundancy (value prediction): this includes
the class of instructions which operate on the same input values and
potentially produce the same output (mostly??). They are not exactly
redundant because they may still change the architectural state
(executions of these instructions are separated by other instructions
which update the same destination, so they are not redundant by the
previous definition). This set of instructions may therefore overlap
with the register-silent instructions.

Speculation waste: the integer benchmarks see more waste on
speculative execution because they have more difficult-to-predict
branches. Eliminating speculation degrades performance, so not much can
be done here, but controlling the level of execution provides some
opportunity (pipeline gating and SMT).
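As a rough illustration of pipeline gating: the usual idea is to count unresolved low-confidence branches and stop fetching while that count is high. The sketch below is a toy model, not taken from the paper. The per-cycle event format, the name `gated_fetch_cycles`, and the threshold of 2 are assumptions of mine.

```python
def gated_fetch_cycles(events, threshold=2):
    """Return, per cycle, whether fetch is gated.

    events is a list of (new_low_conf, resolved) pairs: the number of
    low-confidence branches that would be fetched in that cycle and the
    number of in-flight low-confidence branches that resolve in it.
    """
    in_flight = 0
    gated = []
    for new_low_conf, resolved in events:
        in_flight = max(0, in_flight - resolved)
        stall = in_flight >= threshold  # assumed gating condition
        if not stall:
            in_flight += new_low_conf  # fetch proceeds, branches enter the pipe
        gated.append(stall)
    return gated

pattern = gated_fetch_cycles([(1, 0), (1, 0), (1, 0), (0, 1), (1, 0)])
```

Here fetch is stalled only in the third cycle, once two low-confidence branches are in flight, and resumes after one of them resolves.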

Architectural Waste:
Structures are sized suboptimally, i.e. larger than needed for the
performance the particular application requires. The structures studied
are the data cache and the instruction queues.

Most of the benchmarks put little pressure on the IC, and its large
size is mostly not warranted.
Instruction queues: the instructions executed are mostly from the top
of the queue (or at least from a small portion of the instruction
queue) => large queue sizes are mostly not required.
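The queue observation suggests a simple measurement: record the queue position each instruction issues from, then find the smallest queue prefix that covers most issues. The sketch below assumes a hypothetical list of issue positions (0 is the top of the queue) and a made-up coverage target; it is not the paper's sizing method.

```python
from collections import Counter

def smallest_covering_size(issue_positions, coverage=0.9):
    """Smallest number of queue entries that accounts for the given
    fraction of issues, counting from the top of the queue (position 0)."""
    counts = Counter(issue_positions)
    total = len(issue_positions)
    covered = 0
    for pos in sorted(counts):
        covered += counts[pos]
        if covered >= coverage * total:
            return pos + 1
    return 0  # empty input: nothing to cover

# Hypothetical positions: nearly all issues come from the first three entries.
positions = [0, 0, 0, 1, 1, 2, 0, 1, 0, 31]
size = smallest_covering_size(positions, coverage=0.9)
```

With these made-up positions, three entries cover 90% of the issues, even though one instruction issued from position 31.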

Removing Waste:
The total energy waste for the Integer and FP benchmarks is not very
different, but the sources that contribute the most differ.

Conclusion:
The paper points out the sources of power waste and the possibilities
for power savings.
The questions that are important:
1. How to measure the energy cost of an instruction.
2. The energy impact of the extra resources for redundant and dead code
elimination.
3. Value prediction looks increasingly complex with increasing data
widths (but the savings could potentially also be greater with these
wider computational structures).
