This is designed to incentify community members as a proof of contribution token.

Using this You can,Buy courses,Reward others and exchange for real money.


Real Problems! Real Experts!

Join Our Telegram Channel !

The Eduladder is a community of students, teachers, and programmers. We help you to solve your academic and programming questions fast.
In eduladder you can Ask,Answer,Listen,Earn and Download Questions and Question papers.
Watch related videos of your favorite subject.
Connect with students from different parts of the world.
Apply or Post Jobs, Courses ,Internships and Volunteering opportunity. For FREE
See Our team
Wondering how we keep quality?
Got unsolved questions? Ask Questions

Software-Engineering-10IS51-unit-3-->View question

With a neat diagram explain the classic five stage pipe line for a RISC processor?

Instructionfetch (IF) cycle
Instruction decode
Execution / Effective address Cycle (EXE)
Memory access (MEM)
Write backcycle (WB)

Asked On2017-05-16 08:46:04 by:pallaviaithaln

Taged users:


Be first to dislike this question
Talk about this  Like  Dislike
View all questions
Instruction fetch

The Instruction Cache on these machines had a latency of one cycle, meaning that if the instruction was in the cache, it would be ready on the next clock cycle. During the Instruction Fetch stage, a 32-bit instruction was fetched from the cache.

The Program Counter, or PC, is a register responsible for holding the address of the current instruction. It feeds into the PC predictor which then sends the Program Counter (PC) to the Instruction Cache to read the current instruction. At the same time, the PC predictor predicts the address of the next instruction by incrementing the PC by 4 (all instructions were 4 bytes long). This prediction was always wrong in the case of a taken branch, jump, or exception (see delayed branches, below). Later machines would use more complicated and accurate algorithms (branch prediction and branch target prediction) to guess the next instruction address.


Unlike earlier microcoded machines, the first RISC machines had no microcode. Once fetched from the instruction cache, the instruction bits were shifted down the pipeline, so that simple combinational logic in each pipeline stage could produce the control signals for the datapath directly from the instruction bits. As a result, very little decoding is done in the stage traditionally called the decode stage. A consequence of this lack of decoding meant however that more instruction bits had to be used specifying what the instruction should do (and also, what it should not), and that leaves less bits for things like register indexes.

All MIPS, SPARC, and DLX instructions have at most two register inputs. During the decode stage, these two register names are identified within the instruction, and the two registers named are read from the register file. In the MIPS design, the register file had 32 entries.

At the same time the register file was read, instruction issue logic in this stage determined if the pipeline was ready to execute the instruction in this stage. If not, the issue logic would cause both the Instruction Fetch stage and the Decode stage to stall. On a stall cycle, the stages would prevent their initial flip-flops from accepting new bits.

If the instruction decoded was a branch or jump, the target address of the branch or jump was computed in parallel with reading the register file. The branch condition is computed after the register file is read, and if the branch is taken or if the instruction is a jump, the PC predictor in the first stage is assigned the branch target, rather than the incremented PC that has been computed. It should be noted that some architectures made use of the ALU in the Execute stage, at the cost of slightly decrease instruction throughput.

The decode stage ended up with quite a lot of hardware: the MIPS instruction set had the possibility of branching if two registers were equal, so a 32-bit-wide AND tree ran in series after the register file read, making a very long critical path through this stage. Also, the branch target computation generally required a 16 bit add and a 14 bit incrementer. Resolving the branch in the decode stage made it possible to have just a single-cycle branch mispredict penalty. Since branches were very often taken (and thus mispredicted), it was very important to keep this penalty low.


The Execute stage is where the actual computation occurs. Typically this stage consists of an Arithmetic and Logic Unit, and also a bit shifter. It may also include a multiple cycle multiplier and divider.

The Arithmetic and Logic Unit is responsible for performing boolean operations (and, or, not, nand, nor, xor, xnor) and also for performing integer addition and subtraction. Besides the result, the ALU typically provides status bits such as whether or not the result was 0, or if an overflow occurred.

The bit shifter is responsible for shift and rotations.

Instructions on these simple RISC machines can be divided into three latency classes according to the type of the operation:

  • Register-Register Operation (Single-cycle latency): Add, subtract, compare, and logical operations. During the execute stage, the two arguments were fed to a simple ALU, which generated the result by the end of the execute stage.
  • Memory Reference (Two-cycle latency). All loads from memory. During the execute stage, the ALU added the two arguments (a register and a constant offset) to produce a virtual address by the end of the cycle.
  • Multi-cycle Instructions (Many cycle latency). Integer multiply and divide and all floating-point operations. During the execute stage, the operands to these operations were fed to the multi-cycle multiply/divide unit. The rest of the pipeline was free to continue execution while the multiply/divide unit did its work. To avoid complicating the writeback stage and issue logic, multicycle instruction wrote their results to a separate set of registers.

Memory access

If data memory needs to be accessed, it is done so in this stage.

During this stage, single cycle latency instructions simply have their results forwarded to the next stage. This forwarding ensures that both single and two cycle instructions always write their results in the same stage of the pipeline, so that just one write port to the register file can be used, and it is always available.

For direct mapped and virtually tagged data caching, the simplest by far of the numerous data cache organizations, two SRAMs are used, one storing data and the other storing tags.


During this stage, both single cycle and two cycle instructions write their results into the register file.

Answerd on:2015-01-17 Answerd By:Anagha

Be first to like this answer

Be first to dislike this answer
Talk about this  Like  Dislike

You might like this video:Watch more here

Watch more videos from this user Here

Learn how to upload a video over here

Lets together make the web is a better place

We made eduladder by keeping the ideology of building a supermarket of all the educational material available under one roof. We are doing it with the help of individual contributors like you, interns and employees. So the resources you are looking for can be easily available and accessible also with the freedom of remix reuse and reshare our content under the terms of creative commons license with attribution required close.

You can also contribute to our vision of "Helping student to pass any exams" with these.
Answer a question: You can answer the questions not yet answered in eduladder.How to answer a question
Career: Work or do your internship with us.Work with us
Create a video: You can teach anything and everything each video should be less than five minutes should cover the idea less than five min.How to upload a video on eduladder