* PSCID memory read requires one clock tick, so add a pipeline stage
* To limit complexity, remove tready. Throttling will be done outside