This article is reproduced from serverfault.com.

assembly compiler-optimization cpu-registers x86-64

x86 64 - Why is the %rax register used in the assembly for this procedure with 8 args?

Posted on 2020-12-09 03:42:31

I have the following C function:

void proc(long  a1, long  *a1p,
          int   a2, int   *a2p,
          short a3, short *a3p,
          char  a4, char  *a4p)
{
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}

Using Godbolt, I converted it to x86_64 assembly (for simplicity, I used the -Og flag to minimize optimizations). It produces the following assembly:

proc:
        movq    16(%rsp), %rax
        addq    %rdi, (%rsi)
        addl    %edx, (%rcx)
        addw    %r8w, (%r9)
        movl    8(%rsp), %edx
        addb    %dl, (%rax)
        ret

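I'm confused by the first line of the assembly, movq 16(%rsp), %rax. I know that the %rax register is used to store a return value, but the proc procedure has no return value. So I'm curious why that register is used here, rather than %r9 or some other register that isn't used for return values.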

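I'm also confused about the position of this instruction relative to the others. It appears first, well before its destination register %rax is needed (in fact, the register isn't needed until the last step). It also appears before addq %rdi, (%rsi), which is the translation of the first line of code in the procedure (*a1p += a1;).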

What am I missing?

Questioner

Richie Thomas

Peter Cordes 2020-12-09 21:52:51

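It's just using it as a temporary register to load a stack arg. RAX is a good first choice for a scratch register. The function has no return value, so RAX isn't special at all.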
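
To make "stack arg" concrete, here is a minimal sketch of where each integer or pointer argument sits on function entry. It assumes the x86-64 System V calling convention (first six such args in RDI, RSI, RDX, RCX, R8, R9, the rest in 8-byte stack slots above the return address), which matches the compiler output in the question. The helper name arg_location is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: describe where the index-th (0-based) integer or
 * pointer argument lives on entry to a function, assuming the x86-64
 * System V calling convention. Writes e.g. "%rdi" or "16(%rsp)" to buf. */
static void arg_location(int index, char *buf, size_t size)
{
    static const char *const regs[6] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };

    if (index < 6) {
        snprintf(buf, size, "%s", regs[index]);
    } else {
        /* 0(%rsp) holds the return address; stack args follow in
         * 8-byte slots: 7th arg at 8(%rsp), 8th arg at 16(%rsp), ... */
        snprintf(buf, size, "%d(%%rsp)", 8 + 8 * (index - 6));
    }
}
```

For proc, index 6 (char a4) maps to 8(%rsp) and index 7 (char *a4p) maps to 16(%rsp), which are the two stack loads in the assembly. Narrow args such as int a2 use the low part of the listed register (EDX is the low half of RDX).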

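Planning the load ahead is a good idea in general to hide load-use latency, so out-of-order exec doesn't have to work as hard to hide it. Remember that this is optimized code, so the instructions for each C statement aren't each in their own separate block. For something this simple, that's good (un-optimized code would store everything to the stack and then reload it. See also this).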

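R9 would be a poor choice because R9 is already occupied on function entry (with another arg), which would limit instruction scheduling. More importantly, addb %dl, (%r9) needs a REX prefix that addb %dl, (%rax) doesn't, so that would waste code size.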
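
The code-size point can be checked by hand-encoding the instruction. The following is a rough sketch, not an assembler: the function name encode_addb_dl_mem is hypothetical, and it only handles base registers that need neither a SIB byte nor a displacement.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode `addb %dl, (base)` where base is a 64-bit
 * register number 0..15 (0 = RAX, 9 = R9, 10 = R10, 11 = R11).
 * Returns the number of bytes written to out (at most 3), or 0 for base
 * registers whose encoding needs a SIB byte or a displacement
 * (RSP, RBP, R12, R13), which this sketch does not handle. */
static size_t encode_addb_dl_mem(unsigned base, uint8_t out[3])
{
    unsigned low3 = base & 7;
    size_t n = 0;

    if (base > 15 || low3 == 4 || low3 == 5)
        return 0;

    if (base >= 8)
        out[n++] = 0x41;                    /* REX prefix with only REX.B set */
    out[n++] = 0x00;                        /* opcode: ADD r/m8, r8 */
    out[n++] = (uint8_t)((2u << 3) | low3); /* ModRM: mod=00, reg=DL (2), rm=base */
    return n;
}
```

With RAX as the base this produces the two bytes 00 10. With R9 it produces 41 00 11, one byte longer because of the REX prefix. R10 and R11 also need the prefix.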

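The already-in-use downside doesn't apply to R10 or R11 (which, like RAX, are simply call-clobbered but not used for arg passing), but the code-size downside still applies.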

R9B wouldn't even make sense: the stack arg is a pointer. The only byte register used is DL (after loading char a4 into EDX).

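(A dword load avoids writing a partial register, and no movzx / movzbl is needed, because callers will normally write the whole qword or at least a dword, even for narrow args.)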
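
As a small illustration of the data flow only (not of the performance effects), the sketch below models movl 8(%rsp), %edx followed by a read of %dl. The function names and the slot array standing in for the 8-byte stack slot are assumptions for the example.

```c
#include <stdint.h>

/* Model of `movl 8(%rsp), %edx`: read the low 4 bytes of the 8-byte
 * stack slot. x86 is little-endian, so slot[0] is the least significant
 * byte of the loaded value. */
static uint32_t load_dword(const uint8_t slot[8])
{
    return (uint32_t)slot[0]
         | (uint32_t)slot[1] << 8
         | (uint32_t)slot[2] << 16
         | (uint32_t)slot[3] << 24;
}

/* Model of reading %dl: the low 8 bits of %edx. */
static uint8_t low_byte(uint32_t edx)
{
    return (uint8_t)(edx & 0xFF);
}
```

Only slot[0], the byte holding char a4, reaches the addb. Whatever the caller left in the other bytes of the slot does not affect the result.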

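The compiler could have moved this load (movl 8(%rsp), %edx) earlier too, but chose not to. But add %dl, (%rax) is a RMW on (%rax), so the dl data isn't needed until after the (%rax) load data is ready. Getting the RAX address ready early is more valuable than the DL data, because the address is used for another load rather than ALU -> store.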
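
The read-modify-write ordering can be spelled out in C. This is only a model of the three logical steps of a memory-destination add, with a hypothetical function name. unsigned char is used instead of the question's char so that the wraparound is well defined.

```c
/* Model of `addb %dl, (%rax)` as the three steps a memory-destination
 * add performs. */
static void add_byte_rmw(unsigned char *rax, unsigned char dl)
{
    unsigned char tmp = *rax;        /* 1. load: needs only the address in RAX */
    tmp = (unsigned char)(tmp + dl); /* 2. add: first point where DL is needed */
    *rax = tmp;                      /* 3. store the result back */
}
```

Step 1 can start as soon as the pointer is in RAX, while DL is not consumed until step 2. That is why having the address ready early matters more than having the data ready early.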
