How many ways are there to set a register to zero?

There are plenty of ways to get 0 into ax on IA32...

    lea eax, [0]
    mov eax, 0FFFF0000h         //Any constant of the form (0..0FFFFh) << 16 leaves ax = 0
    shr ax, 16                  //Any count from 16..31 clears ax (shr eax, 16 would only move the high half down)
    shl eax, 16                 //Any count from 16..31 clears ax

And perhaps the strangest one... :)

@movzx:
    movzx eax, byte ptr[@movzx + 6]   //Because the last byte of this instruction is 0

and...

  @movzx:
    movzx ax, byte ptr[@movzx + 7]
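
Why the last byte of the instruction is 0 is easier to see from the encoding. The sketch below is only a model, assuming 32-bit mode and the standard `0F B6 /r` encoding with a bare disp32 operand; the helper names are made up for illustration. The final byte is the top byte of the absolute address, so the trick relies on the code being located below 16 MiB.

```python
import struct

def encode_movzx_eax_mem8(disp32):
    """Encode `movzx eax, byte ptr [disp32]` for 32-bit mode.

    0F B6 is the movzx r32, r/m8 opcode; ModRM 05h selects eax with a bare
    disp32 operand; the 4-byte address follows in little-endian order.
    """
    return bytes([0x0F, 0xB6, 0x05]) + struct.pack("<I", disp32)

def self_read_result(label_addr):
    """Byte that the instruction placed at label_addr reads from its own offset 6."""
    code = encode_movzx_eax_mem8(label_addr + 6)
    return code[6]  # last byte of the 7-byte instruction = top byte of the address

print(hex(self_read_result(0x00401000)))  # low load address: top address byte is 0
print(hex(self_read_result(0x7F401000)))  # high load address: the trick no longer yields 0
```

The 16-bit-operand variant carries an extra 66h prefix in 32-bit mode, which is why that version reads offset 7 instead of 6.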

Edit: for 16-bit x86 CPU mode, untested...:

    lea  ax, [0]

and...

  @movzx:
    movzx ax, byte ptr cs:[@movzx + 7]   //Check if 7 is right offset

The cs: prefix is only needed when the ds segment register differs from the cs segment register; otherwise it is optional.


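The best way to zero a register is xor eax,eax (better performance and a smaller encoding); see the answer that covers that question.

I'll consider only the ways a single instruction can zero a register. If you allow loading a zero from memory there are far too many ways, so we'll mostly exclude instructions that load from memory.

I've found 10 different single instructions that zero a 32-bit register (and thus the full 64-bit register in long mode), with no preconditions and no loads from any other memory. This doesn't count different encodings of the same insn, or the different forms of mov. If you count loads from memory that is known to hold a zero, or from segment registers or whatever, there are a boatload of ways. There are also a zillion ways to zero vector registers.

For most of these, the eax and rax versions are separate encodings of the same functionality. Both zero the full 64-bit register, either by zeroing the upper half implicitly or by writing the full register explicitly with a REX.W prefix.

Integer registers: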

# Works on any reg unless noted, usually of any size.  eax/ax/al as placeholders
and    eax, 0         ; three encodings: imm8, imm32, and eax-only imm32
andn   eax, eax,eax   ; BMI1 instruction set: dest = ~s1 & s2
imul   eax, any,0     ; eax = something * 0.  two encodings: imm8, imm32
lea    eax, [0]       ; absolute encoding (disp32 with no base or index).  Use [abs 0] in NASM if you used DEFAULT REL
lea    eax, [rel 0]   ; YASM supports this, but NASM doesn't: use a RIP-relative encoding to address a specific absolute address, making position-dependent code

mov    eax, 0         ; 5 bytes to encode (B8 imm32)
mov    rax, strict dword 0   ; 7 bytes: REX mov r/m64, sign-extended-imm32.    NASM optimizes mov rax,0 to the 5B version, but dword or strict dword stops it for some reason
mov    rax, strict qword 0   ; 10 bytes to encode (REX B8 imm64).  movabs mnemonic for AT&T.  normally assemblers choose smaller encodings if the operand fits, but strict qword forces the imm64.

sub    eax, eax         ; recognized as a zeroing idiom on some but maybe not all CPUs
xor    eax, eax         ; Preferred idiom: recognized on all CPUs

@movzx:
  movzx eax, byte ptr[@movzx + 6]   //Because the last byte of this instruction is 0.  neat hack from GJ.'s answer

.l: loop .l             ; clears e/rcx... eventually.  from I. J. Kennedy's answer.  To operate on only ECX, use an address-size prefix.
; rep lodsb             ; not counted because it's not safe (potential segfaults), but also zeros ecx

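"Shift all the bits out one end" isn't possible for regular-size GP registers, only for partial registers. shl and shr shift counts are masked: count &= 31;, which is equivalent to count %= 32;. (But the 286 and earlier are 16-bit only, so ax is a "full" register there. The immediate-count form shr r/m16, imm8 already existed on the 286, so there were CPUs on which a shift could zero a full integer register.) Also note that shift counts for vectors saturate instead of wrapping.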

# Zeroing methods that only work on 16bit or 8bit regs:
shl    ax, 16           ; shift count is still masked to 0x1F for any operand size less than 64b.  i.e. count %= 32
shr    al, 16           ; so 8b and 16b shifts can zero registers.

# zeroing ah/bh/ch/dh:  Low byte of the reg = whatever garbage was in the high16 reg
movzx  eax, ah          ; From Jerry Coffin's answer
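
The effect of the 5-bit count mask can be checked with a small model. This is only a sketch of the documented behaviour for 8/16/32-bit operands (64-bit operands mask to 6 bits and are not modelled); the function names are illustrative.

```python
def shl(value, count, width):
    """Model x86 shl on an 8-, 16- or 32-bit operand: the count is masked to 5 bits first."""
    count &= 31
    return (value << count) & ((1 << width) - 1)

def shr(value, count, width):
    """Model x86 shr (logical right shift) with the same 5-bit count mask."""
    count &= 31
    return (value & ((1 << width) - 1)) >> count

print(hex(shl(0xFFFF, 16, 16)))       # shl ax, 16: every bit is shifted out
print(hex(shr(0xFF, 16, 8)))          # shr al, 16: likewise
print(hex(shl(0xFFFFFFFF, 32, 32)))   # shl eax, 32: count becomes 0, value unchanged
```

A count of 16 survives the mask, so it clears an 8-bit or 16-bit register, while a count of 32 is masked to 0 and leaves a 32-bit register untouched.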

Depending on other existing conditions (other than having a zero in another register):

bextr  eax,  any, eax  ; if al >= 32, or ah = 0.  BMI1
BLSR   eax,  src       ; if src only has one set bit
CDQ                    ; edx = sign-extend(eax)
sbb    eax, eax        ; if CF=0.  (Only recognized on AMD CPUs as dependent only on flags (not eax))
setcc  al              ; with a condition that will produce a zero based on known state of flags

PSHUFB   xmm0, all-ones  ; xmm0 bytes are cleared when the mask bytes have their high bit set

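Vector registers: some of these SSE2 integer instructions can also be used on MMX registers (mm0 - mm7). Again, the best choice is some flavour of xor: either PXOR / VPXOR, or XORPS / VXORPS. AVX vxorps xmm0,xmm0,xmm0 zeros the full ymm0 / zmm0, and is better than vxorps ymm0,ymm0,ymm0 on AMD CPUs.

These zeroing instructions have three encodings each: legacy SSE, AVX (VEX prefix), and AVX512 (EVEX prefix), although the SSE version only zeros the bottom 128 bits, which isn't the full register on CPUs that support AVX or AVX512. Anyway, depending on how you count, each entry can be three different instructions (same opcode, just different prefixes). The exception is vzeroall, which AVX512 didn't change (and which doesn't zero zmm16-31).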

ANDNPD    xmm0, xmm0
ANDNPS    xmm0, xmm0
PANDN     xmm0, xmm0     ; dest = ~dest & src

PCMPGTB   xmm0, xmm0     ; n > n is always false.
PCMPGTW   xmm0, xmm0     ; similarly, pcmpeqd is a good way to do _mm_set1_epi32(-1)
PCMPGTD   xmm0, xmm0
PCMPGTQ   xmm0, xmm0     ; SSE4.2, and slower than byte/word/dword


PSADBW    xmm0, xmm0     ; sum of absolute differences
MPSADBW   xmm0, xmm0, 0  ; SSE4.1.  sum of absolute differences, register against itself with no offset.  (imm8=0: same as PSADBW)

  ; shift-counts saturate and zero the reg, unlike for GP-register shifts
PSLLDQ    xmm0, 16       ;  left-shift the bytes in xmm0
PSRLDQ    xmm0, 16       ; right-shift the bytes in xmm0
PSLLW     xmm0, 16       ; left-shift the bits in each word
PSLLD     xmm0, 32       ;           double-word
PSLLQ     xmm0, 64       ;             quad-word
PSRLW/PSRLD/PSRLQ  ; same but right shift

PSUBB/W/D/Q   xmm0, xmm0     ; subtract packed elements, byte/word/dword/qword
PSUBSB/W   xmm0, xmm0     ; sub with signed saturation
PSUBUSB/W  xmm0, xmm0     ; sub with unsigned saturation

PXOR       xmm0, xmm0
XORPD      xmm0, xmm0
XORPS      xmm0, xmm0

VZEROALL

# Can raise an exception on SNaN, so only usable if you know exceptions are masked
CMPLTPD    xmm0, xmm0         # exception on QNaN or SNaN, or denormal
VCMPLT_OQPD xmm0, xmm0,xmm0   # exception only on SNaN or denormal
CMPLT_OQPS ditto

VCMPFALSE_OQPD xmm0, xmm0, xmm0   # This is really just another imm8 predicate value for the same VCMPPD xmm,xmm,xmm, imm8 instruction.  Same exception behaviour as LT_OQ.

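SUBPS xmm0, xmm0 and similar won't work, because NaN-NaN = NaN, not zero. Also, FP instructions can raise exceptions on NaN arguments, so even CMPPS / PD is only safe if you know exceptions are masked and you don't care about possibly setting the exception bits in MXCSR. Even the AVX version, with its expanded choice of predicates, will raise #IA on SNaN. The "quiet" predicates only suppress #IA for QNaN. CMPPS / PD can also raise the Denormal exception. (See the table in the insn set ref entry for CMPPD, or preferably Intel's original PDF, since the HTML extract mangles that table.)

AVX512: there are probably several options here, but I'm not curious enough right now to dig through the instruction set listing looking for all of them. One is worth mentioning, though: VPTERNLOGD / Q can instead set a register to all-ones, with imm8 = 0xFF. (But it has a false dependency on the old value on current implementations.) Since the compare instructions all compare into a mask, VPTERNLOGD seemed in my testing to be the best way to set a vector to all-ones on Skylake-AVX512, although it doesn't special-case imm8 = 0xFF to avoid the false dependency.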

VPTERNLOGD zmm0, zmm0,zmm0, 0     ; inputs can be any registers you like.
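
To see why imm8 = 0 zeros the register and imm8 = 0xFF sets it to all-ones, here is a rough per-element model of the truth-table lookup. It assumes the documented bit order (destination bit as the high index bit, then the first and second source) and models a single 32-bit element; the function name is illustrative.

```python
def vpternlogd_element(dest, src1, src2, imm8, width=32):
    """Model one element of VPTERNLOGD: imm8 is an 8-entry truth table
    indexed, bit by bit, with (dest_bit, src1_bit, src2_bit)."""
    result = 0
    for bit in range(width):
        index = (((dest >> bit) & 1) << 2) | (((src1 >> bit) & 1) << 1) | ((src2 >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result

a, b, c = 0xDEADBEEF, 0x12345678, 0x0F0F0F0F
print(hex(vpternlogd_element(a, b, c, 0x00)))  # every table entry is 0
print(hex(vpternlogd_element(a, b, c, 0xFF)))  # every table entry is 1
```

Because every table entry is the same for these two immediates, the result does not depend on the inputs, which is why any registers can be used as sources.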

x87 FP: only one choice (because sub doesn't work if the old value was infinity or NaN).

FLDZ    ; push +0.0
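
The reason subtracting a floating-point value from itself is not a reliable way to get zero can be illustrated with ordinary IEEE 754 arithmetic. This is a sketch in Python, not a model of any specific instruction; `bits_of` is a helper introduced here for illustration.

```python
import math
import struct

def bits_of(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def sub_self(x):
    """What a floating-point x - x produces."""
    return x - x

def xor_self_bits(x):
    """What a bitwise xor of the value with itself produces."""
    return bits_of(x) ^ bits_of(x)

for value in (1.5, float("inf"), float("nan")):
    print(value, sub_self(value), xor_self_bits(value))
```

Subtraction yields NaN for both infinity and NaN inputs, whereas the bitwise xor is zero for any bit pattern.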


A couple more possibilities:

sub ax, ax

movzx eax, ah

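Edit: I should note that the movzx doesn't zero all of eax. It only zeros ah (plus the top 16 bits, which aren't accessible as a register in themselves).

As for being the fastest, if memory serves, sub and xor are equivalent. They're faster than (most of) the others because they're common enough that CPU designers added special optimizations for them. Specifically, with a normal sub or xor the result depends on the previous value in the register. The CPU recognizes xor-with-self and subtract-from-self specially, so it knows the dependency chain is broken there. Instructions after that point don't depend on any previous value, so the CPU can execute previous and subsequent instructions in parallel using rename registers.

Especially on older processors, we'd expect 'mov reg, 0' to be slower simply because it carries an extra 16 bits of data, and most early processors (especially the 8088) were limited primarily by how fast they could load the instruction stream from memory. In fact, on an 8088 you can estimate run time quite accurately without any reference tables, just by counting the bytes involved. That does break down for the div and idiv instructions, but that's about it. OTOH, I should probably shut up, since the 8088 really is of little interest to anybody (and hasn't been for at least a decade).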


You can set register CX to 0 with LOOP $.
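
How long that takes depends on the starting value. The sketch below models a loop instruction that jumps to itself with a 16-bit counter; it is an illustration of the decrement-then-test behaviour, and the function name is made up.

```python
def loop_to_self(cx, width=16):
    """Model `LOOP $`: decrement the counter, then branch back while it is non-zero.

    Returns the final counter value and the number of times the instruction ran.
    """
    mask = (1 << width) - 1
    cx &= mask
    iterations = 0
    while True:
        cx = (cx - 1) & mask   # loop decrements the counter first
        iterations += 1
        if cx == 0:            # and falls through once the counter reaches zero
            break
    return cx, iterations

print(loop_to_self(5))   # (0, 5)
print(loop_to_self(0))   # (0, 65536): a zero counter wraps around before it ends
```

The counter always ends at zero, but a register that already holds 0 costs a full 65536 iterations.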


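Of course, in specific cases there are other ways to set a register to 0. For example, if eax holds a positive integer (sign bit clear), you can set edx to 0 with cdq/cltd (this trick is used in a famous 24-byte shellcode that appears in "Insecure Programming by Example").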
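
The precondition matters because cdq only copies the sign bit of eax into every bit of edx. A minimal model (the function name is illustrative):

```python
def cdq(eax):
    """Return the edx value produced by cdq: every bit is a copy of eax's sign bit."""
    eax &= 0xFFFFFFFF
    return 0xFFFFFFFF if eax & 0x80000000 else 0

print(hex(cdq(0x0000000B)))  # sign bit clear: edx becomes 0
print(hex(cdq(0x80000000)))  # sign bit set: edx becomes all-ones instead
```

So the trick zeros edx only when bit 31 of eax is known to be clear.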


This thread is old, but here are a few more examples. The simple ones:

xor eax,eax

sub eax,eax

and eax,0

lea eax,[0] ; it doesn't look "natural" in the binary

More complex combinations:

; flip all those 1111... bits to 0000
or  eax,-1  ;  eax = 0FFFFFFFFh
not eax     ; ~eax = 0

; XOR EAX,-1 works the same as NOT EAX instruction in this case, flipping 1 bits to 0
or  eax,-1  ;  eax = 0FFFFFFFFh
xor eax,-1  ; ~eax = 0

; -1 + 1 = 0
or  eax,-1 ;  eax = 0FFFFFFFFh or signed int = -1
inc eax    ;++eax = 0
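
These two-instruction combinations can be checked with 32-bit wraparound arithmetic. The helpers below are a sketch that models the register as an unsigned 32-bit value; their names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def or32(a, b):
    return (a | b) & MASK32

def not32(a):
    return ~a & MASK32

def xor32(a, b):
    return (a ^ b) & MASK32

def inc32(a):
    return (a + 1) & MASK32   # wraps from 0FFFFFFFFh back to 0

eax = 0x12345678                         # arbitrary starting value
all_ones = or32(eax, MASK32)             # or eax,-1
print(hex(not32(all_ones)))              # not eax
print(hex(xor32(all_ones, MASK32)))      # xor eax,-1
print(hex(inc32(all_ones)))              # inc eax: -1 + 1 = 0
```

In each case the first instruction forces the register to all-ones regardless of its old value, and the second turns that known value into zero.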


mov eax,0  
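shl eax,32 ;caution: the count is masked to 5 bits (32 -> 0), so eax is left unchanged; shl ax,16 does clear ax
shr eax,32 ;same caveat; shr ax,16 clears ax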
imul eax,0 
sub eax,eax 
xor eax,eax   
and eax,0  
andn eax,eax,eax 

loop $ ;ecx only
pause  ;listed as "ecx only" (pause="rep nop" or better="rep xchg eax,eax"), but note that pause does not modify ecx

;two instructions together:
push dword 0    
pop eax

or eax,0xFFFFFFFF  
not eax

xor al,al ;("mov al,0","sub al,al",...)  
movzx eax,al
...

7 answers. Tags: x86_16, tasm
