Conclusion#
The conclusion comes first.
Defining a loop variable inside the for statement does not cause repeated stack allocation, with either the AC6 compiler or GCC. Each loop variable keeps one fixed stack offset, so the two nested loops use the same two offsets throughout. With optimization enabled, and as long as the two forms are logically equivalent, the generated assembly is exactly the same.
In fact, defining the loop variable in the for statement is an excellent practice. Moving all local variable definitions to the beginning of the function either makes the code slightly worse (a pessimization) or makes no difference, depending on the compiler and the optimization level.
Therefore, if you are chasing maximum performance, declare each local variable only in the branch where it is used.
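As a small illustration of that last point, here is a minimal sketch of a variable that is declared only in the branch that needs it. The function `sum_if_enabled` and its parameters are hypothetical and do not come from the tests in this article; the sketch only shows the declaration style, not a measured result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the article): sums a buffer only when asked to. */
uint32_t sum_if_enabled(const uint8_t *buf, size_t len, int enabled)
{
    if (enabled) {
        /* sum and k exist only inside the branch that actually uses them. */
        uint32_t sum = 0;
        for (size_t k = 0; k < len; k++) {
            sum += buf[k];
        }
        return sum;
    }
    /* The disabled path never touches sum or k. */
    return 0;
}
```

Keeping `sum` and `k` in the narrowest scope also means an unoptimized build has no initializer to execute on the path that does not need them.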
All of the following tests were compiled for an STM32H7 target.
Discussing stack operations of local variables with the following code#
for(int i = 0; i < 50; i++)
{
    for(int j = 0; j < 50; j++)
    {
        HAL_Delay(1);
    }
}
Intuitively, a local variable j is declared every time the outer loop body runs. Does this cause a stack allocation on every iteration?
The disassembly for this part is as follows#
0x0000001e: LDR r0,[sp,#0]
0x00000020: STR r0,[sp,#8]
0x00000022: B {pc}+0x2 ; 0x24
0x00000024: LDR r0,[sp,#8]
0x00000026: CMP r0,#0x31
0x00000028: BGT {pc}+0x2c ; 0x54
0x0000002a: B {pc}+0x2 ; 0x2c
0x0000002c: MOVS r0,#0
0x0000002e: STR r0,[sp,#4]
0x00000030: B {pc}+0x2 ; 0x32
0x00000032: LDR r0,[sp,#4]
0x00000034: CMP r0,#0x31
0x00000036: BGT {pc}+0x14 ; 0x4a
0x00000038: B {pc}+0x2 ; 0x3a
0x0000003a: MOVS r0,#1
0x0000003c: BL HAL_Delay
0x00000040: B {pc}+0x2 ; 0x42
0x00000042: LDR r0,[sp,#4]
0x00000044: ADDS r0,#1
0x00000046: STR r0,[sp,#4]
0x00000048: B {pc}-0x16 ; 0x32
0x0000004a: B {pc}+0x2 ; 0x4c
0x0000004c: LDR r0,[sp,#8]
0x0000004e: ADDS r0,#1
0x00000050: STR r0,[sp,#8]
0x00000052: B {pc}-0x2e ; 0x24
Outer Loop#
The outer loop is not the focus here; it simply uses compare-and-branch instructions to run the inner loop 50 times, keeping its counter i at sp+8.
0x0000001e: LDR r0,[sp,#0]
0x00000020: STR r0,[sp,#8]
0x00000022: B {pc}+0x2 ; 0x24
0x00000024: LDR r0,[sp,#8]
0x00000026: CMP r0,#0x31
0x00000028: BGT {pc}+0x2c ; 0x54
0x0000002a: B {pc}+0x2 ; 0x2c
; .....inner loop
0x0000004a: B {pc}+0x2 ; 0x4c
0x0000004c: LDR r0,[sp,#8]
0x0000004e: ADDS r0,#1
0x00000050: STR r0,[sp,#8]
0x00000052: B {pc}-0x2e ; 0x24
Inner Loop#
0x0000002c: MOVS r0,#0
0x0000002e: STR r0,[sp,#4]
0x00000030: B {pc}+0x2 ; 0x32
0x00000032: LDR r0,[sp,#4]
0x00000034: CMP r0,#0x31
0x00000036: BGT {pc}+0x14 ; 0x4a
0x00000038: B {pc}+0x2 ; 0x3a
0x0000003a: MOVS r0,#1
0x0000003c: BL HAL_Delay
0x00000040: B {pc}+0x2 ; 0x42
0x00000042: LDR r0,[sp,#4]
0x00000044: ADDS r0,#1
0x00000046: STR r0,[sp,#4]
0x00000048: B {pc}-0x16 ; 0x32
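The instructions at 0x2c and 0x2e set the value at sp+4 to zero.
The instructions at 0x32 to 0x36 then compare it with 49 (0x31) and leave the loop once it is greater, and those at 0x42 to 0x46 increment it, so the loop body runs 50 times.
This means that each pass of the outer loop only re-runs this initialization of sp+4. No stack space is allocated or released (sp is never adjusted inside the loop), and the outer loop always works on its own counter at sp+8.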
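The slot reuse described above can also be observed from C itself. The following is a minimal sketch, assuming a typical compiler: it records the address of j on the first inner iteration and checks that every later iteration sees the same address. The function name `inner_slot_is_reused` is made up for this illustration, and the C standard does not strictly guarantee the result; it is simply what the disassembly above leads one to expect.

```c
#include <stdint.h>

/* Hypothetical probe (not from the article): returns 1 if j had the same
   address on every iteration of both loops, 0 otherwise. */
int inner_slot_is_reused(void)
{
    uintptr_t first = 0;   /* address of j seen on the very first iteration */
    int have_first = 0;    /* set once `first` has been recorded */
    int reused = 1;        /* cleared if any later address differs */

    for (int i = 0; i < 50; i++) {
        for (int j = 0; j < 50; j++) {
            uintptr_t here = (uintptr_t)&j;
            if (!have_first) {
                first = here;
                have_first = 1;
            } else if (here != first) {
                reused = 0;
            }
        }
    }
    return reused;
}
```

Taking the address of j forces it into memory, so this probe reflects the stack slot rather than a register.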
What if local variables are defined in advance?#
Change the code to the following:
int i = 0;
int j = 0;
for(i = 0; i < 50; i++)
{
    for(j = 0; j < 50; j++)
    {
        HAL_Delay(1);
    }
}
The disassembly for this part is as follows#
0x0000001e: LDR r0,[sp,#0]
0x00000020: STR r0,[sp,#8]
0x00000022: STR r0,[sp,#4]
0x00000024: STR r0,[sp,#8]
0x00000026: B {pc}+0x2 ; 0x28
0x00000028: LDR r0,[sp,#8]
0x0000002a: CMP r0,#0x31
0x0000002c: BGT {pc}+0x2c ; 0x58
0x0000002e: B {pc}+0x2 ; 0x30
0x00000030: MOVS r0,#0
0x00000032: STR r0,[sp,#4]
0x00000034: B {pc}+0x2 ; 0x36
0x00000036: LDR r0,[sp,#4]
0x00000038: CMP r0,#0x31
0x0000003a: BGT {pc}+0x14 ; 0x4e
0x0000003c: B {pc}+0x2 ; 0x3e
0x0000003e: MOVS r0,#1
0x00000040: BL HAL_Delay
0x00000044: B {pc}+0x2 ; 0x46
0x00000046: LDR r0,[sp,#4]
0x00000048: ADDS r0,#1
0x0000004a: STR r0,[sp,#4]
0x0000004c: B {pc}-0x16 ; 0x36
0x0000004e: B {pc}+0x2 ; 0x50
0x00000050: LDR r0,[sp,#8]
0x00000052: ADDS r0,#1
0x00000054: STR r0,[sp,#8]
0x00000056: B {pc}-0x2e ; 0x28
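The loop itself (0x26 to 0x56) is identical to the previous version (0x22 to 0x52). The only difference is two extra store instructions ahead of it, which zero sp+4 and sp+8 for the initializers of j and i. Declaring the variables in advance therefore makes the unoptimized code slightly worse, not better.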
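If the declarations really have to stay at the top of the function, the two extra stores come from the `= 0` initializers, which the for statements immediately overwrite. The sketch below drops those initializers. It was not part of the tests in this article, so whether the stores actually disappear at -O0 is an expectation rather than a measured result. `HAL_Delay` is replaced by a host-side stub that only counts calls, and `run_hoisted_no_init` is a hypothetical name.

```c
#include <stdint.h>

/* Host-side stand-in for the STM32 HAL function (assumption: the real one
   blocks for `ms` milliseconds; this stub only counts how often it is called). */
static uint32_t delay_calls;
static void HAL_Delay(uint32_t ms)
{
    (void)ms;
    delay_calls++;
}

/* Declarations at the top, but without redundant initializers. */
uint32_t run_hoisted_no_init(void)
{
    int i;  /* assigned by the outer for statement before its first read */
    int j;  /* assigned by the inner for statement before its first read */

    delay_calls = 0;
    for (i = 0; i < 50; i++) {
        for (j = 0; j < 50; j++) {
            HAL_Delay(1);
        }
    }
    return delay_calls;  /* 50 * 50 calls */
}
```

The behavior is unchanged, since both variables are written before they are ever read.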
Will complicating the loop make a difference?#
The following more complex loop and its disassembly still show no extra stack operations: j stays at sp+8 and i at sp+12 (0xc) throughout.
int test = 0;
for(int i = 0; i < 50; i++)
{
    for(int j = 0; j < 50; j++)
    {
        if((test & 0x01) == 0)
            HAL_Delay(1);
        else
            HAL_Delay(2);
    }
    test++;
}
0x0000001e: 9801 .. LDR r0,[sp,#4]
0x00000020: 9004 .. STR r0,[sp,#0x10]
0x00000022: 9003 .. STR r0,[sp,#0xc]
0x00000024: e7ff .. B {pc}+0x2 ; 0x26
0x00000026: 9803 .. LDR r0,[sp,#0xc]
0x00000028: 2831 1( CMP r0,#0x31
0x0000002a: dc21 !. BGT {pc}+0x46 ; 0x70
0x0000002c: e7ff .. B {pc}+0x2 ; 0x2e
0x0000002e: 2000 . MOVS r0,#0
0x00000030: 9002 .. STR r0,[sp,#8]
0x00000032: e7ff .. B {pc}+0x2 ; 0x34
0x00000034: 9802 .. LDR r0,[sp,#8]
0x00000036: 2831 1( CMP r0,#0x31
0x00000038: dc12 .. BGT {pc}+0x28 ; 0x60
0x0000003a: e7ff .. B {pc}+0x2 ; 0x3c
0x0000003c: f89d0010 .... LDRB r0,[sp,#0x10]
0x00000040: 07c0 .. LSLS r0,r0,#31
0x00000042: b920 . CBNZ r0,{pc}+0xc ; 0x4e
0x00000044: e7ff .. B {pc}+0x2 ; 0x46
0x00000046: 2001 . MOVS r0,#1
0x00000048: f7fffffe .... BL HAL_Delay
0x0000004c: e003 .. B {pc}+0xa ; 0x5a
0x0000004e: 2002 . MOVS r0,#2
0x00000050: f7fffffe .... BL HAL_Delay
0x00000054: e7ff .. B {pc}+0x2 ; 0x5a
0x00000056: e7ff .. B {pc}+0x2 ; 0x5c
0x00000058: 9802 .. LDR r0,[sp,#8]
0x0000005a: 3001 .0 ADDS r0,#1
0x0000005c: 9002 .. STR r0,[sp,#8]
0x0000005e: e7e9 .. B {pc}-0x2a ; 0x34
0x00000060: 9804 .. LDR r0,[sp,#0x10]
0x00000062: 3001 .0 ADDS r0,#1
0x00000064: 9004 .. STR r0,[sp,#0x10]
0x00000066: e7ff .. B {pc}+0x2 ; 0x68
0x00000068: 9803 .. LDR r0,[sp,#0xc]
0x0000006a: 3001 .0 ADDS r0,#1
0x0000006c: 9003 .. STR r0,[sp,#0xc]
0x0000006e: e7da .. B {pc}-0x48 ; 0x26
With the declarations moved to the top, the same code again gets slightly worse: two extra stores appear before the loop.
int test = 0;
int i = 0;
int j = 0;
for(i = 0; i < 50; i++)
{
    for(j = 0; j < 50; j++)
    {
        if((test & 0x01) == 0)
            HAL_Delay(1);
        else
            HAL_Delay(2);
    }
    test++;
}
0x0000001e: 9801 .. LDR r0,[sp,#4]
0x00000020: 9004 .. STR r0,[sp,#0x10]
0x00000022: 9003 .. STR r0,[sp,#0xc]
0x00000024: 9002 .. STR r0,[sp,#8]
0x00000026: 9003 .. STR r0,[sp,#0xc]
0x00000028: e7ff .. B {pc}+0x2 ; 0x2a
0x0000002a: 9803 .. LDR r0,[sp,#0xc]
0x0000002c: 2831 1( CMP r0,#0x31
0x0000002e: dc21 !. BGT {pc}+0x46 ; 0x74
0x00000030: e7ff .. B {pc}+0x2 ; 0x32
0x00000032: 2000 . MOVS r0,#0
0x00000034: 9002 .. STR r0,[sp,#8]
0x00000036: e7ff .. B {pc}+0x2 ; 0x38
0x00000038: 9802 .. LDR r0,[sp,#8]
0x0000003a: 2831 1( CMP r0,#0x31
0x0000003c: dc12 .. BGT {pc}+0x28 ; 0x64
0x0000003e: e7ff .. B {pc}+0x2 ; 0x40
0x00000040: f89d0010 .... LDRB r0,[sp,#0x10]
0x00000044: 07c0 .. LSLS r0,r0,#31
0x00000042: b920 . CBNZ r0,{pc}+0xc ; 0x52
0x00000044: e7ff .. B {pc}+0x2 ; 0x4a
0x00000046: 2001 . MOVS r0,#1
0x00000048: f7fffffe .... BL HAL_Delay
0x0000004c: e003 .. B {pc}+0xa ; 0x5a
0x0000004e: 2002 . MOVS r0,#2
0x00000050: f7fffffe .... BL HAL_Delay
0x00000054: e7ff .. B {pc}+0x2 ; 0x5a
0x00000056: e7ff .. B {pc}+0x2 ; 0x5c
0x00000058: 9802 .. LDR r0,[sp,#8]
0x0000005a: 3001 .0 ADDS r0,#1
0x0000005c: 9002 .. STR r0,[sp,#8]
0x0000005e: e7e9 .. B {pc}-0x2a ; 0x38
0x00000060: 9804 .. LDR r0,[sp,#0x10]
0x00000062: 3001 .0 ADDS r0,#1
0x00000064: 9004 .. STR r0,[sp,#0x10]
0x00000066: e7ff .. B {pc}+0x2 ; 0x6c
0x00000068: 9803 .. LDR r0,[sp,#0xc]
0x0000006a: 3001 .0 ADDS r0,#1
0x0000006c: 9003 .. STR r0,[sp,#0xc]
0x0000006e: e7da .. B {pc}-0x48 ; 0x2a
Using Optimization#
O1#
Still using the complex loop above
Declaring inside the for loop
0x00000014: 2400 .$ MOVS r4,#0
0x00000016: bf00 .. NOP
0x00000018: f0040501 .... AND r5,r4,#1
0x0000001c: 2632 2& MOVS r6,#0x32
0x0000001e: bf00 .. NOP
0x00000020: 2002 . MOVS r0,#2
0x00000022: 2d00 .- CMP r5,#0
0x00000024: bf08 .. IT EQ
0x00000026: 2001 . MOVEQ r0,#1
0x00000028: f7fffffe .... BL HAL_Delay
0x0000002c: 3e01 .> SUBS r6,#1
0x0000002e: d1f7 .. BNE {pc}-0xe ; 0x20
0x00000030: 3401 .4 ADDS r4,#1
0x00000032: 2c32 2, CMP r4,#0x32
0x00000034: d1f0 .. BNE {pc}-0x1c ; 0x18
Declaring in advance: the output is completely identical to the listing above.
0x00000014: 2400 .$ MOVS r4,#0
0x00000016: bf00 .. NOP
0x00000018: f0040501 .... AND r5,r4,#1
0x0000001c: 2632 2& MOVS r6,#0x32
0x0000001e: bf00 .. NOP
0x00000020: 2002 . MOVS r0,#2
0x00000022: 2d00 .- CMP r5,#0
0x00000024: bf08 .. IT EQ
0x00000026: 2001 . MOVEQ r0,#1
0x00000028: f7fffffe .... BL HAL_Delay
0x0000002c: 3e01 .> SUBS r6,#1
0x0000002e: d1f7 .. BNE {pc}-0xe ; 0x20
0x00000030: 3401 .4 ADDS r4,#1
0x00000032: 2c32 2, CMP r4,#0x32
0x00000034: d1f0 .. BNE {pc}-0x1c ; 0x18
O2#
Still using the complex loop above
Declaring inside the for loop
0x00000014: 2500 .% MOVS r5,#0
0x00000016: bf00 .. NOP
0x00000018: 2402 .$ MOVS r4,#2
0x0000001a: 2632 2& MOVS r6,#0x32
0x0000001c: 07e8 .. LSLS r0,r5,#31
0x0000001e: bf08 .. IT EQ
0x00000020: 2401 .$ MOVEQ r4,#1
0x00000022: bf00 .. NOP
0x00000024: 4620 F MOV r0,r4
0x00000026: f7fffffe .... BL HAL_Delay
0x0000002a: 3e01 .> SUBS r6,#1
0x0000002c: d1fa .. BNE {pc}-0x8 ; 0x24
0x0000002e: 3501 .5 ADDS r5,#1
0x00000030: 2d32 2- CMP r5,#0x32
0x00000032: d1f1 .. BNE {pc}-0x1a ; 0x18
Declaring in advance: the output is completely identical to the listing above.
0x00000014: 2500 .% MOVS r5,#0
0x00000016: bf00 .. NOP
0x00000018: 2402 .$ MOVS r4,#2
0x0000001a: 2632 2& MOVS r6,#0x32
0x0000001c: 07e8 .. LSLS r0,r5,#31
0x0000001e: bf08 .. IT EQ
0x00000020: 2401 .$ MOVEQ r4,#1
0x00000022: bf00 .. NOP
0x00000024: 4620 F MOV r0,r4
0x00000026: f7fffffe .... BL HAL_Delay
0x0000002a: 3e01 .> SUBS r6,#1
0x0000002c: d1fa .. BNE {pc}-0x8 ; 0x24
0x0000002e: 3501 .5 ADDS r5,#1
0x00000030: 2d32 2- CMP r5,#0x32
0x00000032: d1f1 .. BNE {pc}-0x1a ; 0x18
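To make the optimized listings easier to follow, here is a rough C rendering of what the O1 and O2 code appears to do, next to the loop as written. This is one reading of the disassembly, not compiler output: the variable names mirror the registers, `HAL_Delay` is a host-side stub that folds each argument into an order-sensitive signature, and `signature_of` is a hypothetical helper for comparing the three versions.

```c
#include <stdint.h>

/* Host-side stand-in for HAL_Delay (assumption: only the sequence of
   arguments matters here, so it is folded into a running signature). */
static uint32_t signature;
static void HAL_Delay(uint32_t ms)
{
    signature = signature * 31u + ms;
}

/* The complex loop exactly as written in the source. */
void loop_as_written(void)
{
    int test = 0;
    for (int i = 0; i < 50; i++) {
        for (int j = 0; j < 50; j++) {
            if ((test & 0x01) == 0)
                HAL_Delay(1);
            else
                HAL_Delay(2);
        }
        test++;
    }
}

/* Shape of the O1 listing: i and test appear to be merged into r4,
   the parity is computed once per outer pass, and j counts down from 50. */
void loop_shape_o1(void)
{
    uint32_t r4 = 0;                    /* MOVS r4,#0 */
    do {
        uint32_t r5 = r4 & 1u;          /* AND r5,r4,#1 */
        uint32_t r6 = 50;               /* MOVS r6,#0x32 */
        do {
            uint32_t r0 = 2;            /* MOVS r0,#2 */
            if (r5 == 0)                /* CMP r5,#0 ; IT EQ */
                r0 = 1;                 /* MOVEQ r0,#1 */
            HAL_Delay(r0);              /* BL HAL_Delay */
        } while (--r6 != 0);            /* SUBS r6,#1 ; BNE */
        r4++;                           /* ADDS r4,#1 */
    } while (r4 != 50);                 /* CMP r4,#0x32 ; BNE */
}

/* Shape of the O2 listing: the delay value itself is chosen once per
   outer pass, so the inner loop only moves it into r0 and calls. */
void loop_shape_o2(void)
{
    uint32_t r5 = 0;                    /* MOVS r5,#0 */
    do {
        uint32_t r4 = 2;                /* MOVS r4,#2 */
        uint32_t r6 = 50;               /* MOVS r6,#0x32 */
        if ((uint32_t)(r5 << 31) == 0)  /* LSLS r0,r5,#31 ; IT EQ */
            r4 = 1;                     /* MOVEQ r4,#1 */
        do {
            HAL_Delay(r4);              /* MOV r0,r4 ; BL HAL_Delay */
        } while (--r6 != 0);            /* SUBS r6,#1 ; BNE */
        r5++;                           /* ADDS r5,#1 */
    } while (r5 != 50);                 /* CMP r5,#0x32 ; BNE */
}

/* Runs one loop variant from a clean state and returns its signature. */
uint32_t signature_of(void (*loop)(void))
{
    signature = 0;
    loop();
    return signature;
}
```

All three variants issue the same sequence of delays. Neither optimized shape keeps any loop variable on the stack, which is why the position of the declarations in the source no longer matters.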
O3#
O3 is not worth discussing here, as it completely unrolls the loop.
Situation under GCC environment#
Under GCC, defining the local variables in advance likewise makes the code worse.
Loop variables defined inside the for statements: 20 instructions.
Local variables defined in advance: 24 instructions.
This article is updated synchronously to xLog by Mix Space
The original link is https://www.yono233.cn/posts/shoot/24_8_6_%E5%85%B3%E4%BA%8E%E5%B1%80%E9%83%A8%E5%8F%98%E9%87%8F%E7%9A%84%E6%A0%88%E8%A1%8C%E4%B8%BA%E2%80%94%E2%80%94%E7%94%B1%E5%BE%AA%E7%8E%AF%E8%AF%AD%E5%8F%A5%E5%86%85%E5%AE%9A%E4%B9%89%E5%BE%86%E7%8E%AF%E5%8F%98%E9%87%8F%E5%BC%95%E7%94%B3