北京中科昊芯科技有限公司


C2000 Code Generation Tools v22.6.0.LTS Release Notes

Posted on 2023-4-27 11:00:34
Last edited by 再玩剁手 on 2023-4-27 11:04





New Features:

New option --fp_single_precision_constant treats unsuffixed floating point constants as 32-bit
New option --fp_single_precision_constant treats unsuffixed floating-point constants as single precision instead of implicitly converting them to double-precision constants.
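As a rough illustration of what the option changes, the two hypothetical functions below differ only in whether the constant has an `f` suffix. In standard C, an unsuffixed `0.1` has type `double`. The first multiply is therefore done in double precision wherever `double` is wider than `float`. Per the release note, `--fp_single_precision_constant` makes the compiler treat the unsuffixed form as if it were written `0.1f`.

```c
/* Default C semantics: 0.1 is a double constant, so x is promoted to
   double, multiplied in double precision, then converted back to float. */
float scale_unsuffixed(float x)
{
    return x * 0.1;
}

/* Explicit single precision: 0.1f is a float constant, so the multiply
   stays in float. Per the release note, --fp_single_precision_constant
   gives the unsuffixed form above this behavior without editing the source. */
float scale_suffixed(float x)
{
    return x * 0.1f;
}
```

On a host compiler, `sizeof(0.1)` equals `sizeof(double)` and `sizeof(0.1f)` equals `sizeof(float)`. Adding the suffix by hand is the portable way to get the same effect without the option.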

Interrupt save/restore efficiency improvement
The C2000 compiler now saves/restores only the registers that are used in an interrupt service routine (ISR). Additionally, if an ISR makes function calls, all save-on-call (SoC) registers are saved/restored. For FPU compilations, the RB register is saved/restored for all low-priority ISRs. For high-priority ISRs, however, RB is saved/restored only if the ISR may use an RPTB instruction.
The compiler user guide states:
“If a C/C++ interrupt routine does not call any other functions, only those registers that the interrupt handler uses are saved and restored”
However, prior to this update, the compiler had not behaved this way since at least release 6.4. Its actual behavior was to save/restore all SoC registers whenever any single SoC register was used.
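The save/restore rules above can be summarized as a small decision model. This is a hypothetical sketch, not compiler code. Registers are represented as bits in a mask, and which physical registers count as save-on-call is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: one bit per register, for illustration only. */
typedef uint32_t RegMask;

/* v22.6.0 behavior as described: save only what the ISR uses, plus every
   SoC register if the ISR calls functions (a callee may clobber them). */
RegMask isr_saved_regs(RegMask used, RegMask soc, bool calls_functions)
{
    RegMask saved = used;
    if (calls_functions)
        saved |= soc;
    return saved;
}

/* Earlier behavior as described: touching any single SoC register
   caused all SoC registers to be saved. */
RegMask isr_saved_regs_old(RegMask used, RegMask soc, bool calls_functions)
{
    RegMask saved = used;
    if (calls_functions || (used & soc) != 0)
        saved |= soc;
    return saved;
}

/* FPU builds: RB is always saved for low-priority ISRs, and for
   high-priority ISRs only when an RPTB instruction may be used. */
bool isr_saves_rb(bool high_priority, bool may_use_rptb)
{
    return !high_priority || may_use_rptb;
}
```

For example, suppose an ISR uses one SoC register (bit 0 of a SoC mask 0x0F) plus a non-SoC register (bit 8) and makes no calls. The new rule saves 0x101, while the old rule saved 0x10F.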

Performance improvements CLA
  • CLA support enabled for generating MMACF32||MMOV32
    The CLA compiler has a new feature for generating parallel MMACF32 with MMOV32 instructions. Generating MMACF32||MMOV32 requires --opt_level=2 or higher. Loop unrolling is recommended to create more opportunities for pairing MMACF32 with MMOV32. For example:

    // result, buff[], and coef[] are float
    interrupt void Cla1Task1 ( void )
    {
         int16_t i;
         result = 0.0f;
         #pragma UNROLL(20)
         for(i = 20; i > 0; i--)
         {
              buff[i] = buff[i-1];
              result += coef[i] * buff[i];
         }
         result += coef[0] * buff[0];
    }
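To see what unrolling exposes, here is a portable host-side version of the same delay-line filter, written once as a plain loop and once unrolled by 2 by hand. This is roughly what `#pragma UNROLL` asks the compiler to do. The TI-specific `interrupt` keyword and the pragma are omitted, and `N_TAPS` and the function names are hypothetical. Each unrolled body contains more independent move and multiply-accumulate operations for the scheduler to pair.

```c
#define N_TAPS 21  /* hypothetical: the 20-iteration loop above plus tap 0 */

/* Rolled form: shift the delay line by one and accumulate the output. */
float fir_rolled(float buff[N_TAPS], const float coef[N_TAPS])
{
    float result = 0.0f;
    for (int i = N_TAPS - 1; i > 0; i--) {
        buff[i] = buff[i - 1];          /* data move (MMOV32 candidate) */
        result += coef[i] * buff[i];    /* multiply-accumulate (MMACF32 candidate) */
    }
    result += coef[0] * buff[0];
    return result;
}

/* Unrolled by 2. Assumes the trip count (N_TAPS - 1) is even. Taps are
   visited in the same order, so the float result is identical. */
float fir_unrolled2(float buff[N_TAPS], const float coef[N_TAPS])
{
    float result = 0.0f;
    for (int i = N_TAPS - 1; i > 0; i -= 2) {
        buff[i] = buff[i - 1];
        result += coef[i] * buff[i];
        buff[i - 1] = buff[i - 2];
        result += coef[i - 1] * buff[i - 1];
    }
    result += coef[0] * buff[0];
    return result;
}
```

With all coefficients set to 1.0f and `buff[k] = k`, both versions return 190.0f (the sum 0 + 1 + ... + 19) and leave `buff[20] == 19.0f`.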

  • CLA MRn loads of unsigned short are more efficient
    The CLA compiler now generates more efficient loads of unsigned short variables into a 32-bit MRn register. Previously, the compiler generated the following sequence, including redundant zero-extending shifts:
    {
         // unsigned a,b;
         // a = b;
         MMOVZ16 MR0,@b
         MLSL32 MR0,#16
         MLSR32 MR0,#16
    }

    The CLA compiler now generates only the MMOVZ16 instruction, which already zero-extends:
    {
         // unsigned a,b;
         // a = b;
         MMOVZ16 MR0,@b
    }
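Why the two shifts are redundant can be checked with a 32-bit model of MR0. The helper names below are hypothetical. After a zero-extending 16-bit load, the upper half is already zero, so a left shift by 16 followed by a logical right shift by 16 gives back the same value.

```c
#include <stdint.h>

/* Old sequence, modeled on a 32-bit register value. */
uint32_t load_u16_old(uint16_t b)
{
    uint32_t mr0 = b;   /* MMOVZ16 MR0,@b: upper 16 bits are zero */
    mr0 <<= 16;         /* MLSL32 MR0,#16 */
    mr0 >>= 16;         /* MLSR32 MR0,#16: logical shift brings in zeros */
    return mr0;
}

/* New sequence: the zero-extending load alone. */
uint32_t load_u16_new(uint16_t b)
{
    return b;           /* MMOVZ16 MR0,@b */
}
```

Both functions agree for all 65536 possible inputs, which is the property that makes dropping the shifts safe.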




Performance improvements C28
  • Improvements in reducing register spilling
    The C2000 compiler has a new feature for reducing register spilling. During register allocation (selection of registers), if no registers are available, then the compiler must spill a register to memory (or another register) in order to make it available for register allocation.

    A new optimization pass has been enabled for compilations with --opt_level=2 or higher and --float_support=fpu32/fpu64. If register spilling is detected, the compiler will generate a new instruction schedule to alleviate register pressure (shorten register use lifetimes) which should in turn reduce register spilling.
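To make "register pressure" concrete, here is a toy model that counts how many values are live at once in a straight-line schedule. It is not the compiler's algorithm, and the `Instr` encoding is hypothetical. Two orderings of `(a*b) + (c*d)` are compared:

* Loading all four operands first keeps four values live.
* Interleaving loads and multiplies needs only three.

With three free registers (an arbitrary number chosen for illustration), only the first ordering would force a spill.

```c
/* One instruction in a straight-line schedule: defines value `def` and
   reads up to two values (-1 means "no operand"). Hypothetical encoding. */
typedef struct { int def, use0, use1; } Instr;

#define MAX_VALUES 32  /* value ids must be in [0, MAX_VALUES) */

/* Largest number of values live just after any instruction. A value is
   live from the instruction that defines it until its last use. */
int max_pressure(const Instr *s, int n)
{
    int def_at[MAX_VALUES], last_use[MAX_VALUES];
    for (int v = 0; v < MAX_VALUES; v++) { def_at[v] = -1; last_use[v] = -1; }
    for (int i = 0; i < n; i++) {
        def_at[s[i].def] = i;
        if (s[i].use0 >= 0) last_use[s[i].use0] = i;   /* later uses overwrite */
        if (s[i].use1 >= 0) last_use[s[i].use1] = i;
    }
    int worst = 0;
    for (int p = 0; p < n; p++) {
        int live = 0;
        for (int v = 0; v < MAX_VALUES; v++)
            if (def_at[v] >= 0 && def_at[v] <= p && last_use[v] > p)
                live++;
        if (live > worst) worst = live;
    }
    return worst;
}
```

For example, the schedule "v0=a, v1=b, v2=c, v3=d, v4=v0*v1, v5=v2*v3, v6=v4+v5" peaks at 4 live values. Reordering it to "v0=a, v1=b, v4=v0*v1, v2=c, v3=d, v5=v2*v3, v6=v4+v5" peaks at 3. Shortening lifetimes in this way is the kind of rescheduling the release note describes.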
  • Keep global float/double variables in registers
    The C2000 compiler has a new optimization that attempts to keep global float/double variables in registers if the device has FPU register support. As a simple example, prior to this optimization, the following loop:
    float flt;
    int i;
    for(i=0;i<2000;i++)
    {
         flt = flt + flt;
    }

    would generate this assembly (with --opt_level=2), which loads and stores the float variable from/to memory on every iteration:

         MOVL     XAR6,#1999
    loop:
         MOVW     DP,#flt
         MOV32    R0H,@flt
         MOV32    R1H,@flt
         ADDF32   R0H,R0H,R1H
         NOP
         MOV32    @flt,R0H
         BANZ     loop,AR6

    This is now improved to the following sequence, which keeps the float variable in R2H during the loop:

         MOVW     DP,#flt
         MOVL     XAR6,#1999
         MOV32    R2H,@flt
    loop:
         ADDF32   R2H,R2H,R2H
         BANZ     loop,AR6
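A source-level picture of the same transformation follows, with hypothetical function names. The global is copied into a local (a register such as R2H) for the whole loop. The transformation is only valid if `flt` is not volatile and the loop contains no calls or other accesses that could observe it. The single write-back after the loop is an assumption, since the listing above stops at the loop.

```c
/* Stands in for the release note's global `flt`. */
float flt;

/* Old code shape: read and write the global on every iteration. */
void double_in_memory(int n)
{
    for (int i = 0; i < n; i++)
        flt = flt + flt;
}

/* New code shape: keep the value in a local for the whole loop. */
void double_in_register(int n)
{
    float r = flt;                 /* MOV32 R2H,@flt */
    for (int i = 0; i < n; i++)
        r = r + r;                 /* ADDF32 R2H,R2H,R2H */
    flt = r;                       /* assumed single store after the loop */
}
```

Starting from `flt = 1.5f`, both functions leave 1536.0f in `flt` after 10 iterations. The difference is only in how many memory accesses the loop performs.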

  • If-conversion improvements
    The existing if-conversion optimizations in the C2000 compiler have been extended to cover the following additional cases.

    This code:
    extern float f1, f2;
    float foo(float cond)
    {
         return cond > 1.0f ? f1 : f2;
    }

    is compiled with --opt_level=off, 0, or 1 to a sequence with two branches:
    ||foo||:
         CMPF32    R0H,#16256
         MOVST0    ZF, NF
         B         ||label1||,LEQ
         MOVW      DP,#||f1||
         MOV32     R0H,@||f1||
         B         ||label2||,UNC
    ||label1||:
         MOVW      DP,#||f2||
         MOV32     R0H,@||f2||
    ||label2||:
         LRETR

    With --opt_level=2, 3, or 4, it is now generated without the branches, using conditional MOV32 instructions instead:
    ||foo||:
         CMPF32  R0H,#16256
         MOVW    DP,#||f1||
         MOV32   R0H,@||f1||,GT
         MOVW    DP,#||f2||
         MOV32   R0H,@||f2||,LEQ
         LRETR

    (#16256 is 0x3F80, the upper 16 bits of the IEEE-754 encoding of 1.0f.)

    This sequence:
         MOVL   XAR4,#||t||
         TBIT   *+XAR4[0],#1
         MOVB   XAR0,#8,TC
         B      ||label||,TC
         MOVB   XAR0,#0
    label:

    is instead generated without the branch:
         MOVL   XAR4,#||t||
         MOVB   XAR0,#0
         TBIT   *+XAR4[0],#1
         MOVB   XAR0,#8,TC

    And this sequence:
         TBIT   *+XAR5[0],#2
         B      label,NTC
         MOV    AH,AR0
         ORB    AH,#0x04
         MOVZ   AR0,AH
    label:

    is instead generated without the branch:
         TBIT   *+XAR5[0],#2
         MOV    AH,AR0
         ORB    AH,#0x04
         MOV    AR0,AH,TC
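The two TBIT patterns can be modeled in portable C. The function names are hypothetical, and the model assumes `TBIT ...,#n` tests bit n. Each "predicated" version mirrors the if-converted sequence:

* Compute both outcomes unconditionally.
* Turn the test bit into an all-ones or all-zeros mask.
* Select between the outcomes with the mask instead of a branch.

A C compiler may or may not emit branch-free code for either form. The point here is only the equivalence.

```c
#include <stdint.h>

/* Pattern 1, branching form: result is 8 if bit 1 of t is set, else 0. */
uint16_t pick_branch(uint16_t t)
{
    if (t & (1u << 1))
        return 8;
    return 0;
}

/* Pattern 1, if-converted: write the default, then overwrite under TC. */
uint16_t pick_predicated(uint16_t t)
{
    uint16_t x = 0;                                   /* MOVB XAR0,#0 */
    uint16_t tc = (uint16_t)((t >> 1) & 1u);          /* TBIT ...,#1 */
    uint16_t mask = (uint16_t)(0u - tc);              /* 0xFFFF if TC else 0 */
    return (uint16_t)((8u & mask) | (x & ~mask));     /* MOVB XAR0,#8,TC */
}

/* Pattern 2, branching form: set bit 2 of ar0 only if bit 2 of t is set. */
uint16_t set_bit2_branch(uint16_t t, uint16_t ar0)
{
    if (t & (1u << 2))
        ar0 |= 0x04u;
    return ar0;
}

/* Pattern 2, if-converted: compute the OR unconditionally, commit under TC. */
uint16_t set_bit2_predicated(uint16_t t, uint16_t ar0)
{
    uint16_t ah = (uint16_t)(ar0 | 0x04u);            /* MOV AH,AR0 ; ORB AH,#0x04 */
    uint16_t mask = (uint16_t)(0u - ((t >> 2) & 1u)); /* TBIT ...,#2 */
    return (uint16_t)((ah & mask) | (ar0 & ~mask));   /* MOV AR0,AH,TC */
}
```

Checking every 16-bit value of `t` confirms that each branching/predicated pair returns the same result.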

  • Improved loads of 16-bit constants to ACC register
    The following accumulator load of a 16-bit constant:
         MOVL XAR4,#256
         MOVL ACC,XAR4

    is now replaced by a direct load to the ACC register:
         MOV ACC,#256
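The improvement is a peephole-style rewrite. Below is a toy pass over a hypothetical instruction encoding that folds "MOVL XARn,#k ; MOVL ACC,XARn" into "MOV ACC,#k". Several assumptions are labeled here because the release note does not state the real conditions:

* The pass accepts only constants in 0..0x7FFF, so the result cannot depend on the sign-extension mode.
* It assumes XARn is not read afterwards.
* A real compiler would check liveness before removing the XARn load.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction records for illustration only. */
typedef enum { OP_MOVL_XAR_IMM, OP_MOVL_ACC_XAR, OP_MOV_ACC_IMM, OP_OTHER } Op;
typedef struct { Op op; int reg; int32_t imm; } Insn;

/* Assumption: conservative range that is safe with or without sign extension. */
static bool fits_acc_imm16(int32_t v) { return v >= 0 && v <= 0x7FFF; }

/* Rewrite the pattern in place; returns the new instruction count.
   Assumes the XARn register is dead after the pair. */
int fold_acc_const(Insn *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i].op == OP_MOVL_XAR_IMM &&
            code[i + 1].op == OP_MOVL_ACC_XAR &&
            code[i + 1].reg == code[i].reg &&
            fits_acc_imm16(code[i].imm)) {
            int32_t k = code[i].imm;
            code[out].op = OP_MOV_ACC_IMM;
            code[out].reg = -1;
            code[out].imm = k;
            out++;
            i++;                      /* the second instruction is consumed */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

Applied to the two-instruction sequence above (XAR4, #256), the pass leaves a single `MOV ACC,#256` record. A constant outside the accepted range is left unchanged.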

  • RTS library routine fmodf() now has a faster TMU-based relaxed implementation using the __fmodf intrinsic
    The RTS library support for fmodf() has been extended with a faster TMU implementation that uses the new __fmodf() intrinsic.
    The following intrinsic is available with --tmu_support=tmu0 or --tmu_support=tmu1:
    float __fmodf(float x, float y); /* return fmodf(x,y) */

    If --fp_mode=relaxed is specified together with --tmu_support=tmu0 or tmu1 on the compiler command line, the following RTS function uses the new __fmodf intrinsic:
    float fmodf(float x, float y);


Hex tool: new options --boot_align_sect and --boot_block_size=size
  • --boot_align_sect
    New option --boot_align_sect causes the hex utility to adjust the default boot record size limit based on section alignment (for alignment > 1). For example, if an output section has ALIGN(16), the boot record size is adjusted from the default value of 0xFFFE down to 0xFFF0.
  • --boot_block_size=size
    New option --boot_block_size=size overrides the hex utility's default boot block size. (The ARM default is 0xFFFF, which is not supported by the C28 FLASH API.)

    For use with the C28 FLASH API and the C28 on-chip bootloader:
         C28 hex boot files: hex2000 current-boot-options --boot_align_sect
         ARM hex boot files: armhex current-boot-options --boot_align_sect --boot_block_size=0xFFFE
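The size adjustment is simple arithmetic. This minimal sketch assumes the limit is rounded down to the largest multiple of the section alignment. That rule is consistent with the 0xFFFE to 0xFFF0 example, but it is not spelled out in the note, and the function name is hypothetical.

```c
#include <stdint.h>

/* Round a boot block size down to a multiple of the section alignment.
   Alignments of 0 or 1 leave the size unchanged. */
uint32_t boot_record_limit(uint32_t block_size, uint32_t align)
{
    if (align <= 1)
        return block_size;
    return block_size - (block_size % align);
}
```

With `block_size` = 0xFFFE (the value passed through `--boot_block_size` in the ARM command line above) and ALIGN(16), this returns 0xFFF0. With ALIGN(8), it returns 0xFFF8.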

