change(newlib): enable LIBC_OPTIMIZED_MISALIGNED_ACCESS by default

2025-10-03 18:40:59 +02:00 · 2025-08-17 11:52:54 +07:00
parent 913d38ba14
commit b266d829dd
5 changed files with 63 additions and 3 deletions
--- a/components/newlib/Kconfig
+++ b/components/newlib/Kconfig
@@ -145,7 +145,7 @@ menu "LibC"

    config LIBC_OPTIMIZED_MISALIGNED_ACCESS
        bool "Use performance-optimized memXXX/strXXX functions on misaligned memory access"
-        default n
+        default y
        depends on ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
        help
            Enables performance-optimized implementations of memory and string functions
--- a/docs/en/api-guides/performance/ram-usage.rst
+++ b/docs/en/api-guides/performance/ram-usage.rst
@@ -194,6 +194,7 @@ The following options will reduce IRAM usage of some ESP-IDF features:
    :SOC_GPSPI_SUPPORTED: - Enable :ref:`CONFIG_HEAP_PLACE_FUNCTION_INTO_FLASH`. Provided that :ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` is not enabled and the heap functions are not incorrectly used from ISRs, this option is safe to enable in all configurations.
    :esp32c2: - Enable :ref:`CONFIG_BT_RELEASE_IRAM`. Release BT text section and merge BT data, bss & text into a large free heap region when ``esp_bt_mem_release`` is called. This makes Bluetooth unavailable until the next restart, but saving ~22 KB or more of IRAM.
    - Disable :ref:`CONFIG_LIBC_LOCKS_PLACE_IN_IRAM` if no ISRs that run while cache is disabled (i.e. IRAM ISRs) use libc lock APIs.
+    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Disable :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS` to save approximately 1000 bytes of IRAM, at the cost of slower memory and string functions (such as ``memcpy`` and ``strcmp``) when they operate on misaligned buffers.

 .. only:: esp32

--- a/docs/en/api-guides/performance/speed.rst
+++ b/docs/en/api-guides/performance/speed.rst
@@ -87,7 +87,6 @@ The following optimizations improve the execution of nearly all code, including
    :SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
    :not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
    - Avoid using double precision floating point arithmetic ``double``. These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point.
-    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Avoid misaligned 4-byte memory accesses in performance-critical code sections. For potential performance improvements, consider enabling :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`, which requires approximately 190 bytes of IRAM and 870 bytes of flash memory. Note that properly aligned memory operations will always execute at full speed without performance penalties.


 .. only:: esp32s2 or esp32s3 or esp32p4
--- a/docs/en/migration-guides/release-6.x/6.0/toolchain.rst
+++ b/docs/en/migration-guides/release-6.x/6.0/toolchain.rst
@@ -107,3 +107,64 @@ The header ``<sys/signal.h>`` is no longer available in Picolibc. To ensure comp

    #include <sys/signal.h> /* fatal error: sys/signal.h: No such file or directory */
    #include <signal.h>     /* Ok: standard and portable */
+
+.. only:: CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
+
+    RISC-V Chips and Misaligned Memory Access in LibC Functions
+    -----------------------------------------------------------
+
+    Espressif RISC-V chips can perform misaligned memory accesses with only a small
+    performance penalty compared to aligned accesses.
+
+    Previously, LibC functions that operate on memory (such as copy or comparison
+    functions) fell back to byte-by-byte operations whenever they were passed a pointer
+    that was not word-aligned. These functions now use word (4-byte) load/store
+    operations whenever possible, which significantly reduces the number of CPU cycles
+    spent on misaligned buffers. The optimized implementations are controlled by
+    :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`, which is now enabled by default and
+    consumes approximately 800–1000 bytes of IRAM. Disable the option to reclaim this
+    memory at the cost of slower misaligned operations.
+
+    The table below shows benchmark results for misaligned accesses on the ESP32-C3 chip using 4096-byte buffers:
+
+    .. list-table:: Benchmark Results
+       :header-rows: 1
+       :widths: 20 20 20 20
+
+       * - Function
+         - Old (CPU cycles)
+         - Optimized (CPU cycles)
+         - Cycle reduction (%)
+       * - memcpy
+         - 32873
+         - 4200
+         - 87.2
+       * - memcmp
+         - 57436
+         - 14722
+         - 74.4
+       * - memmove
+         - 49336
+         - 9237
+         - 81.3
+       * - strcpy
+         - 28678
+         - 16659
+         - 41.9
+       * - strcmp
+         - 36867
+         - 11146
+         - 69.8
+
+    .. note::
+       The results above apply to misaligned memory operations.
+       Performance for aligned memory operations remains unchanged.
+
+    Functions with Improved Performance
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    - ``memcpy``
+    - ``memcmp``
+    - ``memmove``
+    - ``strcpy``
+    - ``strncpy``
+    - ``strcmp``
+    - ``strncmp``
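
Note on the toolchain.rst text above: the move from byte-by-byte to word-sized operations can be illustrated with a minimal C sketch. This is not the newlib or ESP-IDF implementation. The names ``copy_bytewise`` and ``copy_wordwise`` are hypothetical, and the fixed-size ``memcpy`` calls are used only as the portable C way to express a possibly misaligned 4-byte load or store. Whether a compiler lowers them to single word instructions depends on the target and build flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: one byte is moved per loop iteration. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}

/* Hypothetical illustration: four bytes are moved per loop iteration,
 * regardless of pointer alignment. The buffers must not overlap. */
static void *copy_wordwise(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, s, sizeof(word)); /* possibly misaligned 4-byte load */
        memcpy(d, &word, sizeof(word)); /* possibly misaligned 4-byte store */
        s += sizeof(word);
        d += sizeof(word);
        n -= sizeof(word);
    }
    while (n-- > 0) { /* copy the 0 to 3 trailing bytes */
        *d++ = *s++;
    }
    return dst;
}
```

Both functions produce the same result for any non-overlapping buffers. The word-wise variant runs roughly a quarter as many loop iterations, which is the kind of saving the commit describes for misaligned pointers.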
--- a/docs/zh_CN/api-guides/performance/speed.rst
+++ b/docs/zh_CN/api-guides/performance/speed.rst
@@ -87,7 +87,6 @@
    :SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。尽管 {IDF_TARGET_NAME} 具备单精度浮点运算器，但是浮点运算总是慢于整数运算。因此可以考虑使用不同的整数表示方法进行运算，如定点表示法，或者将部分计算用整数运算后再切换为浮点运算。
    :not SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。{IDF_TARGET_NAME} 通过软件模拟进行浮点运算，因此速度非常慢。可以考虑使用不同的整数表示方法进行运算，如定点表示法，或者将部分计算用整数运算后再切换为浮点运算。
    - 避免使用双精度浮点运算 ``double``。{IDF_TARGET_NAME} 通过软件模拟进行双精度浮点运算，因此速度非常慢。可以考虑使用基于整数的表示方法或单精度浮点数。
-    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - 在性能要求较高的代码段中，应避免执行未对齐的 4 字节内存访问。为提升性能，可以考虑启用 :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`。启用此选项将额外占用约 190 字节的 IRAM 和 870 字节的 flash 存储。请注意，正确对齐的内存操作始终能够以全速执行，且不会产生性能损耗。


 .. only:: esp32s2 or esp32s3 or esp32p4
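
Note on the benchmark table in the toolchain.rst hunk: the percentage in its last column is consistent with the relative reduction in CPU cycles, ``100 * (old - optimized) / old``. The commit does not state this formula; it is inferred from the published numbers. The sketch below uses two hypothetical helper names to reproduce the percentage and the equivalent speed-up factor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: percentage of CPU cycles saved relative to the old count. */
static double cycle_reduction_percent(uint32_t old_cycles, uint32_t new_cycles)
{
    assert(old_cycles > 0);
    return 100.0 * ((double)old_cycles - (double)new_cycles) / (double)old_cycles;
}

/* Hypothetical helper: how many times fewer cycles the optimized version needs. */
static double speedup_factor(uint32_t old_cycles, uint32_t new_cycles)
{
    assert(new_cycles > 0);
    return (double)old_cycles / (double)new_cycles;
}
```

For the ``memcpy`` row (32873 and 4200 cycles), ``cycle_reduction_percent`` gives about 87.2 and ``speedup_factor`` gives about 7.8, so the tabulated 87.2 % corresponds to a copy that needs roughly one eighth of the cycles.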