diff --git a/components/newlib/Kconfig b/components/newlib/Kconfig
index 92d9afc412..2e849600bb 100644
--- a/components/newlib/Kconfig
+++ b/components/newlib/Kconfig
@@ -145,7 +145,7 @@ menu "LibC"
 
     config LIBC_OPTIMIZED_MISALIGNED_ACCESS
         bool "Use performance-optimized memXXX/strXXX functions on misaligned memory access"
-        default n
+        default y
         depends on ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
         help
             Enables performance-optimized implementations of memory and string functions
diff --git a/docs/en/api-guides/performance/ram-usage.rst b/docs/en/api-guides/performance/ram-usage.rst
index 413296eb30..a72da893fa 100644
--- a/docs/en/api-guides/performance/ram-usage.rst
+++ b/docs/en/api-guides/performance/ram-usage.rst
@@ -194,6 +194,7 @@ The following options will reduce IRAM usage of some ESP-IDF features:
     :SOC_GPSPI_SUPPORTED: - Enable :ref:`CONFIG_HEAP_PLACE_FUNCTION_INTO_FLASH`. Provided that :ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` is not enabled and the heap functions are not incorrectly used from ISRs, this option is safe to enable in all configurations.
     :esp32c2: - Enable :ref:`CONFIG_BT_RELEASE_IRAM`. Release BT text section and merge BT data, bss & text into a large free heap region when ``esp_bt_mem_release`` is called. This makes Bluetooth unavailable until the next restart, but saving ~22 KB or more of IRAM.
     - Disable :ref:`CONFIG_LIBC_LOCKS_PLACE_IN_IRAM` if no ISRs that run while cache is disabled (i.e. IRAM ISRs) use libc lock APIs.
+    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Disable :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS` to save approximately 1000 bytes of IRAM, at the cost of reduced performance.
 
 .. only:: esp32
 
diff --git a/docs/en/api-guides/performance/speed.rst b/docs/en/api-guides/performance/speed.rst
index 7eede72b63..58e191a200 100644
--- a/docs/en/api-guides/performance/speed.rst
+++ b/docs/en/api-guides/performance/speed.rst
@@ -87,7 +87,6 @@ The following optimizations improve the execution of nearly all code, including
     :SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
     :not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
     - Avoid using double precision floating point arithmetic ``double``. These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point.
-    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Avoid misaligned 4-byte memory accesses in performance-critical code sections. For potential performance improvements, consider enabling :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`, which requires approximately 190 bytes of IRAM and 870 bytes of flash memory. Note that properly aligned memory operations will always execute at full speed without performance penalties.
 
 .. only:: esp32s2 or esp32s3 or esp32p4
 
diff --git a/docs/en/migration-guides/release-6.x/6.0/toolchain.rst b/docs/en/migration-guides/release-6.x/6.0/toolchain.rst
index 9fcf794b3c..65fd8f781a 100644
--- a/docs/en/migration-guides/release-6.x/6.0/toolchain.rst
+++ b/docs/en/migration-guides/release-6.x/6.0/toolchain.rst
@@ -107,3 +107,64 @@ The header ```` is no longer available in Picolibc. To ensure comp
 
     #include /* fatal error: sys/signal.h: No such file or directory */
     #include /* Ok: standard and portable */
+
+.. only:: CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
+
+    RISC-V Chips and Misaligned Memory Access in LibC Functions
+    -----------------------------------------------------------
+
+    Espressif RISC-V chips can perform misaligned memory accesses with only a small
+    performance penalty compared to aligned accesses.
+
+    Previously, LibC functions that operate on memory (such as copy or comparison
+    functions) were implemented using byte-by-byte operations when a non-word-aligned
+    pointer was passed. Now, these functions use word (4-byte) load/store operations
+    whenever possible, resulting in a significant performance increase. These optimized
+    implementations are enabled by default via :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`,
+    which reduces the application’s memory budget (IRAM) by approximately 800–1000 bytes.
+
+    The table below shows benchmark results on the ESP32-C3 chip using 4096-byte buffers:
+
+    .. list-table:: Benchmark Results
+        :header-rows: 1
+        :widths: 20 20 20 20
+
+        * - Function
+          - Old (CPU cycles)
+          - Optimized (CPU cycles)
+          - Improvement (%)
+        * - memcpy
+          - 32873
+          - 4200
+          - 87.2
+        * - memcmp
+          - 57436
+          - 14722
+          - 74.4
+        * - memmove
+          - 49336
+          - 9237
+          - 81.3
+        * - strcpy
+          - 28678
+          - 16659
+          - 41.9
+        * - strcmp
+          - 36867
+          - 11146
+          - 69.8
+
+    .. note::
+        The results above apply to misaligned memory operations.
+        Performance for aligned memory operations remains unchanged.
+
+    Functions with Improved Performance
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    - ``memcpy``
+    - ``memcmp``
+    - ``memmove``
+    - ``strcpy``
+    - ``strncpy``
+    - ``strcmp``
+    - ``strncmp``
diff --git a/docs/zh_CN/api-guides/performance/speed.rst b/docs/zh_CN/api-guides/performance/speed.rst
index 61ae1e0511..f4b5343f50 100644
--- a/docs/zh_CN/api-guides/performance/speed.rst
+++ b/docs/zh_CN/api-guides/performance/speed.rst
@@ -87,7 +87,6 @@
     :SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。尽管 {IDF_TARGET_NAME} 具备单精度浮点运算器,但是浮点运算总是慢于整数运算。因此可以考虑使用不同的整数表示方法进行运算,如定点表示法,或者将部分计算用整数运算后再切换为浮点运算。
     :not SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。{IDF_TARGET_NAME} 通过软件模拟进行浮点运算,因此速度非常慢。可以考虑使用不同的整数表示方法进行运算,如定点表示法,或者将部分计算用整数运算后再切换为浮点运算。
     - 避免使用双精度浮点运算 ``double``。{IDF_TARGET_NAME} 通过软件模拟进行双精度浮点运算,因此速度非常慢。可以考虑使用基于整数的表示方法或单精度浮点数。
-    :CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - 在性能要求较高的代码段中,应避免执行未对齐的 4 字节内存访问。为提升性能,可以考虑启用 :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`。启用此选项将额外占用约 190 字节的 IRAM 和 870 字节的 flash 存储。请注意,正确对齐的内存操作始终能够以全速执行,且不会产生性能损耗。
 
 .. only:: esp32s2 or esp32s3 or esp32p4
 
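
Note on the patch above: the byte-by-byte versus word-wise distinction described in the migration-guide hunk can be sketched in portable C. This is only an illustration under stated assumptions, not the implementation shipped in ESP-IDF. The names `copy_bytewise`, `copy_wordwise`, and `misaligned_copy_matches` are hypothetical and introduced here. The word-wise variant moves 4 bytes per iteration through a fixed-size `memcpy`, which compilers typically lower to a single load and store on targets that permit misaligned access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one byte per iteration (the "old" strategy). */
static void copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) {
        *d++ = *s++;
    }
}

/* Hypothetical helper: 4 bytes per iteration, whatever the alignment. */
static void copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w); /* possibly misaligned 4-byte load */
        memcpy(d, &w, sizeof w); /* possibly misaligned 4-byte store */
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }
    while (n--) { /* copy the remaining 0..3 tail bytes */
        *d++ = *s++;
    }
}

/* Returns 1 if both strategies produce identical results for the given
 * source offset, destination offset, and length; 0 otherwise. */
static int misaligned_copy_matches(size_t src_off, size_t dst_off, size_t n)
{
    static unsigned char src[128], a[128], b[128];
    assert(src_off + n <= sizeof src && dst_off + n <= sizeof a);
    for (size_t i = 0; i < sizeof src; i++) {
        src[i] = (unsigned char)(i * 7u + 1u);
    }
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    copy_bytewise(a + dst_off, src + src_off, n);
    copy_wordwise(b + dst_off, src + src_off, n);
    return memcmp(a, b, sizeof a) == 0 &&
           memcmp(a + dst_off, src + src_off, n) == 0;
}
```

Calling `misaligned_copy_matches(1, 3, 61)` exercises both routines with odd source and destination offsets. Timing the two copy helpers on real hardware would be the way to check figures like those in the benchmark table, which this sketch does not attempt to reproduce.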