change(newlib): enable LIBC_OPTIMIZED_MISALIGNED_ACCESS by default

Alexey Lapshin
2025-08-17 11:52:54 +07:00
committed by BOT
parent 913d38ba14
commit b266d829dd
5 changed files with 63 additions and 3 deletions

@@ -145,7 +145,7 @@ menu "LibC"
config LIBC_OPTIMIZED_MISALIGNED_ACCESS
bool "Use performance-optimized memXXX/strXXX functions on misaligned memory access"
-default n
+default y
depends on ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
help
Enables performance-optimized implementations of memory and string functions

@@ -194,6 +194,7 @@ The following options will reduce IRAM usage of some ESP-IDF features:
:SOC_GPSPI_SUPPORTED: - Enable :ref:`CONFIG_HEAP_PLACE_FUNCTION_INTO_FLASH`. Provided that :ref:`CONFIG_SPI_MASTER_ISR_IN_IRAM` is not enabled and the heap functions are not incorrectly used from ISRs, this option is safe to enable in all configurations.
:esp32c2: - Enable :ref:`CONFIG_BT_RELEASE_IRAM`. This releases the BT text section and merges BT data, bss & text into a large free heap region when ``esp_bt_mem_release`` is called. Bluetooth is then unavailable until the next restart, but ~22 KB or more of IRAM is saved.
- Disable :ref:`CONFIG_LIBC_LOCKS_PLACE_IN_IRAM` if no ISRs that run while cache is disabled (i.e. IRAM ISRs) use libc lock APIs.
+:CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Disable :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS` to save approximately 1000 bytes of IRAM, at the cost of reduced performance.
.. only:: esp32

@@ -87,7 +87,6 @@ The following optimizations improve the execution of nearly all code, including
:SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
:not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
- Avoid using double precision floating point arithmetic ``double``. These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point.
-:CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Avoid misaligned 4-byte memory accesses in performance-critical code sections. For potential performance improvements, consider enabling :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`, which requires approximately 190 bytes of IRAM and 870 bytes of flash memory. Note that properly aligned memory operations will always execute at full speed without performance penalties.
.. only:: esp32s2 or esp32s3 or esp32p4

@@ -107,3 +107,64 @@ The header ``<sys/signal.h>`` is no longer available in Picolibc. To ensure comp
#include <sys/signal.h> /* fatal error: sys/signal.h: No such file or directory */
#include <signal.h> /* Ok: standard and portable */
.. only:: CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY
RISC-V Chips and Misaligned Memory Access in LibC Functions
-----------------------------------------------------------
Espressif RISC-V chips can perform misaligned memory accesses with only a small
performance penalty compared to aligned accesses.
Previously, LibC functions that operate on memory (such as copy or comparison
functions) were implemented using byte-by-byte operations when a non-word-aligned
pointer was passed. Now, these functions use word (4-byte) load/store operations
whenever possible, resulting in a significant performance increase. These optimized
implementations are enabled by default via :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`,
which reduces the application's memory budget (IRAM) by approximately 800 to 1000 bytes.
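
The difference between the two strategies can be illustrated with a minimal sketch. This is not the actual LibC implementation; the function names ``copy_bytewise`` and ``copy_wordwise`` are hypothetical and exist only for this example. The fixed-size ``memcpy`` calls are used here as the portable C way to express a possibly misaligned 32-bit access, which a compiler may lower to a single word load or store on a core that tolerates misaligned accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline strategy: one load and one store per byte. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Word-oriented strategy (illustrative only): move 4 bytes per iteration
 * regardless of pointer alignment, then finish the tail byte by byte. */
static void *copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* possibly misaligned word load */
        memcpy(d, &w, sizeof w);   /* possibly misaligned word store */
        s += sizeof w;
        d += sizeof w;
        n -= sizeof w;
    }
    while (n--) {                  /* remaining 0 to 3 bytes */
        *d++ = *s++;
    }
    return dst;
}
```

For a 4096-byte buffer, the word-oriented loop runs 1024 iterations instead of 4096, which is where most of the saving comes from when each misaligned word access costs only slightly more than an aligned one.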
The table below shows benchmark results on the ESP32-C3 chip using 4096-byte buffers:
.. list-table:: Benchmark Results
    :header-rows: 1
    :widths: 20 20 20 20

    * - Function
      - Old (CPU cycles)
      - Optimized (CPU cycles)
      - Improvement (%)
    * - memcpy
      - 32873
      - 4200
      - 87.2
    * - memcmp
      - 57436
      - 14722
      - 74.4
    * - memmove
      - 49336
      - 9237
      - 81.3
    * - strcpy
      - 28678
      - 16659
      - 41.9
    * - strcmp
      - 36867
      - 11146
      - 69.8

.. note::

    The results above apply to misaligned memory operations. Performance for aligned memory operations remains unchanged.

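
The improvement figures appear to be the relative reduction in cycle count, ``(old - optimized) / old * 100``. The sketch below reproduces that arithmetic; the helper name ``improvement_percent`` is hypothetical and is not part of ESP-IDF.

```c
/* Relative reduction in cycle count, in percent.
 * Both operands are converted to double before subtracting, so a slower
 * "optimized" result yields a negative percentage instead of wrapping. */
static double improvement_percent(unsigned long old_cycles,
                                  unsigned long new_cycles)
{
    return 100.0 * ((double)old_cycles - (double)new_cycles)
                 / (double)old_cycles;
}
```

For example, ``improvement_percent(32873, 4200)`` evaluates to roughly 87.2, matching the ``memcpy`` row.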
Functions with Improved Performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- ``memcpy``
- ``memcmp``
- ``memmove``
- ``strcpy``
- ``strncpy``
- ``strcmp``
- ``strncmp``
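
A misaligned call to these functions is one where at least one pointer argument is not a multiple of 4. The following sketch shows one way to construct such pointers and exercise the listed functions; it only checks results and does not measure performance. The function name ``check_misaligned_libc`` and the buffer contents are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if every call below produced the expected result when given
 * pointers that are deliberately not 4-byte aligned, 0 otherwise. */
static int check_misaligned_libc(void)
{
    /* Aligned backing storage; adding an odd offset gives misaligned pointers. */
    _Alignas(4) char src[64];
    _Alignas(4) char dst[64];
    const char text[] = "misaligned access example";
    const size_t len = sizeof text;   /* includes the terminating NUL */

    char *s = src + 1;                /* address is 1 modulo 4 */
    char *d = dst + 3;                /* address is 3 modulo 4 */
    int ok = 1;

    memset(src, 0, sizeof src);
    memset(dst, 0, sizeof dst);
    memcpy(s, text, len);

    ok &= (((uintptr_t)s & 3u) == 1u);
    ok &= (((uintptr_t)d & 3u) == 3u);

    memcpy(d, s, len);
    ok &= (memcmp(d, s, len) == 0);

    strcpy(d, s);
    ok &= (strcmp(d, s) == 0);

    memset(dst, 0, sizeof dst);
    strncpy(d, s, 10);                /* copies 10 characters, appends no NUL */
    ok &= (strncmp(d, s, 10) == 0);
    ok &= (d[10] == '\0');            /* still zero from the memset above */

    /* Overlapping regions: shift the string right by 2 bytes in place. */
    memmove(s + 2, s, len);
    ok &= (strcmp(s + 2, text) == 0);

    return ok;
}
```

Wrapping each call in a cycle counter read would turn this into a benchmark comparable to the table above, but the timing code is left out here because it is target-specific.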

@@ -87,7 +87,6 @@
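:SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. Although {IDF_TARGET_NAME} has a single-precision floating point unit, floating point calculations are always slower than integer calculations. Consider using a different integer representation, such as fixed point, or performing part of the calculation with integers before switching to floating point.
:not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. {IDF_TARGET_NAME} emulates floating point calculations in software, so they are very slow. Consider using a different integer representation, such as fixed point, or performing part of the calculation with integers before switching to floating point.
- Avoid using double precision floating point arithmetic ``double``. {IDF_TARGET_NAME} emulates double precision calculations in software, so they are very slow. Consider using an integer-based representation or single-precision floating point.
-:CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - In performance-critical code sections, avoid misaligned 4-byte memory accesses. To improve performance, consider enabling :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`. Enabling this option uses approximately 190 additional bytes of IRAM and 870 bytes of flash. Note that properly aligned memory operations always execute at full speed with no performance penalty.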
.. only:: esp32s2 or esp32s3 or esp32p4