Description of how I think the threading is working in this module

2025-09-26 15:11:36 +02:00 · 2020-03-01 14:11:39 -08:00
parent e23eafd049
commit 4792d6cbe8
1 changed files with 92 additions and 0 deletions
--- a/THREADING.md
+++ b/THREADING.md
@@ -0,0 +1,92 @@
 # How to think about Threads in FastLED with ESP-IDF
 I believe I understand how this is supposed to work.
 ## Parallel output, or multiple controller, mode
 FastLED talks a bit about what they call 'multi channel'. This is
 the very cool feature which allows parallel channels to all be blasted
 at the same time, because you have hardware support.
 The ESP32 has the RMT driver, which is a bit of hardware assist
 which is used to blast out the waveform from a very small interrupt handler.
 The examples in the Wiki don't match the way the code seems to be written
 for this platform --- and it's a little weird.
 For this platform, when you do a showLeds(), it keeps track of the number
 of calls, and when you reach the same number of calls as the number of allocated
 controllers, it actually spins up all of them together.
 To say that's "peculiar" stretches the point. 
 You'd prefer an interface called something like "flushLeds" which
 acts on all the known controllers, or takes the list of controllers
 you'd like to operate over.
 I think this API is supposed to work that you bang all the pixel arrays,
 then call showLeds() which hits all the controllers in turn, the last
 of which actually kicks off the transfers for all the controllers.
 ## using the individual showLeds() functions
 If you set up individual showLeds() functions on the different controller, the
 code looks like the early ones will be very quick, and the final controller will 
 do all the transfers.
 I'd have to do some timing work and see if that's really happening --- but I'm at least 50% certain.
 ## Protecting the LED array, or using the external RMT driver
 The API contract of showLEDs appears to be that showLeds blocks while
 bytes are being pushed, which means its safe to molest the led array after
 showLeds() finishes, thus setting you up for the next call.
 Which is well and good, and allows you to do things like have a semaphore or 
 signal that fires off when the frame is blasted.
 There are two peculiarities with this concept.
 If you are using the EXTERNAL_RMT, then each controller allocates a buffer 3x larger
 than the number of pixels. It does a convert() on the pixel array into the big memory
 buffer, which is the RMT buffer of signal transitions, and then doesn't molest your
 pixel buffer.
 While this a lot of memory, you've just achieved double-buffering. The actual pixel array
 is clear to be modified right after that 'build' happens, and it would be super nice to 
 know if that was true.
 If you use the internal system, then the conversion happens in the interrupt routine.
 The amount of work is the same ( roughly ), but most of it is happening in the interrupt
 instead of out in "regular" time. Depending on your use, it might be better
 to have the work going on in regular time as part of the double-buffer.
 While you are sure that your Pixel arrays are safe after showPixels finishes, you've
 covered a lot of time in the middle.
 ## How much CPU is really used?
 If you 'do the math' on the number of pixels you can support, and you have 8 channels ( the number of RMT channels ),
 you'll think you don't have much time to actually update your models. But, in fact, you do,
 because the CPU is taking a fraction of its time feeding the RMT buffer, and a majority
 of its time hanging on the semaphore waiting to be done.
 The right thing to do, then, is to create another task to do whatever set of interpolation
 you'd like to do.
 If you measure the amount of time spent waiting on the Semaphore you'd be part of the way there,
 but you should also measure the amount of CPU spent in the IRQ handler. Minus those out, 
 and you'd learn how much CPU you've really got left.
 ## Rumination on double buffering
 The use of the RMT buffer for external systems means you'd like to have some kind of control interface - like
 a semaphore - on each individual pixel buffer. That would allow a higher level task to simply bang
 away on the array and get held off when it was unsafe.
 In the case of using the Internal RMT, the amount of time is almost the same as ShowPixels, so you
 could easily have showPixels signal outside code through the mechanism of your choice.
 I'm not aware, at the moment, of how you would measure the amount of CPU spent in the IRQ system,
 which is the bulk of the time.