diff --git a/THREADING.md b/THREADING.md
new file mode 100644
index 0000000..01982fe
--- /dev/null
+++ b/THREADING.md
@@ -0,0 +1,92 @@
+# How to think about Threads in FastLED with ESP-IDF
+
+I believe I understand how this is supposed to work.
+
+## Parallel output, or multiple controller, mode
+
+FastLED talks a bit about what they call 'multi channel'. This is
+the very cool feature which allows parallel channels to all be blasted
+at the same time, because you have hardware support.
+
+The ESP32 has the RMT driver, which is a bit of hardware assist
+which is used to blast out the waveform from a very small interrupt handler.
+
+The examples in the Wiki don't match the way the code seems to be written
+for this platform --- and it's a little weird.
+
+For this platform, when you do a showLeds(), it keeps track of the number
+of calls, and when you reach the same number of calls as the number of allocated
+controllers, it actually spins up all of them together.
+
+To say that's "peculiar" stretches the point. 
+
+You'd prefer an interface called something like "flushLeds" which
+acts on all the known controllers, or takes the list of controllers
+you'd like to operate over.
+
+I think this API is supposed to work that you bang all the pixel arrays,
+then call showLeds() which hits all the controllers in turn, the last
+of which actually kicks off the transfers for all the controllers.
+
+## using the individual showLeds() functions
+
+If you set up individual showLeds() functions on the different controller, the
+code looks like the early ones will be very quick, and the final controller will 
+do all the transfers.
+
+I'd have to do some timing work and see if that's really happening --- but I'm at least 50% certain.
+
+## Protecting the LED array, or using the external RMT driver
+
+The API contract of showLEDs appears to be that showLeds blocks while
+bytes are being pushed, which means its safe to molest the led array after
+showLeds() finishes, thus setting you up for the next call.
+
+Which is well and good, and allows you to do things like have a semaphore or 
+signal that fires off when the frame is blasted.
+
+There are two peculiarities with this concept.
+
+If you are using the EXTERNAL_RMT, then each controller allocates a buffer 3x larger
+than the number of pixels. It does a convert() on the pixel array into the big memory
+buffer, which is the RMT buffer of signal transitions, and then doesn't molest your
+pixel buffer.
+
+While this a lot of memory, you've just achieved double-buffering. The actual pixel array
+is clear to be modified right after that 'build' happens, and it would be super nice to 
+know if that was true.
+
+If you use the internal system, then the conversion happens in the interrupt routine.
+The amount of work is the same ( roughly ), but most of it is happening in the interrupt
+instead of out in "regular" time. Depending on your use, it might be better
+to have the work going on in regular time as part of the double-buffer.
+
+While you are sure that your Pixel arrays are safe after showPixels finishes, you've
+covered a lot of time in the middle.
+
+## How much CPU is really used?
+
+If you 'do the math' on the number of pixels you can support, and you have 8 channels ( the number of RMT channels ),
+you'll think you don't have much time to actually update your models. But, in fact, you do,
+because the CPU is taking a fraction of its time feeding the RMT buffer, and a majority
+of its time hanging on the semaphore waiting to be done.
+
+The right thing to do, then, is to create another task to do whatever set of interpolation
+you'd like to do.
+
+If you measure the amount of time spent waiting on the Semaphore you'd be part of the way there,
+but you should also measure the amount of CPU spent in the IRQ handler. Minus those out, 
+and you'd learn how much CPU you've really got left.
+
+## Rumination on double buffering
+
+The use of the RMT buffer for external systems means you'd like to have some kind of control interface - like
+a semaphore - on each individual pixel buffer. That would allow a higher level task to simply bang
+away on the array and get held off when it was unsafe.
+
+In the case of using the Internal RMT, the amount of time is almost the same as ShowPixels, so you
+could easily have showPixels signal outside code through the mechanism of your choice.
+
+I'm not aware, at the moment, of how you would measure the amount of CPU spent in the IRQ system,
+which is the bulk of the time.
+