Add .to_pil, etc methods to ScreenShot

This is based on a conversation started in #476.

The proposed methods will let us provide users with easy-to-use best-practice ways to get the data to the desired destination.

The CPU-resident `ScreenShot` could also have methods to send data to PyTorch and TensorFlow, since those can work with both CPU-resident and CUDA-resident data.  For instance, the cat detector demo uses PyTorch: it gets the MSS pixel data from the CPU, and then either runs the neural net on the CPU, or moves the data back to the GPU and runs the neural net there.

Here's something to consider in the API design: different libraries have different conventions for how to represent the data.  For instance, even though they all use NumPy, OpenCV wants the channels in BGR order, while Matplotlib, scikit-image, and most others want the channels in RGB order.  For PIL, it's not a problem: we just return an Image, and PIL knows its internal layout.  But for things like `.to_numpy`, it would be good to provide the user with the knobs they'll want to interface with their favorite image processing library.  These knobs aren't hard to implement, and handling them when we're preparing the array is more efficient than the user having to reorder it later.
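To make the channel-order point concrete, here is a minimal sketch of picking BGR or RGB while the array is being built. It assumes the capture buffer is tightly packed BGRA bytes; the tiny `raw` buffer and the variable names are made up for illustration and are not part of MSS.

```python
import numpy as np

# Hypothetical stand-in for a captured frame: 2x2 pixels, 4 bytes each,
# in B, G, R, A order (assumed layout of the raw capture buffer).
height, width = 2, 2
raw = bytes([
    255, 0, 0, 255,    0, 255, 0, 255,    # blue pixel, green pixel
    0, 0, 255, 255,    10, 20, 30, 255,   # red pixel, pixel with B=10 G=20 R=30
])

# Zero-copy view of the buffer as an HWC array with four channels.
bgra = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)

# OpenCV-style: drop alpha and keep BGR order. This is still a view.
bgr = bgra[:, :, :3]

# RGB-style: take channels 2, 1, 0. The slice is a view too, so the only
# copy is the one that makes the result contiguous.
rgb = np.ascontiguousarray(bgra[:, :, 2::-1])
```

Both orders come out of the same slicing step, so choosing the order up front costs no more than producing either one, whereas reordering afterwards means a second pass over the pixels.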

Here’s a cheat sheet I got from ChatGPT about the expectations of popular image frameworks.  I would verify this before relying on it, but this gives us an idea of what users may need.  (For a head start on verification: its unedited response, with links to references, is at <https://chatgpt.com/s/t_6996cd0b606c8191aba9c2cc6e23dc3f>.)

Just to define some terms:
- **Array type:** The array library the framework wants to use (NumPy, PyTorch, etc.).
- **Layout:** `HWC` means the axes are [height, width, channels], in that order.  `HWC` is what we provide with `.pixels()`, for instance.  But some very popular frameworks want `CHW`.  (This terminology also often has another dimension if you’re processing N images in a batch, so you’ll often see `NHWC`, but that’s not relevant for MSS.)
- **Channel order:** RGB or BGR, typically.
- **dtype:** The data type that the framework wants for its inputs.  This is almost always either `uint8` if the pixels are 0–255 (like MSS uses) or some type of float pixel in 0–1 (very common for neural nets).
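As a small illustration of the layout and dtype terms, the sketch below converts an HWC `uint8` array into a CHW `float32` array in 0–1, using only NumPy.  The input array is synthetic and stands in for real pixel data.

```python
import numpy as np

# Synthetic 2x3 RGB image in HWC layout, uint8 pixels in 0-255.
hwc_u8 = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# HWC -> CHW: move the channel axis to the front (a view, no copy yet).
chw_u8 = np.transpose(hwc_u8, (2, 0, 1))

# uint8 in 0-255 -> float32 in 0-1, the range most neural nets expect.
chw_f32 = chw_u8.astype(np.float32) / 255.0

print(hwc_u8.shape, chw_f32.shape, chw_f32.dtype)
```

After the transpose, `chw[c, h, w]` holds the same value that `hwc[h, w, c]` did; only the axis order changes.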

Here’s the list:

| Framework     | Array type | Layout | Channels | dtype  |
| ------------- | ---------- | ------ | -------- | ------ |
| OpenCV        | NumPy      | HWC    | BGR      | uint8  |
| scikit-image  | NumPy      | HWC    | RGB      | either |
| Matplotlib    | NumPy      | HWC    | RGB      | either |
| Torchvision   | PyTorch    | CHW    | RGB      | float  |
| Kornia        | PyTorch    | CHW    | RGB      | float  |
| TensorFlow    | TensorFlow | HWC    | RGB      | float  |
| JAX           | JAX        | HWC    | RGB      | float  |

These different needs can easily be accommodated with parameters in `to_numpy`, `to_pytorch`, and any other similar methods we create to build array-like objects.
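One possible shape for such parameters is sketched below as a free function.  Everything here is hypothetical: the function signature, the parameter names (`channels`, `layout`, `dtype`), and the assumption that the input is a tightly packed BGRA buffer are illustrative choices, not an existing or agreed MSS API.

```python
import numpy as np


def to_numpy(raw, width, height, *, channels="RGB", layout="HWC", dtype=np.uint8):
    """Hypothetical sketch: build an array from BGRA bytes (assumed input format)."""
    bgra = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)

    # Channel order: both choices are plain slices of the same buffer.
    if channels == "BGR":
        img = bgra[:, :, :3]
    elif channels == "RGB":
        img = bgra[:, :, 2::-1]
    else:
        raise ValueError(f"unsupported channels: {channels!r}")

    # Layout: CHW is a transpose of the HWC view.
    if layout == "CHW":
        img = np.transpose(img, (2, 0, 1))
    elif layout != "HWC":
        raise ValueError(f"unsupported layout: {layout!r}")

    # dtype: uint8 keeps 0-255; float types are scaled to 0-1.
    dtype = np.dtype(dtype)
    if dtype == np.uint8:
        return np.ascontiguousarray(img)
    if dtype.kind == "f":
        return np.ascontiguousarray(img, dtype=dtype) / dtype.type(255)
    raise ValueError(f"unsupported dtype: {dtype}")


# Usage with a synthetic 3x2 frame where every pixel is B=10, G=20, R=30.
raw = bytes([10, 20, 30, 255] * 6)
cv_img = to_numpy(raw, 3, 2, channels="BGR")                  # OpenCV-style
net_in = to_numpy(raw, 3, 2, layout="CHW", dtype=np.float32)  # Torchvision-style
```

Keyword-only arguments keep call sites readable, and defaults of RGB, HWC, and `uint8` would match what most rows of the table expect from a NumPy array.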
