docs/api/plotting.md (20 additions & 11 deletions)

chart # You can now access chart.value to get the selected data

Altair has a concept of [data](https://altair-viz.github.io/user_guide/data_transformers.html) transformers, which can be used to improve performance.

Some examples are:

- a pandas DataFrame has to be sanitized and serialized to JSON;
- the rows of a DataFrame might need to be sampled or capped at a maximum count;
- the DataFrame might be written to a `.csv` or `.json` file for performance reasons.

By default, Altair uses the `default` data transformer, which is the slowest
option in marimo. It caps the data at 5000 rows, although marimo raises this
cap to `20_000` rows, which it can comfortably handle. The data is embedded in
the HTML sent over the network, so it is also subject to marimo's maximum
message size.

It is recommended to use the `marimo_csv` data transformer, which is the most
performant and handles the largest datasets: it converts the data to CSV, which
is smaller and cheaper to send over the network. This can handle upwards of
400,000 rows without issue.

When using `mo.ui.altair_chart`, we automatically set the data transformer to
`marimo_csv` for you. If you are using Altair directly, you can set the data
transformer using the following code:

```python
import altair as alt
alt.data_transformers.enable("marimo_csv")
```
docs/guides/working_with_data/plotting.md (33 additions & 2 deletions)

points inside the selection. When nothing is selected, `fig.value` is falsy and

#### Example

/// tab | code

```python
import matplotlib.pyplot as plt
mask = ax.value.get_mask(x, y)
selected_x, selected_y = x[mask], y[mask]
```

///

/// tab | live example

/// marimo-embed
size: large

```python
@app.cell
def __():
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(500)
y = np.random.randn(500)
plt.scatter(x, y)
ax = mo.ui.matplotlib(plt.gca())
ax
return

@app.cell
def __():
mask = ax.value.get_mask(x, y)
np.column_stack([x[mask], y[mask]])
return
```

///

///

#### Debouncing

By default, the selection streams to Python as you drag. For expensive
```
conda install -c conda-forge "vegafusion-python-embed>=1.4.0" "vegafusion>=1.4.0"
```
@app.cell(hide_code=True)
async def __():
import micropip
await micropip.install(["plotly[express]", "pandas"])
import plotly.express as px
return px,
