Fix pandas 3.0 compatibility: StringDtype, datetime resolution, deprecated APIs, pyspark.pandas import #1527
Conversation
…cated APIs

- Add "str" to string dtype name checks in data.py, ts_data.py, generic_task.py (pandas 3.0 uses StringDtype(na_value=nan) with dtype.name="str")
- Wrap np.issubdtype(columns.dtype, np.integer) in try/except for StringDtype column indexes in pandas 3.0
- Use is_datetime64_any_dtype() instead of hardcoded "datetime64[ns]" check (pandas 3.0 uses varying datetime resolutions like datetime64[s], [ms], [us])
- Replace fillna(method="ffill") with ffill() in test files (removed in pandas 3.0)
- Replace deprecated frequency aliases "T"->"min", "H"->"h" in test files

Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/e2339d06-2236-4c1e-901b-b00a558f3796
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
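The last two test-file replacements can be sketched as follows. This is an illustrative snippet, not code from the PR; both new spellings (`freq="h"` and `.ffill()`) already work on pandas 2.x, so the change is backward compatible:

```python
import pandas as pd

# freq="h" replaces the removed alias "H" ("min" likewise replaces "T")
idx = pd.date_range("2024-01-01", periods=4, freq="h")
s = pd.Series([1.0, None, 3.0, None], index=idx)

# s.ffill() replaces the removed s.fillna(method="ffill")
filled = s.ffill()
```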
Pull request overview
This PR updates FLAML’s pandas integration and test suite to handle pandas 3.0 breaking changes around StringDtype defaults, datetime dtype resolution, and removed/deprecated pandas APIs—preventing failures across AutoML preprocessing and time-series feature enrichment.
Changes:
- Update dtype handling to recognize pandas 3.0's `dtype.name == "str"` string columns in AutoML preprocessing and NLP validation.
- Use `is_datetime64_any_dtype()` to detect datetime columns across varying datetime resolutions.
- Modernize tests for pandas 3.0 by replacing deprecated frequency aliases and `fillna(method="ffill")`.
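The dtype-name change can be seen with a minimal sketch (not FLAML's actual code): pandas 3.0 names its default string dtype `"str"`, while pandas 2.x string columns report `"object"`, so the membership check has to accept both:

```python
import pandas as pd

# Version-tolerant text-column check: pandas 2.x reports "object",
# pandas 3.0 reports "str" for the same column of Python strings.
s = pd.Series(["spam", "eggs"])
is_text = s.dtype.name in ("object", "category", "string", "str")
```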
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/automl/test_max_iter_1.py | Updates date_range freq alias for pandas 3.0 compatibility. |
| test/automl/test_forecast.py | Updates date_range freq alias and replaces deprecated fillna(method=...). |
| test/automl/test_extra_models.py | Replaces deprecated fillna(method=...) with ffill(). |
| flaml/automl/time_series/ts_data.py | Switches datetime detection to is_datetime64_any_dtype() and expands string dtype handling. |
| flaml/automl/task/generic_task.py | Extends NLP text-column dtype validation to include "str". |
| flaml/automl/data.py | Adds pandas 3.0-compatible string/datetime detection and guards np.issubdtype for StringDtype column indexes. |
```diff
 new_cols = []
 for col in df.columns:
-    if df[col].dtype.name == "datetime64[ns]":
+    if is_datetime64_any_dtype(df[col]):
         extras = monthly_fourier_features(df[col], fourier_degree)
```
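A minimal sketch of why the hardcoded name check is brittle (assuming pandas >= 2.0, where non-nanosecond resolutions exist):

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

s = pd.Series(pd.to_datetime(["2024-01-01", "2024-06-01"]))
s_ms = s.astype("datetime64[ms]")  # e.g. a non-nanosecond resolution

# The hardcoded name check misses it; the introspection helper does not.
old_check = s_ms.dtype.name == "datetime64[ns]"
new_check = is_datetime64_any_dtype(s_ms)
```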
is_datetime64_any_dtype is only imported inside the try: import pandas block at the top of this module, but it is not defined in the except ImportError: path. That means calling enrich_dataframe() in an environment without pandas will raise a NameError instead of a clean/expected failure mode. Define a fallback (e.g., set is_datetime64_any_dtype = None or a stub returning False) in the except block and guard uses accordingly.
Added is_datetime64_any_dtype = None fallback in the except ImportError: block of ts_data.py in 98bca3b, consistent with the pattern used in flaml/automl/data.py.
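A minimal sketch of the fallback-plus-guard pattern discussed here (the wrapper function is illustrative, not FLAML's actual code):

```python
try:
    from pandas.api.types import is_datetime64_any_dtype
except ImportError:  # pandas not installed: fall back to None
    is_datetime64_any_dtype = None

def column_is_datetime(col):
    # Guard the call so a missing pandas yields False, not a NameError.
    return is_datetime64_any_dtype is not None and bool(is_datetime64_any_dtype(col))

# Illustration (assumes pandas is actually available here):
import pandas as pd
dt_ok = column_is_datetime(pd.Series(pd.to_datetime(["2024-01-01"])))
num_ok = column_is_datetime(pd.Series([1, 2, 3]))
```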
```diff
             continue

     # sklearn/utils/validation.py needs int/float values
-    if X[column].dtype.name in ("object", "category", "string"):
+    if X[column].dtype.name in ("object", "category", "string", "str"):
         if (
```
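The related column-index guard this PR adds in data.py can be sketched like this (illustrative, not the exact FLAML code): under pandas 3.0 a string column index uses StringDtype, which `np.issubdtype` cannot interpret and raises `TypeError` on, while under pandas 2.x the same index is object dtype and simply returns False:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2]})

# Wrap the dtype test so a StringDtype column index (pandas 3.0)
# degrades to "not an integer index" instead of raising.
try:
    int_columns = np.issubdtype(df.columns.dtype, np.integer)
except TypeError:
    int_columns = False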
DataTransformerTS.fit() now calls is_datetime64_any_dtype(...), but is_datetime64_any_dtype is only imported inside the module’s try: import pandas block and is not defined in the except ImportError: path. This can surface as a NameError (instead of a predictable ImportError/feature gating) if this codepath is reached without pandas. Define a fallback in the except block and/or guard this call.
Both is_datetime64_any_dtype call sites in ts_data.py (lines 276 and 407) are now guarded with is_datetime64_any_dtype is not None and ..., consistent with the pattern in data.py. Done in 35a075f.
Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/7c3ce4f8-8623-4d3e-8986-2fa83e1daa5a Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
… block Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/2b20aa86-81b1-4b3b-b6af-0cf7669c5575 Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/395c2eef-887f-4278-a4e0-4d4d276afeee Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
…andas Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/28e597b7-ae5e-41ba-957d-04aaa78834bd Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
The Spark test failures were fixed in 10f8bce (narrowed in 3544d75) by separating the core pyspark imports from the `pyspark.pandas` imports.
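A generic sketch of that split (stand-in names; `core_mod` stands in for the core Spark imports): with one shared `try` block, any failure nulls every name, whereas two independent blocks let only the pandas-on-Spark names degrade:

```python
# Core imports fail (or succeed) independently of the optional ones.
try:
    import json as core_mod  # stands in for SparkSession, F, T, ...
except ImportError:
    core_mod = None

# The optional companion import fails under pandas 3.0 per this PR.
try:
    import pyspark.pandas as ps
except ImportError:
    ps = None  # only the pandas-on-Spark names become None
```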
…pyspark.pandas unavailable Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/60ca1746-e262-40e3-a970-2db40bda57dd Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Fixed in c268df3.
pyspark doesn't support pandas 3.0 yet. Add pandas version constraints properly. @copilot
…xtra in setup.py Agent-Logs-Url: https://github.com/microsoft/FLAML/sessions/0512ff2f-87ab-4394-b688-ea1416c26c6a Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Pandas 3.0 introduced several breaking changes that caused widespread test failures across classification, forecasting, NLP, default config, and Spark tests.
StringDtype default change
Pandas 3.0 defaults string columns to `StringDtype(na_value=nan)` with `dtype.name == "str"` instead of `"object"`. This caused string columns to be misclassified as numeric and column name indexes to crash `np.issubdtype()`.

- `flaml/automl/data.py`: Add `"str"` to dtype checks in `fit_transform()` and `transform()`. Wrap `np.issubdtype(X_num.columns.dtype, np.integer)` in try/except since string column indexes now use `StringDtype`.
- `flaml/automl/task/generic_task.py`: Add `"str"` to the NLP text-column validation assertion.
- `flaml/automl/time_series/ts_data.py`: Add `"str"` to dtype checks in `DataTransformerTS`.

Datetime resolution changes
Pandas 3.0 uses varying datetime resolutions (`datetime64[s]`, `[ms]`, `[us]`) instead of always `datetime64[ns]`, causing datetime columns to be missed during feature enrichment (the root cause of the TCN 0-channel errors).

- `flaml/automl/data.py`, `flaml/automl/time_series/ts_data.py`: Replace the hardcoded `dtype.name == "datetime64[ns]"` check with `is_datetime64_any_dtype()`.

pyspark.pandas import incompatibility with pandas 3.0
PySpark's `pyspark.pandas` module fails to import with pandas 3.0 (`cannot import name '_builtin_table' from 'pandas.core.common'`). Since all pyspark imports were in a single `try` block, the `pyspark.pandas` failure cascaded into making `SparkSession`, `_spark_major_minor_version`, `F`, `T`, etc. all become dummy/None values, breaking all Spark tests.

- `flaml/automl/spark/__init__.py`: Separate core pyspark imports (`SparkSession`, `F`, `T`, `VersionUtils`, `sparkDataFrame`) from `pyspark.pandas` imports (`ps`, `psDataFrame`, `psSeries`, `set_option`). Core Spark functionality now works even when `pyspark.pandas` is incompatible with the installed pandas version.
- `flaml/automl/spark/utils.py`: Guard `set_option` and `ps.from_pandas` calls with `None` checks for when `pyspark.pandas` is unavailable, raising a clear `ImportError` instead of a `TypeError`.

Removed deprecated APIs in tests
- `fillna(method="ffill")` → `ffill()` (removed in pandas 3.0)
- `"T"` → `"min"`, `"H"` → `"h"` (removed in pandas 3.0)

pandas version constraints for pyspark compatibility
Since pyspark doesn't support pandas 3.0 yet, proper version constraints are added:

- `setup.py`: Add `"pandas<3"` to the `spark` extra requirements so users installing `flaml[spark]` get a compatible pandas version.
- `.github/workflows/python-package.yml`: Pin `pandas>=2.0,<3` on all Ubuntu pyspark environments (Python 3.11, 3.12, 3.13). Non-pyspark environments (Ubuntu 3.10, all Windows) continue testing with the latest pandas 3.
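The extras change can be pictured with a hypothetical fragment of the `extras_require` mapping (only the `"pandas<3"` pin comes from this PR; the other entry is illustrative, not the real dependency list):

```python
# Hypothetical setup.py fragment; "pyspark" line is illustrative.
extras_require = {
    "spark": [
        "pyspark",
        "pandas<3",  # pyspark.pandas does not yet support pandas 3.0
    ],
}
```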