From fd26e2e74b01a264e05ca72e80b98f484ba9a6c3 Mon Sep 17 00:00:00 2001
From: Soledad Galli
Date: Fri, 3 Jul 2026 09:43:35 +0200
Subject: [PATCH 1/2] Add sprint proposal for Feature-engine modernization

This document outlines the proposal for a sprint focused on modernizing the Feature-engine API to improve flexibility and performance while maintaining backward compatibility.
---
 .../sprints/feature-engine-sprint-proposal.md | 39 +++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 src/content/sprints/feature-engine-sprint-proposal.md

diff --git a/src/content/sprints/feature-engine-sprint-proposal.md b/src/content/sprints/feature-engine-sprint-proposal.md
new file mode 100644
index 000000000..172184674
--- /dev/null
+++ b/src/content/sprints/feature-engine-sprint-proposal.md
@@ -0,0 +1,39 @@
+---
+title: "Feature-engine"
+numberOfPeople: "10" # How many people you expect to be able to accommodate.
+pythonLevel: "Any" # Any, Beginner, Intermediate, or Advanced.
+contactPerson: # The main person to reach out to regarding the sprint.
+  name: "Soledad Galli"
+  email: "solegalli@protonmail.com"
+  github: "solegalli"
+  twitter: "Soledad_Galli"
+links: # Add as many links as relevant.
+  - title: "Feature-engine's GitHub repo"
+    url: "https://github.com/feature-engine/feature_engine"
+---
+
+# Modernising Feature-engine: Evolving the API for a Changing Python Ecosystem
+
+## Introduction
+
+Feature-engine was originally designed to bridge the gap between pandas and scikit-learn at a time when scikit-learn only accepted NumPy arrays. Its design decisions were intended to guide users towards good practices by raising errors when transformers were used in inappropriate contexts.
+
+The Python machine learning ecosystem has evolved considerably since then. Scikit-learn now supports pandas natively, Polars has become a popular dataframe library, and users users have requested greater flexibility from Feature-engine transformers. To ensure the project continues to meet the needs of its users, its API needs to evolve while preserving backward compatibility as much as possible.
+
+During this sprint, we will work on implementing the next generation of Feature-engine's API, making it more flexible, faster, and compatible with modern data processing libraries, while maintaining backward compatibility wherever possible.
+
+## Sprint Activities
+
+Participants will work together on the following improvements:
+
+- **Modernise the dataframe API** by adding support for Polars while allowing users to choose the installation that best suits their needs: a pandas-only installation, a Polars-only installation, or support for both backends.
+
+- **Extend flexible error handling across the library.** The variable selection utilities already allow users to choose between raising an error or continuing when conditions are not met. During the sprint, we will extend this behaviour to the transformers so that users can opt for stricter or more permissive workflows.
+
+- **Introduce configurable behaviour for transformation-specific errors.** For example, transformers that apply logarithmic transformations currently fail when negative values are present. We will extend the API to provide suitable options that allow the transformation to continue when appropriate, instead of raising an exception.
+
+- **Improve performance by modernising internal implementations.** Where possible, we will replace pandas operations with equivalent NumPy implementations to improve execution speed while preserving the existing public API and maintaining backward compatibility as much as possible.
+
+- **Write tests and documentation** for the new functionality to ensure reliability and provide clear guidance for users adopting the updated API.
+
+The sprint is suitable for contributors interested in machine learning libraries, API design, testing, documentation, and open source development.
From 44a1d65bfb52335a9e869bde7111ac63d3f0cac5 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Fri, 3 Jul 2026 07:48:29 +0000
Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
---
 .../sprints/feature-engine-sprint-proposal.md | 44 +++++++++++++++----
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/src/content/sprints/feature-engine-sprint-proposal.md b/src/content/sprints/feature-engine-sprint-proposal.md
index 172184674..0867c2b26 100644
--- a/src/content/sprints/feature-engine-sprint-proposal.md
+++ b/src/content/sprints/feature-engine-sprint-proposal.md
@@ -16,24 +16,50 @@ links: # Add as many links as relevant.
 
 ## Introduction
 
-Feature-engine was originally designed to bridge the gap between pandas and scikit-learn at a time when scikit-learn only accepted NumPy arrays. Its design decisions were intended to guide users towards good practices by raising errors when transformers were used in inappropriate contexts.
+Feature-engine was originally designed to bridge the gap between pandas and
+scikit-learn at a time when scikit-learn only accepted NumPy arrays. Its design
+decisions were intended to guide users towards good practices by raising errors
+when transformers were used in inappropriate contexts.
 
-The Python machine learning ecosystem has evolved considerably since then. Scikit-learn now supports pandas natively, Polars has become a popular dataframe library, and users users have requested greater flexibility from Feature-engine transformers. To ensure the project continues to meet the needs of its users, its API needs to evolve while preserving backward compatibility as much as possible.
+The Python machine learning ecosystem has evolved considerably since then.
+Scikit-learn now supports pandas natively, Polars has become a popular dataframe
+library, and users have requested greater flexibility from Feature-engine
+transformers. To ensure the project continues to meet the needs of its users,
+its API needs to evolve while preserving backward compatibility as much as
+possible.
 
-During this sprint, we will work on implementing the next generation of Feature-engine's API, making it more flexible, faster, and compatible with modern data processing libraries, while maintaining backward compatibility wherever possible.
+During this sprint, we will work on implementing the next generation of
+Feature-engine's API, making it more flexible, faster, and compatible with
+modern data processing libraries, while maintaining backward compatibility
+wherever possible.
 
 ## Sprint Activities
 
 Participants will work together on the following improvements:
 
-- **Modernise the dataframe API** by adding support for Polars while allowing users to choose the installation that best suits their needs: a pandas-only installation, a Polars-only installation, or support for both backends.
+- **Modernise the dataframe API** by adding support for Polars while allowing
+  users to choose the installation that best suits their needs: a pandas-only
+  installation, a Polars-only installation, or support for both backends.
 
-- **Extend flexible error handling across the library.** The variable selection utilities already allow users to choose between raising an error or continuing when conditions are not met. During the sprint, we will extend this behaviour to the transformers so that users can opt for stricter or more permissive workflows.
+- **Extend flexible error handling across the library.** The variable selection
+  utilities already allow users to choose between raising an error or continuing
+  when conditions are not met. During the sprint, we will extend this behaviour
+  to the transformers so that users can opt for stricter or more permissive
+  workflows.
 
-- **Introduce configurable behaviour for transformation-specific errors.** For example, transformers that apply logarithmic transformations currently fail when negative values are present. We will extend the API to provide suitable options that allow the transformation to continue when appropriate, instead of raising an exception.
+- **Introduce configurable behaviour for transformation-specific errors.** For
+  example, transformers that apply logarithmic transformations currently fail
+  when negative values are present. We will extend the API to provide suitable
+  options that allow the transformation to continue when appropriate, instead of
+  raising an exception.
 
-- **Improve performance by modernising internal implementations.** Where possible, we will replace pandas operations with equivalent NumPy implementations to improve execution speed while preserving the existing public API and maintaining backward compatibility as much as possible.
+- **Improve performance by modernising internal implementations.** Where
+  possible, we will replace pandas operations with equivalent NumPy
+  implementations to improve execution speed while preserving the existing
+  public API and maintaining backward compatibility as much as possible.
 
-- **Write tests and documentation** for the new functionality to ensure reliability and provide clear guidance for users adopting the updated API.
+- **Write tests and documentation** for the new functionality to ensure
+  reliability and provide clear guidance for users adopting the updated API.
 
-The sprint is suitable for contributors interested in machine learning libraries, API design, testing, documentation, and open source development.
+The sprint is suitable for contributors interested in machine learning
+libraries, API design, testing, documentation, and open source development.