Data Curator
Data Acquisition, Simplified.
Description
Component library for downloading, validating, homogenizing, and combining financial data from different data providers. It can run in standalone mode, configured from an Excel file, or as a component of a larger Python-based system.
Features
- Configurable from an Excel file or directly in a Python script. A Docker image is also available.
- Fully readable and specific tag names, homogenized between data providers, based on the US GAAP taxonomy. Switch between data providers without changing your code.
- Automatically validates market and fundamental data, discarding datasets that are internally inconsistent (such as a high price below the low price) or that can't guarantee point-in-time validity (such as amended statements).
- Easily create your own calculated feature functions without needing NumPy or pandas (though you can use them if you want to).
- Output to CSV or Parquet files, or to in-memory pandas DataFrames for further processing.
- Completely extensible architecture: implement your own data providers, feature combinations, and output handlers on top of clear, stable interfaces.
- Readable, well-documented, and tested code.
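
To make two of the features above more concrete, here is a minimal sketch in plain Python of the kind of price-bar sanity check and NumPy-free calculated feature they describe. It does not use or reproduce the library's actual API: the function names `is_valid_bar` and `simple_moving_average`, their parameters, and the exact validation rule are assumptions made purely for illustration.

```python
from typing import List, Optional, Sequence


def is_valid_bar(open_: float, high: float, low: float, close: float) -> bool:
    """Hypothetical check: a price bar is consistent when the low and high
    bracket both the open and the close (which also implies low <= high)."""
    return low <= min(open_, close) and max(open_, close) <= high


def simple_moving_average(
    values: Sequence[float], window: int
) -> List[Optional[float]]:
    """Hypothetical calculated feature: trailing mean over `window` values.

    Positions without enough history yield None, so every output only
    depends on data that was available at that point in time.
    """
    result: List[Optional[float]] = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just slid out of the trailing window.
            running_total -= values[i - window]
        result.append(running_total / window if i >= window - 1 else None)
    return result
```

A bar whose high is below its low fails the check, so a dataset containing it could be discarded before any feature is computed. The moving average needs only built-in types, which is the point of writing features without NumPy or pandas.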
Supported Data Providers
- Financial Modeling Prep (free and discounted plans available through our referral link)
- Yahoo Finance (requires installing a separate extension package and doesn't support most data types)
