Data Curator
Data Acquisition, Simplified.
Description
Open-source library for downloading, validating, homogenizing, and combining financial data from different data providers. Can be run standalone or as a component of a larger Python-based system. Configurable via Excel or directly in code. Docker image also available.
Features
- Works with your favorite IDE, so you can use AI assistants to enhance your code.
- Configurable from an Excel file, or directly in a Python script. Docker image also available.
- Fully readable and specific tag names, homogenized between data providers, based on the US GAAP taxonomy. Switch between data providers without changing your code.
- Automatically validates market and fundamental data, discarding datasets that make no sense (such as a high price below the low price) or that can't guarantee point-in-time validity (such as amended statements).
- Easily create your own calculated feature functions without needing NumPy or pandas (though you can also use those if you want to).
- Output to CSV or Parquet files, or to in-memory pandas DataFrames for further processing.
- Completely extensible architecture: implement your own data providers, feature combinations, and output handlers on top of clear, stable interfaces.
- Readable, well-documented, and tested code.
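To make the validation and calculated-feature ideas above more concrete, here is a minimal sketch in plain Python. It is not Data Curator's actual API or implementation: `DailyBar`, `is_consistent`, and `daily_range_pct` are hypothetical names invented for this illustration, the sample values are made up, and the sketch filters individual rows where the library itself may work at the dataset level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyBar:
    """Hypothetical container for one day of market data (not a Data Curator type)."""
    date: str
    open: float
    high: float
    low: float
    close: float


def is_consistent(bar: DailyBar) -> bool:
    """Return True when the bar's prices are internally consistent."""
    return (
        bar.low > 0
        and bar.low <= bar.open <= bar.high
        and bar.low <= bar.close <= bar.high
    )


def daily_range_pct(bar: DailyBar) -> float:
    """Calculated feature: high-low range as a percentage of the close."""
    return (bar.high - bar.low) / bar.close * 100


# Made-up sample values, for illustration only.
bars = [
    DailyBar("2024-01-02", 100.0, 105.0, 99.0, 104.0),
    DailyBar("2024-01-03", 104.0, 103.0, 106.0, 105.0),  # high is below low
]

# Keep only the bars that pass the sanity check, then compute the feature.
valid_bars = [bar for bar in bars if is_consistent(bar)]
features = {bar.date: daily_range_pct(bar) for bar in valid_bars}
```

The feature function here uses only built-in arithmetic, which mirrors the claim that NumPy and pandas are optional; the library's real feature interface may look quite different, so check its documentation before relying on this shape.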
Supported Data Providers
- LSEG Workspace (https://www.lseg.com/en/data-analytics/products/workspace)
- Financial Modeling Prep (free and discounted plans available through our referral link: https://site.financialmodelingprep.com/pricing-plans?couponCode=xss2L2sI)
- Yahoo! Finance (requires installing a separate extension package and doesn't support most data types: https://github.com/KaxaNuk/Data-Curator-Extensions_Yahoo-Finance)

