
Data Curator

Data Acquisition, Simplified.

Description

Data Curator is an open-source Python library for downloading, validating, homogenizing, and combining financial data from different data providers. It can run standalone or as a component of a larger Python-based system, and it is configurable via Excel or directly in code. A Docker image is also available.
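As a rough illustration of what "homogenizing" and "combining" mean here, the sketch below maps two hypothetical providers' field names onto shared, readable tag names and then merges their records by date. The provider names, raw field names, tag names, and helper functions are all invented for this example; they are not Data Curator's actual API or tag set.

```python
from collections import defaultdict

# Hypothetical raw records: two providers report the same data under different field names.
PROVIDER_A = [
    {"date": "2024-03-31", "Revenues": 1000.0, "NetIncomeLoss": 120.0},
]
PROVIDER_B = [
    {"period_end": "2024-03-31", "total_revenue": 1000.0, "close": 52.5},
]

# Hypothetical per-provider mappings from raw field names to shared tag names.
TAG_MAPS = {
    "provider_a": {"date": "date", "Revenues": "revenues", "NetIncomeLoss": "net_income_loss"},
    "provider_b": {"period_end": "date", "total_revenue": "revenues", "close": "close_price"},
}


def homogenize(records, provider):
    """Rename each record's fields using the provider's tag map; drop unmapped fields."""
    tag_map = TAG_MAPS[provider]
    return [
        {tag_map[key]: value for key, value in record.items() if key in tag_map}
        for record in records
    ]


def combine(*datasets):
    """Merge homogenized records by date; later datasets only fill tags not already set."""
    merged = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            row = merged[record["date"]]
            for tag, value in record.items():
                row.setdefault(tag, value)
    return [merged[date] for date in sorted(merged)]


rows = combine(homogenize(PROVIDER_A, "provider_a"), homogenize(PROVIDER_B, "provider_b"))
print(rows)
```

Because downstream code only sees the shared tag names, swapping one provider's tag map for another's leaves that code unchanged. A real tag set would follow the US GAAP taxonomy rather than these placeholder names.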

Features

  • Use your favorite IDE and enhance your workflow with AI assistants.

  • Configurable from an Excel file, or directly in a Python script. Docker image also available.

  • Fully readable, specific tag names based on the US GAAP taxonomy and homogenized across data providers, so you can switch providers without changing your code.

  • Automatically validates market and fundamental data, discarding datasets that make no sense (for example, a high price below the low price) or that can't guarantee point-in-time validity (for example, amended statements).

  • Easily create your own calculated feature functions without needing NumPy or pandas (though you can use them if you want to).

  • Output to CSV or Parquet files, or to in-memory pandas DataFrames for further processing.

  • Completely extensible architecture: implement your own data providers, feature combinations, and output handlers on top of clear, stable interfaces.

  • Readable, well-documented, and tested code.
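The validation and calculated-feature bullets can be pictured with plain Python. The sketch below is a hypothetical illustration, not the library's actual validation rules or feature-function signature: it discards price bars that are internally inconsistent, skips fundamental statements flagged as amended, and computes a simple feature without NumPy or pandas. The field names (`open`, `high`, `low`, `close`, `volume`, `amended`, and so on) and the function names are assumptions made for this example.

```python
def is_valid_bar(bar):
    """Return True if an OHLC bar is internally consistent (hypothetical rule set)."""
    high, low = bar["high"], bar["low"]
    return (
        high >= low
        and low <= bar["open"] <= high
        and low <= bar["close"] <= high
        and bar["volume"] >= 0
    )


def is_point_in_time(statement):
    """Reject amended statements: their restated figures were not known at the original filing date."""
    return not statement.get("amended", False)


def net_margin(revenues, net_income):
    """Example calculated feature in plain Python: net income / revenues, or None if undefined."""
    if revenues in (None, 0) or net_income is None:
        return None
    return net_income / revenues


bars = [
    {"open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "volume": 1000},
    {"open": 10.0, "high": 9.0, "low": 9.5, "close": 10.5, "volume": 1000},  # high below low
]
statements = [
    {"revenues": 1000.0, "net_income": 120.0, "amended": False},
    {"revenues": 1000.0, "net_income": 150.0, "amended": True},
]

# Keep only consistent bars and point-in-time statements, then compute the feature.
clean_bars = [bar for bar in bars if is_valid_bar(bar)]
margins = [
    net_margin(s["revenues"], s["net_income"]) for s in statements if is_point_in_time(s)
]
print(len(clean_bars), margins)
```

Since the feature function takes and returns plain numbers, it can be tested in isolation; the same logic could be vectorized with NumPy or pandas if preferred.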

Supported Data Providers


Everything you need to get started!
