
A library that integrates Airflow DAGs with OpenLineage for automatic metadata collection.

Installation

$ pip3 install openlineage-airflow

To install from source, run:

$ python3 setup.py install

Note: You can also add openlineage-airflow to your requirements.txt for Airflow.
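For a Docker-based Airflow deployment, a minimal sketch of such a requirements.txt might look like this (purely illustrative):

```
# requirements.txt installed into the Airflow image
# (illustrative; consider pinning a version that matches your Airflow release)
openlineage-airflow
```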
The integration automatically registers itself for Airflow 2.3 if it's installed on the Airflow worker's Python. This means you don't have to do anything besides configuring it, which is described in the Configuration section.

For earlier Airflow versions (2.1-2.2), the integration is enabled through a LineageBackend instead. This method has limited support: it does not support tracking failed jobs, and job starts are registered only when a job ends. Set your LineageBackend in your airflow.cfg or via the environment variable AIRFLOW__LINEAGE__BACKEND to openlineage.lineage_backend.OpenLineageBackend.
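As a sketch, the equivalent airflow.cfg entry sits in the [lineage] section (AIRFLOW__LINEAGE__BACKEND is simply the environment-variable form of that option):

```
[lineage]
backend = openlineage.lineage_backend.OpenLineageBackend
```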
In contrast to integration via subclassing a DAG, a LineageBackend-based approach collects all metadata for a task on each task completion. The OpenLineageBackend does not take into account manually configured inlets and outlets.

When enabled, the integration will:

- On DAG start, collect metadata for each task using an Extractor if it exists for a given operator.
- Collect task input / output metadata (source, schema, etc.).
- Collect task run-level metadata (execution time, state, parameters, etc.).
- On DAG complete, also mark the task as complete in OpenLineage.
Configuration: HTTP Backend Environment Variables

openlineage-airflow uses the OpenLineage client to push data to the OpenLineage backend. The OpenLineage client depends on these environment variables:

- OPENLINEAGE_URL - point to the service that will consume OpenLineage events.
- OPENLINEAGE_API_KEY - set if the consumer of OpenLineage events requires a Bearer authentication key.
- OPENLINEAGE_NAMESPACE - set if you are using something other than the default namespace for the job namespace.
- OPENLINEAGE_AIRFLOW_DISABLE_SOURCE_CODE - set to False if you want the source code of callables provided in the PythonOperator to be sent in OpenLineage events.

For backwards compatibility, openlineage-airflow also supports configuration via the MARQUEZ_URL, MARQUEZ_NAMESPACE and MARQUEZ_API_KEY variables.
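For illustration only, a deployment pointing at a local Marquez instance might set the variables above like this (hypothetical values):

```
OPENLINEAGE_URL=http://localhost:5000
OPENLINEAGE_NAMESPACE=my_team_namespace
```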
Extractors: Sending the correct data from your DAGs

If you do nothing, the OpenLineage backend will receive the Job and the Run from your DAGs, but, unless you use one of the few operators for which this integration provides an extractor, input and output metadata will not be sent.

openlineage-airflow allows you to do more than that by building "Extractors." An extractor is an object suited to extract metadata from a particular operator (or operators). (Context: the Airflow context for the task.)

openlineage-airflow provides extractors for:

- RedshiftDataOperator, RedshiftSQLOperator
- SageMakerProcessingOperator, SageMakerProcessingOperatorAsync
- SageMakerTrainingOperator, SageMakerTrainingOperatorAsync
- SageMakerTransformOperator, SageMakerTransformOperatorAsync
- S3CopyObjectExtractor, S3FileTransformExtractor

There is an experimental SQL parser activated if you install openlineage-sql on your Airflow worker.
If your DAGs contain additional operators from which you want to extract lineage data, fear not - you can always provide a custom extractor. There are two ways to register them for use in openlineage-airflow. One way is to add them to the OPENLINEAGE_EXTRACTORS environment variable, separated by a semicolon (;):

OPENLINEAGE_EXTRACTORS=full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
To ensure OpenLineage logging propagation to custom extractors, you should use self.log instead of creating a logger yourself.
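A minimal sketch of such an extractor, assuming the BaseExtractor and TaskMetadata interfaces from openlineage.airflow.extractors.base (the operator name and returned metadata here are hypothetical):

```python
from typing import List, Optional

from openlineage.airflow.extractors.base import BaseExtractor, TaskMetadata


class MyOperatorExtractor(BaseExtractor):
    """Illustrative extractor for a hypothetical MyOperator."""

    @classmethod
    def get_operator_classnames(cls) -> List[str]:
        # Names of the operator classes this extractor handles.
        return ["MyOperator"]

    def extract(self) -> Optional[TaskMetadata]:
        # Use self.log (rather than creating a logger) so OpenLineage
        # log propagation works as described above.
        self.log.debug("Extracting lineage from task %s", self.operator.task_id)
        return TaskMetadata(
            name=f"{self.operator.dag_id}.{self.operator.task_id}",
            inputs=[],   # input datasets would be listed here
            outputs=[],  # output datasets would be listed here
        )
```

Registered via OPENLINEAGE_EXTRACTORS as shown above, a class like this would then be picked up for any task that uses the matching operator.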