Using Configuration Files for Dynamic System Behavior
A practical example from my MarketPipe project
Introduction
In software development, configuration files play an important role in creating flexible and maintainable systems.
They let you change system behavior without touching the codebase, which makes it easier to adapt to changing requirements.
This article discusses why configuration files matter, shows how the MarketPipe project uses them, and offers best practices for structuring and reading configurations.
The Role of Configuration Files
Configuration files store settings and parameters that dictate how an application behaves. They enable:
Separation of Concerns: Configuration files separate configuration from code, allowing for easier maintenance and updates.
Flexibility: Changes can be made without modifying the codebase, promoting quicker adaptations.
Scalability: As applications grow, configuration files help manage complexity by organizing settings in a structured manner.
Configuration Management in MarketPipe
In the MarketPipe project, configuration files are used to manage API client settings, define DAGs (Directed Acyclic Graphs) for data processing, and configure several system behaviors. Let's dive into some specific examples from the project.
Example Configuration File
Here's a quick look at mdp_config.json, the configuration file used:
{
    "owner": "data_engineer",
    "email_on_failure": true,
    "email_on_retry": true,
    "retries": 3,
    "clients": {
        "stocks": {
            "module": "core.api_clients.stock_api_client",
            "class": "StockApiClient",
            "schedule_interval": "0 12 * * *"
        },
        "cryptos": {
            "module": "core.api_clients.crypto_api_client",
            "class": "CryptoApiClient",
            "symbols": ["BTC", "ETH"],
            "schedule_interval": "0 1 * * *"
        }
    },
    "assets": {
        "stocks": {
            "symbols": ["AAPL", "GOOGL", "MSFT"],
            "schedule_interval": "0 12 * * *"
        },
        "cryptos": {
            "symbols": ["BTC", "ETH"],
            "schedule_interval": "0 1 * * *"
        }
    }
}
Reading Configuration Files
MarketPipe uses a utility function, read_json, to read and parse the configuration file. This function simplifies loading configuration settings into the application.
from utils import read_json
CONFIG = read_json("mdp_config.json")
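The body of read_json is not shown in this article, so here is a minimal sketch of what such a helper might look like. The top-level type check and the error message are assumptions added for illustration, not MarketPipe's actual implementation.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def read_json(path: Union[str, Path]) -> Dict[str, Any]:
    """Load a JSON file and return its top-level object as a dict.

    Hypothetical stand-in for the read_json helper in MarketPipe's utils module.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the configuration is always a JSON object, never a list or scalar.
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return data
```

Malformed JSON raises json.JSONDecodeError from the standard library, so a broken configuration file fails loudly at load time rather than later in the pipeline.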
Using Configurations to Load API Clients
MarketPipe dynamically loads API clients based on the configuration file. The ApiClientFactory class reads the configuration to determine which API client to instantiate.
import logging
from importlib import import_module
from typing import Dict, Type

from core.base_api import BaseApiClient
from utils import read_json

# Read configuration file
CONFIG = read_json("mdp_config.json")


class ApiClientFactory:
    def __init__(self, logger: logging.Logger):
        self.logger = logger
        self.clients = self.load_clients()

    def load_clients(self) -> Dict[str, Type[BaseApiClient]]:
        """
        Dynamically load API clients from the configuration file.
        """
        clients = {}
        # Each entry under "clients" names the module and class to load
        for client_name, settings in CONFIG["clients"].items():
            module_name, class_name = settings["module"], settings["class"]
            # Import the module dynamically based on the configuration
            module = import_module(module_name)
            # Get the class (not an instance) from the module
            clients[client_name] = getattr(module, class_name)
        return clients
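To see the dynamic-import technique in isolation, here is a small self-contained sketch. The load_classes function and the demo_settings dictionary are hypothetical names introduced for illustration, and standard-library classes stand in for the real API clients so the example runs without the MarketPipe package. It also wraps lookup failures in a clearer error, which the factory above does not do.

```python
from importlib import import_module
from typing import Any, Dict, Type


def load_classes(client_settings: Dict[str, Dict[str, Any]]) -> Dict[str, Type]:
    """Resolve each entry's "module"/"class" pair to a class object."""
    classes: Dict[str, Type] = {}
    for name, settings in client_settings.items():
        try:
            module = import_module(settings["module"])
            classes[name] = getattr(module, settings["class"])
        except KeyError as exc:
            # The entry lacks a "module" or "class" key.
            raise ValueError(f"client '{name}' is missing key {exc}") from exc
        except (ImportError, AttributeError) as exc:
            # The module does not exist, or it has no attribute with that name.
            raise ValueError(f"client '{name}' could not be loaded: {exc}") from exc
    return classes


# Standard-library classes stand in for real API clients (illustration only).
demo_settings = {
    "ordered": {"module": "collections", "class": "OrderedDict"},
    "fraction": {"module": "fractions", "class": "Fraction"},
}
demo_classes = load_classes(demo_settings)
print(demo_classes["fraction"](1, 2))  # prints 1/2
```

Because the mapping holds classes rather than instances, the caller decides when to instantiate a client and which arguments to pass.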
Defining DAGs Using Configuration Files
Airflow DAGs are defined using the configuration file to set parameters such as schedule intervals and email notifications. This makes it easy to update schedules and settings without changing the DAG code.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import logging

from utils import read_json

# Read configuration file
CONFIG = read_json("mdp_config.json")

# Set default arguments for DAGs using configuration settings
default_args = {
    "owner": CONFIG.get("owner", "airflow"),  # from config
    "depends_on_past": False,
    # Placeholder fixed date: Airflow recommends a static start_date
    # rather than a dynamic value such as datetime.now()
    "start_date": datetime(2024, 1, 1),
    "email_on_failure": CONFIG.get("email_on_failure", False),  # from config
    "email_on_retry": CONFIG.get("email_on_retry", False),  # from config
    "retries": CONFIG.get("retries", 1),  # from config
}


def create_market_data_dag(asset_type, dag_id, description):
    dag = DAG(
        dag_id,
        default_args=default_args,
        schedule_interval=CONFIG["assets"][asset_type]["schedule_interval"],  # from config
        description=description,
    )
    # ........ (rest of the function elided in the original)


# Create DAGs for different asset types
create_market_data_dag("stocks", "process_stock_data", "Collect and store stock data")
create_market_data_dag("cryptos", "process_crypto_data", "Collect and store crypto data")
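One way to push this idea further is to derive the list of DAGs from the configuration itself, so that a new asset type needs only a new entry under "assets". The sketch below is plain Python with no Airflow dependency. The build_dag_specs function and the process_<asset_type>_data naming convention are assumptions for illustration (they produce slightly different ids than the hand-written ones above), and demo_config is an excerpt of mdp_config.json.

```python
from typing import Any, Dict, List


def build_dag_specs(config: Dict[str, Any]) -> List[Dict[str, str]]:
    """Derive one DAG spec per entry under config["assets"]."""
    specs = []
    for asset_type, settings in config["assets"].items():
        specs.append(
            {
                "asset_type": asset_type,
                # Hypothetical naming convention, not the ids used in MarketPipe.
                "dag_id": f"process_{asset_type}_data",
                "schedule_interval": settings["schedule_interval"],
            }
        )
    return specs


# Excerpt of mdp_config.json, inlined so the sketch is self-contained.
demo_config = {
    "assets": {
        "stocks": {"symbols": ["AAPL", "GOOGL", "MSFT"], "schedule_interval": "0 12 * * *"},
        "cryptos": {"symbols": ["BTC", "ETH"], "schedule_interval": "0 1 * * *"},
    }
}

for spec in build_dag_specs(demo_config):
    print(spec["dag_id"], "->", spec["schedule_interval"])
```

Each spec could then be passed to a factory such as create_market_data_dag, keeping the DAG file free of per-asset hardcoding.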
Using Configuration in API Client Implementation
The CryptoApiClient class uses configuration settings to initialize API-specific parameters. This ensures that all necessary settings are managed externally.
import logging

from utils import read_json
from core.base_api import BaseApiClient

CONFIG = read_json("mdp_config.json")


class CryptoApiClient(BaseApiClient):
    def __init__(self, logger: logging.Logger):
        """
        Initialize the CryptoApiClient with symbols from the configuration.
        """
        super().__init__(logger=logger)
        self.symbols = CONFIG["assets"]["cryptos"]["symbols"]  # Get crypto symbols from the config file

    def get_data(self):
        """
        Fetch cryptocurrency data for the configured symbols.
        """
        self.logger.info(f"Fetching data for symbols: {self.symbols}")
        # Implement data fetching logic here
        return {"data": "crypto_data"}
Best Practices for Organizing Configuration Files
Structure Configuration Files Logically: Group related settings together. For example, keep API client configurations under a single section.
Use Descriptive Keys: Make keys self-explanatory to ensure the configuration file is easy to read and understand.
Document Configuration Settings: Provide comments or documentation for each configuration option to explain its purpose.
Validate Configuration Files: Implement validation logic to check for missing or incorrect settings, so that the application can handle configuration errors. Pydantic is a good fit for this, and Pysertive can be used to enforce preconditions, postconditions, and invariants.
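As a rough illustration of the validation point, the sketch below checks the "assets" section using only standard-library dataclasses. This is a swapped-in technique, chosen so the example has no third-party dependency; it is not Pydantic or Pysertive, and the AssetConfig and validate_assets names are hypothetical. It also assumes schedules are always five-field cron strings, as in mdp_config.json, so presets such as "@daily" would be rejected.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AssetConfig:
    symbols: List[str]
    schedule_interval: str

    def __post_init__(self) -> None:
        if not isinstance(self.symbols, list) or not self.symbols:
            raise ValueError("symbols must be a non-empty list")
        if not all(isinstance(s, str) and s for s in self.symbols):
            raise ValueError("every symbol must be a non-empty string")
        if not isinstance(self.schedule_interval, str):
            raise ValueError("schedule_interval must be a string")
        # Assumption: schedules are five-field cron expressions.
        if len(self.schedule_interval.split()) != 5:
            raise ValueError(
                f"schedule_interval is not a 5-field cron string: {self.schedule_interval!r}"
            )


def validate_assets(config: Dict[str, Any]) -> Dict[str, AssetConfig]:
    """Validate config["assets"] and return typed settings per asset type."""
    assets = config.get("assets")
    if not isinstance(assets, dict) or not assets:
        raise ValueError("config must contain a non-empty 'assets' object")
    validated = {}
    for name, raw in assets.items():
        try:
            # Missing or unexpected keys raise TypeError from the dataclass constructor.
            validated[name] = AssetConfig(**raw)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"assets.{name}: {exc}") from exc
    return validated
```

Running a check like this once at startup means a missing key or a malformed schedule surfaces with the offending section named in the error message.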
Conclusion
Configuration files are awesome for managing dynamic system behavior.
In MarketPipe, they provide a flexible way to control API clients, define DAGs, and manage application settings.
By following best practices for structuring and reading configurations, you can adapt to new requirements with minimal code changes and keep your application maintainable, scalable, and easier to manage.