Introduction
Database design principles ensure that databases are structured to be efficient, scalable, and easy to maintain.
Key principles include the use of primary keys and foreign keys, defining relationships between tables, and sometimes denormalizing data to optimize performance.
Let's dive into it.
Primary Keys and Foreign Keys
Primary Keys
A primary key is a column (or a set of columns) in a table that uniquely identifies each row in that table. The primary key ensures that each record in the table is unique and not null.
Unique: No two rows can have the same primary key value.
Not Null: Primary keys cannot contain NULL values.
Immutable: Primary key values should not change over time.
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);
For instance, in a customer database, the customer ID would be a primary key, ensuring each customer is uniquely identifiable.
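To see the uniqueness rule in action, here is a minimal sketch that runs the employees table from above in an in-memory SQLite database through Python's sqlite3 module. SQLite and the sample rows are assumptions made for illustration; other database engines report the violation with their own error types.

```python
import sqlite3


def try_duplicate_primary_key() -> str:
    """Insert two rows with the same employee_id and report what happens."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        """
        CREATE TABLE employees (
            employee_id INT PRIMARY KEY,
            first_name  VARCHAR(50),
            last_name   VARCHAR(50),
            email       VARCHAR(100)
        )
        """
    )
    conn.execute(
        "INSERT INTO employees VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')"
    )
    try:
        # Same employee_id as the row above, so the primary key is violated.
        conn.execute(
            "INSERT INTO employees VALUES (1, 'Alan', 'Turing', 'alan@example.com')"
        )
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()
    return "accepted"


print(try_duplicate_primary_key())  # prints "rejected"
```

The second insert fails because the database, not the application, guards the key.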
Foreign Keys
A foreign key is a column (or a set of columns) that links two tables. It references the primary key (or another unique key) of another table, establishing a relationship between the two tables.
Establishes a Link: Links the data in two tables.
Ensures Referential Integrity: Ensures that the value in the foreign key column matches a value in the referenced primary key column.
Allows NULL Values: Can contain NULL values (unless specified otherwise).
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50)
);
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
For example, in an employee database, an employee might belong to a department, and the department_id in the employees table would be a foreign key referencing the department_id in the departments table.
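The three foreign key properties listed above can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module, where foreign key enforcement has to be switched on per connection; the helper names open_db and can_insert_employee, and the sample department, are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create the departments/employees schema with one sample department."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE departments (
            department_id   INT PRIMARY KEY,
            department_name VARCHAR(50)
        );
        CREATE TABLE employees (
            employee_id   INT PRIMARY KEY,
            first_name    VARCHAR(50),
            last_name     VARCHAR(50),
            email         VARCHAR(100),
            department_id INT,
            FOREIGN KEY (department_id) REFERENCES departments(department_id)
        );
        INSERT INTO departments VALUES (10, 'Engineering');
        """
    )
    return conn


def can_insert_employee(conn, employee_id, department_id) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employees (employee_id, first_name, department_id) "
            "VALUES (?, ?, ?)",
            (employee_id, "Test", department_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = open_db()
print(can_insert_employee(conn, 1, 10))    # department 10 exists
print(can_insert_employee(conn, 2, 99))    # department 99 does not exist
print(can_insert_employee(conn, 3, None))  # NULL is allowed by default
```

Only the middle insert is rejected: the reference to a missing department breaks referential integrity, while a NULL department_id simply means "no department yet".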
Relationships
One-to-One Relationships
In a one-to-one relationship, each row in one table is linked to one and only one row in another table. This type of relationship is less common and is often used to split a table for security, performance, or organizational reasons.
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);
CREATE TABLE employee_details (
    employee_id INT PRIMARY KEY,
    address VARCHAR(100),
    phone_number VARCHAR(15),
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
One-to-one relationships can be used to separate confidential information, such as splitting an employee's general information from their sensitive details like salary or health records. This ensures that access to sensitive information can be restricted.
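What makes this one-to-one is that employee_id is both the primary key and the foreign key of employee_details. The sketch below, assuming SQLite through Python's sqlite3 module and a hypothetical add_details helper with made-up sample data, shows that a second details row for the same employee is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is opt-in in SQLite
conn.executescript(
    """
    CREATE TABLE employees (
        employee_id INT PRIMARY KEY,
        first_name  VARCHAR(50),
        last_name   VARCHAR(50)
    );
    CREATE TABLE employee_details (
        employee_id  INT PRIMARY KEY,
        address      VARCHAR(100),
        phone_number VARCHAR(15),
        FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
    );
    INSERT INTO employees VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO employee_details VALUES (1, '12 Example Street', '555-0100');
    """
)


def add_details(employee_id, address, phone_number) -> bool:
    """Try to add a details row; False means a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee_details VALUES (?, ?, ?)",
            (employee_id, address, phone_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Employee 1 already has a details row, so the primary key blocks a second one.
second_row_allowed = add_details(1, "Another address", "555-0199")

# Employee 2 does not exist, so the foreign key blocks an orphan details row.
orphan_row_allowed = add_details(2, "Nowhere", "555-0000")

# Reading both halves back together takes a single join on the shared key.
profile = conn.execute(
    """
    SELECT e.first_name, d.phone_number
    FROM employees e
    JOIN employee_details d ON d.employee_id = e.employee_id
    WHERE e.employee_id = 1
    """
).fetchone()
print(second_row_allowed, orphan_row_allowed, profile)
```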
One-to-Many Relationships
In a one-to-many relationship, each row in the parent table can be related to multiple rows in the child table. This is the most common type of relationship.
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50)
);
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
For instance, a department can have multiple employees, but each employee belongs to only one department.
One-to-many relationships are widely used in databases to link tables, such as linking customers to their orders, employees to their departments, or products to their categories.
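A typical query over a one-to-many relationship walks from the parent to its children. The sketch below counts employees per department; it assumes SQLite via Python's sqlite3 module, and the sample departments, employees, and the headcount_by_department helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE departments (
        department_id   INT PRIMARY KEY,
        department_name VARCHAR(50)
    );
    CREATE TABLE employees (
        employee_id   INT PRIMARY KEY,
        first_name    VARCHAR(50),
        last_name     VARCHAR(50),
        department_id INT,
        FOREIGN KEY (department_id) REFERENCES departments(department_id)
    );
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales'), (3, 'Legal');
    INSERT INTO employees VALUES
        (1, 'Ada',   'Lovelace', 1),
        (2, 'Alan',  'Turing',   1),
        (3, 'Grace', 'Hopper',   2);
    """
)


def headcount_by_department(db):
    """Return (department_name, employee_count) pairs, one per department."""
    return db.execute(
        """
        SELECT d.department_name, COUNT(e.employee_id)
        FROM departments d
        LEFT JOIN employees e ON e.department_id = d.department_id
        GROUP BY d.department_id, d.department_name
        ORDER BY d.department_id
        """
    ).fetchall()


print(headcount_by_department(conn))
# [('Engineering', 2), ('Sales', 1), ('Legal', 0)]
```

The LEFT JOIN keeps departments that have no employees yet, which a plain JOIN would silently drop.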
Many-to-Many Relationships
In a many-to-many relationship, each row in one table can be related to many rows in another table and vice versa. This type of relationship is implemented using a junction table that contains foreign keys from both related tables.
CREATE TABLE projects (
    project_id INT PRIMARY KEY,
    project_name VARCHAR(100)
);
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
);
CREATE TABLE employee_projects (
    employee_id INT,
    project_id INT,
    PRIMARY KEY (employee_id, project_id),
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id),
    FOREIGN KEY (project_id) REFERENCES projects(project_id)
);
For example, an employee can work on multiple projects, and each project can have multiple employees.
Many-to-many relationships are used to represent complex associations, such as students enrolled in multiple courses, employees working on multiple projects, or customers purchasing multiple products.
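Querying a many-to-many relationship always goes through the junction table, in either direction. Here is a small sketch, assuming SQLite via Python's sqlite3 module; the sample rows and the helper names projects_for and members_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE projects (
        project_id   INT PRIMARY KEY,
        project_name VARCHAR(100)
    );
    CREATE TABLE employees (
        employee_id INT PRIMARY KEY,
        first_name  VARCHAR(50),
        last_name   VARCHAR(50)
    );
    CREATE TABLE employee_projects (
        employee_id INT,
        project_id  INT,
        PRIMARY KEY (employee_id, project_id),
        FOREIGN KEY (employee_id) REFERENCES employees(employee_id),
        FOREIGN KEY (project_id) REFERENCES projects(project_id)
    );
    INSERT INTO employees VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO projects VALUES (100, 'Compiler'), (200, 'Website');
    INSERT INTO employee_projects VALUES (1, 100), (1, 200), (2, 100);
    """
)


def projects_for(db, employee_id):
    """Names of all projects one employee works on."""
    rows = db.execute(
        """
        SELECT p.project_name
        FROM employee_projects ep
        JOIN projects p ON p.project_id = ep.project_id
        WHERE ep.employee_id = ?
        ORDER BY p.project_id
        """,
        (employee_id,),
    ).fetchall()
    return [name for (name,) in rows]


def members_of(db, project_id):
    """First names of all employees assigned to one project."""
    rows = db.execute(
        """
        SELECT e.first_name
        FROM employee_projects ep
        JOIN employees e ON e.employee_id = ep.employee_id
        WHERE ep.project_id = ?
        ORDER BY e.employee_id
        """,
        (project_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(projects_for(conn, 1))  # ['Compiler', 'Website']
print(members_of(conn, 100))  # ['Ada', 'Alan']
```

The composite primary key on the junction table also prevents the same employee from being assigned to the same project twice.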
Denormalization
Denormalization is the process of deliberately adding redundancy to a schema, typically by combining tables or duplicating columns, to simplify queries and improve read performance. The trade-off is slower, more complex writes and a risk of inconsistent data. It is often used in data warehousing and OLAP systems where read performance is critical.
When to Use Denormalization
When Query Performance is Critical: When you need to optimize read performance and can afford some redundancy.
Read-Heavy Systems: When the system is read-heavy with fewer updates.
Complex Joins: When complex joins are causing performance bottlenecks.
Denormalization Techniques
Let's look at some common denormalization techniques:
Adding Redundant Data
Including frequently accessed but non-key data in a table can reduce the need for joins, speeding up queries.
Let's say you frequently need to access customer names along with order details. Instead of joining the orders table with the customers table each time, you can add the customer_name directly to the orders table.
ALTER TABLE orders
ADD customer_name VARCHAR(100);
Before Denormalization:
SELECT o.order_id, o.order_date, o.amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
After Denormalization:
SELECT order_id, order_date, amount, customer_name
FROM orders;
Real-World: Adding redundant data is commonly used in e-commerce platforms where the order history page needs to display the customer's name along with order details. By including the customer_name directly in the orders table, the system can quickly retrieve and display order information without the overhead of joining multiple tables.
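A redundant column starts out empty and has to be kept in step with its source, which is the write-side cost of this technique. The sketch below shows one possible backfill and resync; it assumes SQLite via Python's sqlite3 module, and the column types, sample rows, and the sync_customer_names helper are hypothetical. A production system might use a trigger or application logic instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id   INT PRIMARY KEY,
        customer_name VARCHAR(100)
    );
    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT,
        order_date  DATE,
        amount      DECIMAL(10, 2)
    );
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (100, 1, '2024-01-15', 250.00);

    -- The denormalization step from the article.
    ALTER TABLE orders ADD customer_name VARCHAR(100);
    """
)


def sync_customer_names(db):
    """Copy each customer's current name onto all of their orders."""
    db.execute(
        """
        UPDATE orders
        SET customer_name = (
            SELECT c.customer_name
            FROM customers c
            WHERE c.customer_id = orders.customer_id
        )
        """
    )


def order_customer_name(db, order_id):
    """Read the redundant name straight from orders, without a join."""
    row = db.execute(
        "SELECT customer_name FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]


sync_customer_names(conn)  # initial backfill
name_after_backfill = order_customer_name(conn, 100)

# Renaming the customer does not touch the copy stored in orders.
conn.execute("UPDATE customers SET customer_name = 'Acme Ltd' WHERE customer_id = 1")
name_before_resync = order_customer_name(conn, 100)

sync_customer_names(conn)  # the copy is correct again only after a resync
name_after_resync = order_customer_name(conn, 100)

print(name_after_backfill, "|", name_before_resync, "|", name_after_resync)
# Acme | Acme | Acme Ltd
```

Whether a stale name on old orders is a bug or a feature depends on the application: some systems want the name as it was at order time.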
Precomputing Aggregations
Storing precomputed summary data can speed up read operations, especially for complex calculations.
If you frequently need to get monthly sales summaries, precomputing and storing these values can save processing time.
-- Example of precomputing monthly sales. Grouping by year and month
-- keeps the same month of different years from being merged.
INSERT INTO monthly_sales_summary (year, month, total_sales)
SELECT YEAR(order_date), MONTH(order_date), SUM(amount)
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date);
Before Denormalization:
SELECT YEAR(order_date) AS year, MONTH(order_date) AS month, SUM(amount) AS total_sales
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date);
After Denormalization:
SELECT year, month, total_sales
FROM monthly_sales_summary;
Real-World: Precomputing aggregations is used in data warehousing and business intelligence systems. For example, a retail company might precompute daily, weekly, or monthly sales totals to generate reports quickly without recalculating these metrics each time a report is requested. This technique is useful for dashboards that need to load quickly, as long as the summaries are refreshed often enough to stay current.
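A summary table is only as fresh as its last rebuild. The sketch below shows one simple refresh strategy, a full rebuild inside a transaction. It assumes SQLite via Python's sqlite3 module, so strftime('%Y-%m', ...) stands in for the MONTH() function used above, and the month key is stored as a 'YYYY-MM' string; the table layout, sample orders, and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id   INT PRIMARY KEY,
        order_date TEXT,   -- ISO dates such as '2024-01-05'
        amount     REAL
    );
    CREATE TABLE monthly_sales_summary (
        month       TEXT PRIMARY KEY,   -- 'YYYY-MM'
        total_sales REAL
    );
    INSERT INTO orders VALUES
        (1, '2024-01-05', 100.0),
        (2, '2024-01-20',  50.0),
        (3, '2024-02-03',  75.0),
        (4, '2025-01-10',  20.0);
    """
)


def refresh_monthly_sales(db):
    """Rebuild the whole summary table from the orders table."""
    with db:  # commit both statements together, or roll back on error
        db.execute("DELETE FROM monthly_sales_summary")
        db.execute(
            """
            INSERT INTO monthly_sales_summary (month, total_sales)
            SELECT strftime('%Y-%m', order_date), SUM(amount)
            FROM orders
            GROUP BY strftime('%Y-%m', order_date)
            """
        )


def read_summary(db):
    """Cheap read path: no aggregation happens at query time."""
    return db.execute(
        "SELECT month, total_sales FROM monthly_sales_summary ORDER BY month"
    ).fetchall()


refresh_monthly_sales(conn)
first_summary = read_summary(conn)

# A new order is invisible to the summary until the next refresh.
conn.execute("INSERT INTO orders VALUES (5, '2025-01-11', 30.0)")
stale_summary = read_summary(conn)

refresh_monthly_sales(conn)
fresh_summary = read_summary(conn)
print(fresh_summary)
# [('2024-01', 150.0), ('2024-02', 75.0), ('2025-01', 50.0)]
```

A full rebuild is the easiest approach to reason about; larger systems often refresh only the periods that changed.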
Using Arrays
Storing multiple values in a single column can reduce the number of rows and simplify data retrieval.
If you have a products table and each product can have multiple tags, you can store these tags as a comma-separated list in a single column.
-- Example of inserting data
INSERT INTO products (product_id, product_name, tags)
VALUES (1, 'Laptop', 'electronics,computers,portable');
-- Query to get product tags (additional parsing may be required on the application side)
SELECT product_id, product_name, tags
FROM products;
Before Denormalization:
SELECT p.product_id, p.product_name, t.tag
FROM products p
JOIN product_tags t ON p.product_id = t.product_id;
After Denormalization:
SELECT product_id, product_name, tags
FROM products;
Real-World: Using arrays or comma-separated values is common in content management systems (CMS) where each article or product may have multiple tags or categories. This approach simplifies the schema and lets an item be read together with its tags in a single query. However, searching by tag becomes harder, because an ordinary index cannot target individual values inside a delimited string, and the application layer must parse and manage the list carefully.
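The application-side parsing mentioned above might look like the following sketch. It assumes SQLite via Python's sqlite3 module; the sample products and the helpers parse_tags, products_with_tag, and products_like are made up for illustration. It also shows why a naive LIKE search on the tags column can return false matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (
        product_id   INT PRIMARY KEY,
        product_name VARCHAR(100),
        tags         VARCHAR(255)   -- comma-separated list
    );
    INSERT INTO products VALUES
        (1, 'Laptop',          'electronics,computers,portable'),
        (2, 'Desk',            'furniture,office'),
        (3, 'Port Replicator', 'electronics,port');
    """
)


def parse_tags(raw):
    """Split a comma-separated tag string into a clean list of tags."""
    if not raw:
        return []
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def products_with_tag(db, tag):
    """Exact tag match, done in the application after parsing every row."""
    rows = db.execute(
        "SELECT product_name, tags FROM products ORDER BY product_id"
    ).fetchall()
    return [name for name, raw in rows if tag in parse_tags(raw)]


def products_like(db, fragment):
    """Naive substring search in SQL; it also matches parts of longer tags."""
    rows = db.execute(
        "SELECT product_name FROM products WHERE tags LIKE ? ORDER BY product_id",
        (f"%{fragment}%",),
    ).fetchall()
    return [name for (name,) in rows]


print(parse_tags("electronics,computers,portable"))
# ['electronics', 'computers', 'portable']
print(products_with_tag(conn, "port"))  # ['Port Replicator']
print(products_like(conn, "port"))      # ['Laptop', 'Port Replicator']
```

The LIKE version wrongly returns the laptop because "portable" contains "port", while the exact version has to read and parse every row.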
Conclusion
Understanding and applying database design principles, including the use of primary and foreign keys, well-defined relationships, and strategic denormalization, is crucial for creating efficient, scalable, and maintainable databases. Keys and relationships protect data integrity, while denormalization, used selectively, trades some write cost and redundancy for faster reads.