Normalization is the process of organising columns and tables to minimise redundancy and avoid anomalies — the insertion, update and deletion problems that arise when one table tries to store too much. You apply a series of normal forms, each removing a specific kind of bad dependency.
Why normalize
Storing everything in one wide table causes anomalies: the same fact is repeated (update anomaly), you can't add data without unrelated data (insertion anomaly), or deleting a row loses unrelated facts (deletion anomaly). Normalization splits tables so each fact lives in exactly one place.
The normal forms
| Form | Rule (informally) |
|---|---|
| 1NF | atomic values; no repeating groups or multi-valued cells |
| 2NF | 1NF + no partial dependency on part of a composite key |
| 3NF | 2NF + no transitive dependency (non-key depends on non-key) |
| BCNF | stricter 3NF: every determinant is a candidate key |
⚡ The edge
- A memorable summary: 1NF kills repeating groups; 2NF removes partial dependencies; 3NF removes transitive dependencies; BCNF demands every determinant be a candidate key.
- Normalize to reduce redundancy, but be ready to discuss denormalization — deliberately adding redundancy back for read performance, a real trade-off in analytics systems.
Worked example
'What is the difference between 2NF and 3NF?'
- Both assume the table is already in the previous form (1NF, then 2NF).
- 2NF removes partial dependencies — a non-key attribute depending on only part of a composite key.
- 3NF removes transitive dependencies — a non-key attribute depending on another non-key attribute.
Answer: 2NF removes partial dependencies on a composite key; 3NF removes transitive (non-key -> non-key) dependencies.
Worked example
'Why normalize a database?'
- State the goal: reduce data redundancy and the anomalies it causes.
- Name the three anomalies: insertion, update and deletion.
- Conclude: each fact stored once means updates touch one place and the data stays consistent.
Answer: To eliminate redundancy and the insertion/update/deletion anomalies, keeping each fact in one place.
⚠ Watch out
- Each normal form assumes the previous one — you can't be in 3NF without being in 2NF.
- 2NF is about partial dependency; 3NF is about transitive dependency — don't swap them.
- More normalization isn't always better: heavy normalization can hurt read performance, hence denormalization.