Why This Matters
Combinatorial optimization sits at the heart of computer science and discrete mathematics. It's where theoretical elegance meets real-world problem solving. These problems appear everywhere: routing delivery trucks, scheduling exams, designing computer chips. The core challenge is that many of these problems are NP-hard, meaning no known efficient exact solution exists for large inputs. But understanding their structure helps you choose the right approximation algorithms, heuristics, or special-case solutions.
Don't just memorize problem names and definitions. Know why certain problems are computationally hard, how different problems relate to each other, and which algorithmic techniques apply to each category. Exam questions often ask you to identify which optimization framework fits a given scenario, compare the complexity of related problems, or explain why a greedy approach works for one problem but fails for another.
Path and Flow Optimization
These problems focus on finding optimal routes or maximizing throughput through network structures. The underlying principle: graphs model connectivity, and edge weights capture costs or capacities.
Traveling Salesman Problem
- NP-hard complexity: no known polynomial-time algorithm exists, making it a benchmark for computational hardness
- The goal is to find a Hamiltonian cycle (a cycle visiting every vertex exactly once and returning to the start) with minimum total edge weight
- Heuristics dominate in practice. Nearest neighbor greedily visits the closest unvisited city, 2-opt iteratively improves a tour by swapping edge pairs, and genetic algorithms evolve a population of candidate tours. These provide near-optimal solutions for logistics and circuit design, though none guarantee optimality.
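The nearest-neighbor idea above can be sketched in a few lines. This is a minimal illustration on a made-up symmetric distance matrix, not a production TSP solver:

```python
# Nearest-neighbor heuristic for TSP: always travel to the closest
# unvisited city. The 4-city distance matrix is a made-up example.
def nearest_neighbor_tour(dist, start=0):
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[last][c])  # greedy step
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(dist, tour):
    # Total weight of the cycle, including the edge back to the start.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
tour = nearest_neighbor_tour(dist)
print(tour, tour_length(dist, tour))  # [0, 1, 3, 2] 18
```

Note that the greedy tour here happens to be good, but nearest neighbor can be arbitrarily far from optimal on adversarial inputs, which is why 2-opt refinement is usually applied afterward.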
Shortest Path Problem
- Dijkstra's algorithm runs in O((V+E) log V) using a priority queue, but only works for graphs with non-negative edge weights
- Bellman-Ford handles negative weights in O(VE) time and can detect negative-weight cycles (where no true shortest path exists because you could loop forever to decrease cost)
- GPS systems, network routing protocols, and urban planning all depend on efficient shortest path computation
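Dijkstra's algorithm can be sketched compactly with a binary heap. The adjacency list below is an illustrative example, and stale heap entries are skipped lazily rather than decreased in place:

```python
import heapq

# Dijkstra sketch: adjacency list maps a vertex to (neighbor, weight)
# pairs; all weights must be non-negative. Graph is a made-up example.
def dijkstra(adj, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale entry for an already-finalized vertex
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {'a': [('b', 1), ('c', 4)],
       'b': [('c', 2), ('d', 6)],
       'c': [('d', 3)]}
print(dijkstra(adj, 'a'))  # {'a': 0, 'b': 1, 'c': 3, 'd': 6}
```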
Maximum Flow Problem
The goal is to push as much "flow" as possible from a source vertex to a sink vertex through a network with limited-capacity edges.
- Ford-Fulkerson method iteratively finds augmenting paths (paths with remaining capacity from source to sink) until none exist. The Edmonds-Karp variant uses BFS to select augmenting paths, guaranteeing O(VE²) time.
- Two constraints govern every flow network: capacity constraints limit flow on each edge to its capacity, and flow conservation requires that inflow equals outflow at every vertex except the source and sink.
- Max-flow min-cut theorem: the maximum flow value equals the capacity of the minimum cut. This connects flow problems to graph partitioning and is one of the most important duality results in combinatorics.
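The Edmonds-Karp loop described above can be sketched as follows, using a capacity matrix for a small made-up network (index 0 is the source, the last index the sink):

```python
from collections import deque

# Edmonds-Karp sketch: BFS finds a shortest augmenting path in the
# residual graph, then the bottleneck capacity is pushed along it.
def max_flow(cap, s, t):
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for an augmenting path in the residual network.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total  # no augmenting path left: flow is maximal
        # Find the bottleneck along the path, then push flow.
        bottleneck, v = float('inf'), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck  # residual (reverse) capacity
            v = u
        total += bottleneck

cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 3],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))  # 5
```

The result 5 matches the min cut here: the edges out of the source have total capacity 3 + 2 = 5, which the flow saturates.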
Compare: Shortest Path vs. Maximum Flow: both operate on weighted graphs, but shortest path minimizes cumulative edge weights along one path while maximum flow maximizes throughput across multiple simultaneous paths. If asked about network bandwidth, think flow; if asked about travel time, think shortest path.
Packing and Covering Problems
These problems involve selecting subsets to either fill a container optimally (packing) or ensure complete coverage (covering). The tension between these dual perspectives is a recurring theme.
Knapsack Problem
- The 0/1 version (each item is either taken or not) uses dynamic programming in O(nW) time, where n is the number of items and W is the capacity. This is called pseudo-polynomial because W is a numeric value, not the input size. The fractional version (you can take partial items) yields to a simple greedy strategy: sort by value-to-weight ratio and take items in that order.
- Formally: maximize Σ vᵢxᵢ subject to Σ wᵢxᵢ ≤ W, where vᵢ and wᵢ are the value and weight of item i, and xᵢ indicates whether (or, in the fractional version, how much of) item i is taken.
- This is the archetype for resource allocation: budgeting, investment portfolios, and cargo loading all follow this structure.
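The O(nW) dynamic program for the 0/1 version can be sketched with a one-dimensional table; the (value, weight) items below are a made-up example:

```python
# 0/1 knapsack via the classic O(nW) dynamic program.
def knapsack(items, W):
    # best[w] = maximum value achievable with capacity w
    best = [0] * (W + 1)
    for value, weight in items:
        # Iterate capacity downward so each item is used at most once.
        for w in range(W, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[W]

items = [(60, 10), (100, 20), (120, 30)]  # (value, weight) pairs
print(knapsack(items, 50))  # 220: take the 100- and 120-value items
```

The downward capacity loop is the detail worth remembering: iterating upward instead would let an item be reused, which solves the unbounded knapsack variant rather than 0/1.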
Bin Packing Problem
- NP-hard with no polynomial-time exact solution. The First Fit Decreasing heuristic (sort items largest-first, place each in the first bin where it fits) guarantees at most (11/9)·OPT + 6/9 bins.
- The fixed bin capacity distinguishes this from knapsack. Here, every item must be packed (you can't leave items behind), and the goal is minimizing the number of bins rather than maximizing value.
- Practical applications include server load balancing, cutting stock problems, and container shipping logistics.
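First Fit Decreasing is short enough to sketch directly; the item sizes below are illustrative:

```python
# First Fit Decreasing: sort items largest-first, place each into the
# first open bin with enough remaining room, else open a new bin.
def first_fit_decreasing(sizes, capacity):
    bins = []  # each bin is the list of item sizes placed in it
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= capacity:
                b.append(s)
                break
        else:
            bins.append([s])  # no existing bin fits: open a new one
    return bins

bins = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(bins), bins)  # 2 bins: [[8, 2], [4, 4, 1, 1]]
```

Here FFD happens to hit the optimum (total size 20 with capacity 10 forces at least 2 bins), but in general it is only guaranteed to stay within the (11/9)·OPT + 6/9 bound.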
Set Cover Problem
- Input structure: a universe U of elements and a collection of subsets S1, S2, …, Sm whose union is U. The goal is to select the fewest subsets that together cover all of U.
- The greedy algorithm (repeatedly pick the subset covering the most uncovered elements) achieves an O(ln n) approximation ratio, where n = |U|. This is provably the best possible polynomial-time approximation unless P=NP.
- Set cover generalizes vertex cover: in vertex cover, each "subset" corresponds to the two endpoints of an edge. Recognizing this relationship helps you identify equivalent problem formulations.
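The greedy rule translates almost word-for-word into code. The universe and subsets below are a made-up example:

```python
# Greedy set cover: repeatedly take the subset that covers the most
# still-uncovered elements, until the universe is exhausted.
def greedy_set_cover(universe, subsets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset with the largest overlap with uncovered.
        best = max(subsets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen

universe = range(1, 6)
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(universe, subsets))  # [{1, 2, 3}, {4, 5}]
```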
Compare: Knapsack vs. Bin Packing: knapsack maximizes value in one fixed-capacity container, while bin packing minimizes containers needed for all items. Knapsack asks "what fits best?" while bin packing asks "how few bins suffice?"
Graph Structure Problems
These problems extract or exploit structural properties of graphs: connectivity, independence, and partitioning. Understanding graph structure often reveals efficient algorithms or proves hardness.
Minimum Spanning Tree
- Prim's algorithm grows a single tree from a starting vertex, always adding the cheapest edge connecting the tree to a non-tree vertex. Runs in O(ElogV). Kruskal's algorithm sorts all edges by weight and adds them one by one, skipping any edge that would create a cycle.
- Greedy works perfectly here. Both algorithms produce the globally optimal solution. This is possible because spanning trees form the bases of a matroid (a structure that guarantees greedy optimality), unlike most NP-hard problems.
- Telecommunications, power grids, and transportation networks all minimize infrastructure cost using MST.
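Kruskal's algorithm can be sketched with a tiny union-find structure to detect cycles; the (weight, u, v) edge list below is a made-up example:

```python
# Kruskal's MST sketch: sort edges by weight, add each edge unless it
# would connect two vertices already in the same component.
def kruskal(n, edges):
    parent = list(range(n))  # union-find forest over the n vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):  # edges as (weight, u, v) triples
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (5, 2, 3), (3, 1, 3)]
mst, total = kruskal(4, edges)
print(total)  # 6, using edges 0-1, 1-2, and 1-3
```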
Vertex Cover
- A vertex cover is a set of vertices such that every edge in the graph has at least one endpoint in the set. The optimization problem asks for the minimum such set.
- NP-hard, but admits a simple 2-approximation: repeatedly pick any uncovered edge and add both its endpoints to the cover. This guarantees a solution at most twice the optimal size.
- Dual to independent set: if S is a vertex cover of graph G=(V,E), then V \ S is an independent set (no two vertices in it share an edge), and vice versa. So a minimum vertex cover immediately gives you a maximum independent set.
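The 2-approximation is simple enough to sketch in full; the edge list (a 4-cycle) is a made-up example:

```python
# 2-approximation for vertex cover: for each edge not yet covered,
# add BOTH endpoints. The chosen edges form a matching, and any
# optimal cover must use at least one endpoint per matching edge,
# so the result is at most twice optimal.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # both endpoints at once
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
cover = vertex_cover_2approx(edges)
print(cover)  # {0, 1, 2, 3}: size 4, vs. the optimal cover of size 2
```

The 4-cycle shows the bound is tight: the heuristic returns 4 vertices while {0, 2} already covers every edge.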
Graph Coloring
- The chromatic number χ(G) is the minimum number of colors needed to color the vertices of G so that no two adjacent vertices share a color. Computing χ(G) is NP-hard for general graphs.
- Planar graphs always satisfy χ(G) ≤ 4 by the Four Color Theorem. Bipartite graphs are exactly the 2-colorable graphs.
- Scheduling applications are the most common real-world connection: exam scheduling (exams sharing students can't overlap), register allocation in compilers, and frequency assignment in wireless networks all reduce to graph coloring.
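While computing χ(G) is NP-hard in general, the 2-colorable case is easy: a BFS that alternates colors succeeds exactly when the graph is bipartite. A minimal sketch on two made-up adjacency lists:

```python
from collections import deque

# BFS 2-coloring: a graph is bipartite iff this returns a coloring.
def two_color(adj):
    color = {}
    for start in adj:  # handle disconnected components
        if start in color:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # alternate the two colors
                    q.append(v)
                elif color[v] == color[u]:
                    return None  # odd cycle found: not 2-colorable

    return color

path = {0: [1], 1: [0, 2], 2: [1]}            # a path: 2-colorable
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # odd cycle: chi = 3
print(two_color(path))      # {0: 0, 1: 1, 2: 0}
print(two_color(triangle))  # None
```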
Compare: Vertex Cover vs. Set Cover: vertex cover is a special case of set cover where each "set" corresponds to an edge (containing its two endpoints). The 2-approximation for vertex cover beats the ln n bound for general set cover precisely because of this restricted structure.
Assignment and Matching Problems
These problems pair elements from two sets optimally. The key structure is the bipartite graph, where edges connect elements across two disjoint sets but never within the same set.
Assignment Problem
- The Hungarian algorithm solves the assignment problem in O(n³) time. This polynomial complexity makes it tractable, unlike most combinatorial optimization problems.
- The setup: you have n agents and n tasks, with a cost for each agent-task pair. Each agent is assigned exactly one task and each task to exactly one agent, minimizing total cost (or maximizing total profit).
- The linear programming relaxation of this problem naturally produces integer solutions. This happens because the constraint matrix is totally unimodular, meaning every square submatrix has determinant 0, 1, or -1. You don't need to explicitly enforce integer constraints.
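For intuition only, the problem statement can be sketched as a brute force over all n! assignments; the cost matrix is a made-up example, and the Hungarian algorithm reaches the same answer in O(n³) instead of O(n!):

```python
from itertools import permutations

# Brute-force assignment (intuition only, feasible for tiny n):
# try every bijection from agents to tasks and keep the cheapest.
def min_cost_assignment(cost):
    n = len(cost)
    best_perm, best_cost = None, float('inf')
    for perm in permutations(range(n)):
        # perm[agent] is the task assigned to that agent
        c = sum(cost[agent][task] for agent, task in enumerate(perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
print(min_cost_assignment(cost))  # ((1, 0, 2), 5)
```

Enumerating all 3! = 6 permutations confirms that assigning agent 0 to task 1, agent 1 to task 0, and agent 2 to task 2 achieves the minimum total cost of 1 + 2 + 2 = 5.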
Compare: Assignment Problem vs. General Matching: assignment restricts to bipartite graphs with perfect matchings, enabling polynomial-time solutions. General weighted matching on non-bipartite graphs requires more sophisticated algorithms like Edmonds' blossom algorithm.
Quick Reference Table
| Category | Problems |
| --- | --- |
| NP-hard problems | Traveling Salesman, Graph Coloring, Vertex Cover, Set Cover, Bin Packing |
| Polynomial-time solvable | Minimum Spanning Tree, Shortest Path, Maximum Flow, Assignment Problem |
| Dynamic programming applies | Knapsack (0/1), Shortest Path (Bellman-Ford) |
| Greedy works optimally | Minimum Spanning Tree, Fractional Knapsack |
| Greedy gives approximation | Set Cover (ln n), Vertex Cover (2-approx), Bin Packing |
| Graph connectivity focus | MST, Maximum Flow, Shortest Path |
| Resource allocation models | Knapsack, Bin Packing, Assignment Problem |
| Scheduling applications | Graph Coloring, Set Cover, Assignment Problem |
Self-Check Questions
- Which two problems are both NP-hard but have fundamentally different approximation guarantees: one admitting a constant-factor approximation and one only a logarithmic approximation? Explain why the structural difference matters.
- Compare the Knapsack Problem and Bin Packing Problem: what structural difference explains why dynamic programming solves one efficiently but not the other?
- Both Minimum Spanning Tree and Shortest Path operate on weighted graphs. Why does the greedy approach guarantee optimality for MST but require more careful algorithm design (like Dijkstra's) for shortest paths?
- If a problem describes a scenario where you must assign n workers to n jobs with varying efficiency ratings, which problem framework applies, and what algorithm would you recommend?
- Vertex Cover and Independent Set are called "dual" problems. If a graph has 10 vertices and a minimum vertex cover of size 4, what is the size of its maximum independent set, and why?