# The Closest Pair Problem

We are given a set 𝑃 of n points in the plane *ℝ²*, and the problem is to find out the closest pair of points in the array. This problem arises in a number of applications. For example, in air-traffic control, you may want to monitor planes that come too close together, since this may indicate a possible collision.

Recall that *d(p, q)* is the notation for the Euclidean distance between *p* and *q*.

In other words, we have to find out a pair *p, q *∈ *P* of distinct points that minimize *d(p, q).*

The Brute force solution is O(n²), compute the distance between each pair and return the smallest. There is definitely a better solution. We can introduce it by some initial observations. Consider the one-dimensional version of closest pair. We can solve it by first sorting the points, which takes O(n log(n)) time, and return the closest pair of adjacent points with O(n) time. This result in a total of O(n log(n)) time complexity. Therefore, our goal here for the two-dimensional case is a O(n log(n)) algorithm.

# High-Level Approach

## Pre-processing: Sort [O(n log(n))]

Similar to the one-dimensional case, we first make two copies of points, Px and Py, sorted by x-coordinate and by y-coordinate. The it can be sorted by merge sort with O(n log(n)) time.

## Divide and Conquer [O(n log(n))]

However, since it is not that simple for the two-dimensional case to find the closest pair of points, divide and conquer is being used. Here in the closest pair problem, we’re going to proceed exactly as we did in the merge sort and counting inversions problems, where we took the array and broke it into its left and right half. So here, we’re going to take the input point set, and again, just recurse on the left half of the points, and recurse on the right half of the points with respect to the points’ x coordinates. We can then conquer the problem by following steps:

**Divide the given array in two halves**

We take P[n/2]*of the sorted x-array*as middle point. Let L = The first subarray contains points from P[0] to P[n/2](left half of P), R = The second subarray contains points from P[n/2+1] to P[n-1](right half of P). Form Lx, Ly, Rx, Ry within O(n) time.**Recursively find the smallest distances in both subarrays**

(p₁, q₁) = ClosestPair(Lx, Ly)

(p₂, q₂) = ClosestPair(Rx, Ry)

Let*d*=*min{d(p₁, q₁), d(p₂, q₂)}***Find closest split pair: ClosestSplitPair(Px, Py, d)**

Consider the vertical line passing through P[n/2] and find all points whose x coordinate is closer than*d*

However, the finding part is kind of tricky and the correctness will be proven later. What we will do here is iterate through all the points in Sy with its 1st to 7th adjacent points and update the best closest pair. Each point compares to a constant number, 7 in this case, of points in each iteration, therefore, the time complexity is O(n).

So in pseudo code, the steps are as follow:

ClosestSplitPair(Px, Py, d)

Sy = []

For p in Py:

if Px[n/2][0]-d <= p[0] <= Px[n/2][0]+d

Sy.append(p)

For i=1 to |Sy|-7

For j=1 to 7

Let p, q = Sy[i], Sy[j]

If d(p,q) < best

best pair = (p,q),

best = d(p,q)ClosestPair(Px, Py)

L = Px[:n/2]

R = Px[n/2:]

(p₁, q₁) = ClosestPair(Lx, Ly)

(p₂, q₂) = ClosestPair(Rx, Ry)

d=min{d(p₁, q₁), d(p₂, q₂)}(p₃, q₃) = ClosestSplitPair(Px, Py, d)

return best of (p₁, q₁), (p₂, q₂), (p₃, q₃)

T(n) = 2T(n/2) + O(n) + O(n) + O(n)

T(n) = 2T(n/2) + O(n)

T(n) = O(n log(n))

The overall time complexity is O(n log(n)).

# Correctness Claim

Here, we are going to prove the approach we used in finding closest split pair is correct.

Claim:Let p ∈ Q, q ∈ R be a split pair with d(p, q)< 𝛿.Then

(A) p and q are members of Sy

(B) p and q are at most 7 positions apart in Sy

## Proof of Claim (A)

It is really obvious that this claim is certainly correct, since *d(p, q) < 𝛿 *indicates that *|x₁−x₂| ≤ 𝛿 *and* |y₁ −y₂| ≤ 𝛿*, which in this case, makes p, q members of Sy.

## Proof of Claim (B)

**Lemma 1 :** all points of Sy with y-coordinate between those of p and q, inclusive, lie in one of these 8 boxes.**Proof :**

First, recall y-coordinates of p, q differ by < 𝛿.

Second, by definition of Sy, all have x-coordinates between 𝚇−𝛿 and 𝚇+𝛿

**Lemma 2 :** At most one point of P in each box.**Proof : **(by contradiction)

Suppose a, b lie in the same box. Then :

I. a, b are either both in Q or both in R

II. *d(a, b) ≤ 𝛿/2 ﹒√2 ≤ 𝛿*

But (i) and (ii) contradict the definition of 𝛿 (as smallest distance between pairs of points in Q or in R)

## Final Wrap-Up

Lemmas 1 and 2

⟹ at most 8 points in this picture (including p and q)

⟹ Positions of p, q in Sy differ by at most 7