Hamming Distance

The Hamming distance between two strings of the same length is the count of positions where the characters differ between them.

In other words, the Hamming distance measures the minimum number of substitutions needed to change one string into the other.

Given two strings \( s \) and \( t \) of length \( n \), the Hamming distance \( D_H(s, t) \) is defined as:

$$ D_H(s, t) = \sum_{i=1}^{n} \delta(s_i, t_i) $$

Where \( s_i \) and \( t_i \) represent the characters at position \( i \) in strings \( s \) and \( t \), and \( \delta(s_i, t_i) \) is a function that returns 1 if \( s_i \neq t_i \) and 0 otherwise.

The Hamming distance is only defined for strings of equal length \( n \).

A Practical Example

For example, if we consider the strings:

$$ s = \text{"karbon"} $$

$$ t = \text{"carbon"} $$

The Hamming distance between \( s \) and \( t \) is 1, since they differ only in the first position (k vs. c).

Example 2

Here’s another example of two English words with a Hamming distance of 2:

$$ s = \text{"thing"} $$

$$ t = \text{"thank"} $$

These words differ in two positions:

The third letter, “i” in “thing” versus “a” in “thank,” and the fifth letter, “g” in “thing” versus “k” in “thank.”

Metric Topology Based on Hamming Distance

The Hamming distance defines a metric space because it satisfies the following properties for all strings \( x \), \( y \), and \( z \) of equal length: 

  1. Non-negativity

    $$ D_H(x, y) \geq 0 \quad \text{and} \quad D_H(x, y) = 0 \ \text{if and only if} \ x = y $$

    The Hamming distance counts the differences between two strings, so it’s always non-negative. Also, if \( x \) and \( y \) are identical, then \( D_H(x, y) = 0 \).
  2. Symmetry

    $$ D_H(x, y) = D_H(y, x) $$

    The Hamming distance between \( x \) and \( y \) is equal to the distance between \( y \) and \( x \) because it simply counts positional differences, regardless of order.
  3. Triangle Inequality

    $$ D_H(x, z) \leq D_H(x, y) + D_H(y, z) $$

    For any three strings \( x \), \( y \), and \( z \), the direct distance \( D_H(x, z) \) cannot exceed the sum of distances \( D_H(x, y) + D_H(y, z) \). This is because any difference between \( x \) and \( z \) must show up in at least one of the partial distances between \( x \) and \( y \) or between \( y \) and \( z \).

Since the Hamming distance meets these conditions, it defines a metric space on the set of all strings of a given length.

In a metric space, we can establish a topology based on proximity as defined by the distance.

Thus, the Hamming distance can also define a metric topology.

For any string \( x \) of length \( n \) and radius \( r \), we can define an “open ball” (neighborhood) $ B $ centered at \( x \) as the set of all strings within a Hamming distance of less than \( r \) from \( x \).

$$ B(x, r) = \{y \mid D_H(x, y) < r\}. $$

These open balls, as defined by the Hamming distance, form a basis for the metric topology, as open sets in a metric topology are created by taking arbitrary unions of these “open balls.”

Note. Since the set of all binary strings (or symbol strings) of a given length \( n \) is finite, the topology induced by the Hamming distance on this set is the discrete topology. This means each string represents an open set (and a closed set) in this topology. This occurs because we can choose a very small radius (for example, \( r = 1 \)) so that each string is contained in an open ball that includes no other strings.

Thus, the Hamming distance generates a metric topology that, for finite sets of strings of the same length, is a discrete topology.

Example

Consider the set of some binary strings of length three:

$$ X = \{000, 001, 011, 110, 111 \} $$

The topology induced by the Hamming distance with radius \( r = 2 \) is built by examining the open balls of radius 2 around each string.

$$ B(x, r) = \{y \mid D_H(x, y) < 2 \} $$

In other words, each open set consists of strings that differ from the central string by only one position.

  • Open ball centered at \( 000 \): $$ B(000, 2) = \{000, 001\} $$ Here, \( 000 \) is the central string, and only \( 001 \) differs from \( 000 \) by one position.
  • Open ball centered at \( 001 \): $$ B(001, 2) = \{001, 000, 011 \} $$ Here, \( 001 \) is the central string, with \( 000 \) and \( 011 \) differing from \( 001 \) by one position.
  • Open ball centered at \( 011 \): $$ B(011, 2) = \{011, 001, 111\} $$ Here, \( 011 \) is the central string, with \( 111 \) and \( 011 \) diffrering from \( 011 \) by one position.
  • Open ball centered at \( 110 \): $$ B(110, 2) = \{110, 111\} $$ Here, \( 110 \) is the central string, and only \( 111 \) differs from \( 110 \) by one position.
  • Open ball centered at \( 111 \): $$ B(111, 2) = \{111, 011, 110\} $$ Here, \( 111 \) is the central string, with \( 011 \) and \( 110 \) differing from \( 111 \) by one position.

These open balls define the basis of the induced topology for the distance with radius \( r = 2 \), as they allow us to create other open sets in the topology by taking their unions.

Note. A radius of \( r = 2 \) means that strings differ in only one position, since an "open set" excludes boundary elements. Therefore, the inequality used here is "less than 2": $$ B(x, r) = \{ y \mid D_H(x, y) < 2 \} $$ To create "closed sets" instead, the inequality should be "less than or equal to 2": $$ C(x, r) = \{ y \mid D_H(x, y) \le 2 \} $$

For example, the union of the sets \( B(000, 2) \) and \( B(110, 2) \) is:

$$ B(000, 2) \cup B(110, 2) $$

Knowing that $ B(000, 2) = \{000, 001\} $ and $ B(110, 2) = \{110, 111\} $, their union is:

$$ B(000, 2) \cup B(110, 2) = \{000,001\} \cup \{110, 111 \} $$

$$ B(000, 2) \cup B(110, 2) = \{000, 001, 110, 111\} $$

Thus, in this topology, the set $ \{000, 001, 110, 111\} $ is also an open set.

In conclusion, this setup defines a metric topology based on the Hamming distance with radius \( r=2 \), where each open set is formed as a union of open balls of radius 2.

And so on. 

 
 

Please feel free to point out any errors or typos, or share suggestions to improve these notes. English isn't my first language, so if you notice any mistakes, let me know, and I'll be sure to fix them.

FacebookTwitterLinkedinLinkedin
knowledge base

Metric Topology