JPEG is a widely used lossy compression standard for images, developed by the Joint Photographic Experts Group.
JPEG compression can be roughly divided into five steps: 1) Color space transformation, 2) Downsampling, 3) Discrete Cosine Transform (DCT), 4) Quantization, and 5) Entropy coding.
Here, I will mainly analyze the third step, “Discrete Cosine Transform,” and the fourth step, “Quantization,” and provide the corresponding Python code implementation.
Discrete Cosine Transform (DCT)
The Discrete Cosine Transform (DCT) is commonly used in signal and image processing to transform an image from the spatial domain to the frequency domain. The idea behind the DCT is that any finite signal can be represented as a sum of cosine functions of different frequencies. The DCT is a reversible process and is itself lossless.
The DCT can be computed using a DCT transformation matrix:
$$
T = C F C^T
$$

where $F$ is the original image block and $T$ is the block after the DCT. Assuming the matrices are of size $N \times N$, the DCT matrix $C$ is defined as:

$$
C(u,v)=\begin{cases}
\sqrt{\dfrac{1}{N}} & u=0,\ 0\le v \le N-1 \\[2mm]
\sqrt{\dfrac{2}{N}}\cos\dfrac{(2v+1)u\pi}{2N} & 1\le u \le N-1,\ 0\le v \le N-1
\end{cases}
$$
With $u$ as the row index and $v$ as the column index, the matrix form of $C$ is:

$$
C=\begin{bmatrix}
\sqrt{\frac{1}{N}} & \sqrt{\frac{1}{N}} & \cdots & \sqrt{\frac{1}{N}} \\
\sqrt{\frac{2}{N}}\cos\frac{\pi}{2N} & \sqrt{\frac{2}{N}}\cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}\cos\frac{(2N-1)\pi}{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\frac{2}{N}}\cos\frac{(N-1)\pi}{2N} & \sqrt{\frac{2}{N}}\cos\frac{3(N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}\cos\frac{(2N-1)(N-1)\pi}{2N}
\end{bmatrix}
$$
The transpose of $C$ is:

$$
C^T=\begin{bmatrix}
\sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}\cos\frac{\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}\cos\frac{(N-1)\pi}{2N} \\
\sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}\cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}\cos\frac{3(N-1)\pi}{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}\cos\frac{(2N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}\cos\frac{(2N-1)(N-1)\pi}{2N}
\end{bmatrix}
$$
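To make the matrix form concrete, here is a minimal sketch (not part of the original pipeline) that builds $C$ from the definition above and checks two properties: $C$ is orthogonal, which is why the transform is reversible and lossless, and $C F C^T$ agrees with OpenCV's `cv2.dct` on the same block. The random block `F` is just a stand-in for real image data.

```python
import numpy as np
import cv2

N = 8
# Build the DCT matrix C from the definition above
u = np.arange(N).reshape(-1, 1)   # row index
v = np.arange(N).reshape(1, -1)   # column index
C = np.sqrt(2 / N) * np.cos((2 * v + 1) * u * np.pi / (2 * N))
C[0, :] = np.sqrt(1 / N)          # the u = 0 row is constant

F = np.random.default_rng(0).random((N, N))   # stand-in 8x8 image block
T = C @ F @ C.T                                # forward DCT via the matrix form

print(np.allclose(C @ C.T, np.eye(N)))   # True: C is orthogonal
print(np.allclose(T, cv2.dct(F)))        # True: matches OpenCV's DCT
print(np.allclose(C.T @ T @ C, F))       # True: the inverse transform recovers F exactly
```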
The basis functions above can be visualized as an image, known as the basis image. The construction is as follows: for an $N \times N$ DCT matrix, multiply the $m$-th row of $C$ (written as an $N \times 1$ column vector) by the $n$-th column of $C^T$ (a $1 \times N$ row vector). This yields $N^2$ small matrices, one for each pair $(m, n)$. Tiling these matrices according to their positions $(m, n)$ produces a basis image of size $N^2 \times N^2$. Any image block can be expressed as a weighted combination of these cosine patterns; in fact, the DCT computes the contribution of each cosine pattern, and these contributions are what we call the DCT coefficients.
```python
import numpy as np
import matplotlib.pyplot as plt
import math
import cv2
import skimage.io as io

# DCT matrix
N = 8  # matrix shape: 8*8
DCT = np.zeros((N, N))
for m in range(N):
    for n in range(N):
        if m == 0:
            DCT[m][n] = math.sqrt(1/N)
        else:
            DCT[m][n] = math.sqrt(2/N)*math.cos((2*n+1)*math.pi*m/(2*N))

# DCT basis image: tile (m, n) is the outer product of row m of C and row n of C
basis = np.zeros((N*N, N*N))
for m in range(N):
    for n in range(N):
        pos_m = m*N
        pos_n = n*N
        DCT_v = DCT[m, :].reshape(-1, 1)       # row m of C, as a column vector
        DCT_T_h = DCT.T[:, n].reshape(-1, N)   # column n of C^T, as a row vector
        basis[pos_m:pos_m+N, pos_n:pos_n+N] = np.matmul(DCT_v, DCT_T_h)

# Shift and scale the values into [0, 1] for display
basis += np.absolute(np.amin(basis))
scale = np.around(1/np.amax(basis), decimals=3)
for m in range(basis.shape[0]):
    for n in range(basis.shape[1]):
        basis[m][n] = np.around(basis[m][n]*scale, decimals=3)

# Show basis image
plt.figure(figsize=(4, 4))
plt.gray()
plt.axis('off')
plt.title('DCT Basis Image')
plt.imshow(basis, vmin=0)
```
This is the basis image of an 8×8 DCT matrix, with a size of 64×64. By observing the image, we can notice that the horizontal frequencies increase from left to right, while the vertical frequencies increase from top to bottom. The constant basis function in the top left corner is commonly referred to as the DC (Direct Current) basis function, and the corresponding DCT coefficient is known as the DC coefficient.
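As a small aside (an extra check, not from the original post): because the DC basis function is constant, the DC coefficient of an 8×8 block is simply 8 times the block's mean value (more generally, $N$ times the mean for an $N \times N$ block). This is easy to confirm with the libraries already imported above; the random block is a stand-in for real data.

```python
# Minimal check: for the orthonormal DCT, the DC coefficient equals N * mean
block = np.random.default_rng(1).random((8, 8))   # stand-in 8x8 block
coeff = cv2.dct(block)
print(coeff[0, 0], 8 * block.mean())              # the two values agree up to float error
```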
Any 8×8 block can be composed by overlaying weighted combinations of the 64 patterns in this basis image. Since we set the size of the DCT matrix to 8×8, we divide the image into small 8×8 squares, referred to as blocks, and process the image block by block. Below, we demonstrate 8 DCT coefficient matrices along with the blocks they reconstruct.
```python
# DCT coefficient blocks: block i has a single nonzero coefficient at position (0, i)
blocks = np.zeros((8*8, 8))
for i in range(8):
    blocks[i*8][i] = 1

# IDCT -> original (spatial-domain) images
blocks_idct = np.zeros((8*8, 8))
for i in range(8):
    block = blocks[i*8:i*8+8, :]
    data = cv2.idct(block)
    blocks_idct[i*8:i*8+8, :] = data

# Show the DCT coefficient blocks
plt.figure(figsize=(16, 3))
for i in range(8):
    plt.subplot(1, 8, i+1)
    block = blocks[i*8:i*8+8, :]
    plt.gray()
    plt.axis('off')
    plt.imshow(block, vmin=0)
```
```python
# Show the reconstructed (original) images
plt.figure(figsize=(16, 3))
for i in range(8):
    plt.subplot(1, 8, i+1)
    block_idct = blocks_idct[i*8:i*8+8, :]
    plt.gray()
    plt.xticks([])
    plt.yticks([])
    plt.imshow(block_idct, vmin=0)
```
Let's number the reconstructed images above 1-8 from left to right. Image 1 is exactly the top-left tile of the DCT basis image, which means its coefficient matrix has a nonzero weight only in the top-left corner and zeros everywhere else. Image 2 is identical to the tile in the first row and second column of the basis image, so its coefficient matrix has a nonzero weight only at (0, 1). The same pattern holds for the remaining images. This is the essence of the DCT.
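To tie this back to the basis image, the following small check (my addition, reusing the `DCT` matrix built earlier) places a single 1 at an arbitrary position $(m, n)$ of an otherwise-zero coefficient block; the IDCT of that block is exactly the outer product of the corresponding rows of the DCT matrix, i.e. the $(m, n)$ tile of the basis image before the display shift and scale.

```python
# Reconstruct the basis function for one coefficient position (m0, n0)
m0, n0 = 1, 2                              # arbitrary example position
coeff = np.zeros((8, 8))
coeff[m0, n0] = 1.0                        # a single nonzero DCT coefficient
tile = cv2.idct(coeff)                     # its spatial-domain appearance
print(np.allclose(tile, np.outer(DCT[m0, :], DCT[n0, :])))   # True
```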
The basic idea of JPEG compression
The DCT itself is lossless; its purpose is to transform the image from the spatial domain to the frequency domain. To achieve compression, we introduce a step between the DCT and the IDCT. The human eye is less sensitive to high-frequency information, so the compression strategy is to discard high-frequency information from the image. In the DCT coefficient matrix, the top-left corner holds the low-frequency information, while the bottom-right corner holds the high-frequency information. Below, I demonstrate the basic idea of JPEG compression by retaining only 1, 3, and 10 low-frequency coefficients.
```python
# Read image
img = io.imread("img3.jpg", as_gray=True)

# Obtain a mask through zigzag scanning
def z_scan_mask(C, N):
    mask = np.zeros((N, N))
    mask_m = 0
    mask_n = 0
    for i in range(C):
        if i == 0:
            mask[mask_m, mask_n] = 1
        else:
            # If (row + column) is even, move up and to the right
            if (mask_m + mask_n) % 2 == 0:
                mask_m -= 1
                mask_n += 1
                # If it exceeds the upper boundary, move down
                if mask_m < 0:
                    mask_m += 1
                # If it exceeds the right boundary, move left
                if mask_n >= N:
                    mask_n -= 1
            # If odd, move down and to the left
            else:
                mask_m += 1
                mask_n -= 1
                # If it exceeds the lower boundary, move up
                if mask_m >= N:
                    mask_m -= 1
                # If it exceeds the left boundary, move right
                if mask_n < 0:
                    mask_n += 1
            mask[mask_m, mask_n] = 1
    return mask

# Overlay the mask on the DCT coefficients, discarding the high-frequency components
def Compress(img, mask, N):
    img_dct = np.zeros((img.shape[0]//N*N, img.shape[1]//N*N))
    for m in range(0, img_dct.shape[0], N):
        for n in range(0, img_dct.shape[1], N):
            block = img[m:m+N, n:n+N]
            # DCT
            coeff = cv2.dct(block)
            # IDCT, keeping only the coefficients where the mask is 1
            iblock = cv2.idct(coeff*mask)
            img_dct[m:m+N, n:n+N] = iblock
    return img_dct

# Images keeping only 1, 3, and 10 low-frequency coefficients
plt.figure(figsize=(16, 4))
plt.gray()
plt.subplot(141)
plt.title('Original image')
plt.imshow(img)
plt.axis('off')
plt.subplot(142)
plt.title('Keep 1 coefficient')
plt.imshow(Compress(img, z_scan_mask(1, 8), 8))
plt.axis('off')
plt.subplot(143)
plt.title('Keep 3 coefficients')
plt.imshow(Compress(img, z_scan_mask(3, 8), 8))
plt.axis('off')
plt.subplot(144)
plt.title('Keep 10 coefficients')
plt.imshow(Compress(img, z_scan_mask(10, 8), 8))
plt.axis('off')
plt.show()
```
For the images above, the more coefficients you retain, the better the image quality. In the DCT coefficient matrix, the DC component in the top-left corner carries most of the information. If you keep only the DC component, each block is reconstructed from that single value, so the whole block becomes a uniform patch. Retaining more coefficients overlays more cosine components, restoring more detail in the image.
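To make "more coefficients, better quality" a bit more quantitative, here is a rough sketch (my addition, reusing `img`, `z_scan_mask`, and `Compress` from above) that prints the mean squared error against the original for a few mask sizes. The exact numbers depend on the input image, so treat them as illustrative only.

```python
# Reconstruction error versus number of retained coefficients
h, w = img.shape[0] // 8 * 8, img.shape[1] // 8 * 8   # crop to a multiple of the block size
for k in (1, 3, 10, 30):
    rec = Compress(img, z_scan_mask(k, 8), 8)
    mse = np.mean((img[:h, :w] - rec) ** 2)
    print(f"keep {k:2d} coefficients -> MSE = {mse:.6f}")
```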
Quantization
The idea behind quantization is similar to the masking step demonstrated above. The quantization matrix is user-defined: smaller quantization values retain more information from the image, while larger values retain less. For example, consider a 2×2 DCT coefficient matrix:
$$
C=\begin{bmatrix} 211 & 22 \\ 13 & 2 \end{bmatrix}
$$
If the quantization matrix is:
$$
Q=\begin{bmatrix} 6 & 6 \\ 6 & 6 \end{bmatrix}
$$
If we divide $C$ by $Q$ element-wise, round the resulting values, and then multiply them back by $Q$, we obtain the quantized matrix $C_q$:
$$
C_q=\begin{bmatrix} 210 & 24 \\ 12 & 0 \end{bmatrix}
$$
Indeed, we can see that the value 2 in $C$ becomes 0 after quantization and is discarded. If we reduce the quantization value, for example to 2, that coefficient is no longer discarded. Conversely, if we increase the quantization value, larger coefficients such as 13 may also be discarded. In software such as Photoshop and other image-processing tools, adjusting the "quality" setting corresponds to changing the quantization coefficients. The quantization matrix can also have a different value at each position.
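The 2×2 example above can be reproduced in a couple of lines (a quick sketch using NumPy's rounding):

```python
import numpy as np

C = np.array([[211., 22.],
              [ 13.,  2.]])
Q = np.full((2, 2), 6.)
C_q = np.around(C / Q) * Q      # divide element-wise, round, multiply back
print(C_q)                      # [[210.  24.]
                                #  [ 12.   0.]]
```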
```python
# Quantization matrix (the standard JPEG luminance quantization table)
q_mat = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
                  [12, 12, 14, 19, 26, 58, 60, 55],
                  [14, 13, 16, 24, 40, 57, 69, 56],
                  [14, 17, 22, 29, 51, 87, 80, 62],
                  [18, 22, 37, 56, 68, 109, 103, 77],
                  [24, 35, 55, 64, 81, 104, 113, 92],
                  [49, 64, 78, 87, 103, 121, 120, 101],
                  [72, 92, 95, 98, 112, 100, 103, 99]])

# Quantization
def Compress_2(img, N):
    img_dct = np.zeros((img.shape[0]//N*N, img.shape[1]//N*N))
    for m in range(0, img_dct.shape[0], N):
        for n in range(0, img_dct.shape[1], N):
            block = img[m:m+N, n:n+N]
            # DCT
            coeff = cv2.dct(block)
            # Quantization (pixel values are in [0, 1], so scale coefficients by 255 first)
            q_coeff = np.around(coeff*255/q_mat)
            # Inverse quantization
            iq_coeff = q_coeff*q_mat
            # IDCT, scaled back to the [0, 1] range of the input
            iblock = cv2.idct(iq_coeff)/255
            img_dct[m:m+N, n:n+N] = iblock
    return img_dct

# Show images
plt.figure(figsize=(8, 4))
plt.gray()
plt.subplot(121)
plt.title('Original')
plt.imshow(img)
plt.axis('off')
plt.subplot(122)
plt.title('Quantized')
plt.imshow(Compress_2(img, 8))
plt.axis('off')
plt.show()
```