What you will gain

Through this series of courses, participants are expected to understand the overall development of video compression coding technology, become proficient in the framework and details of H.264 video coding, and lay a solid foundation for further study of the H.265/HEVC coding standard, audio and video streaming, live and on-demand video, and related technologies. The series is suitable for multimedia developers and for senior undergraduate, master's, and junior doctoral students whose main research direction is audio and video coding. It can support engineering development work as well as scientific research.

Intended audience

Computer-related majors

Course introduction

H.264/AVC is a video compression coding standard widely used in industry that incorporates advanced, mature video coding techniques. This course describes in detail the overall architecture and technical details of the H.264/AVC standard from the perspectives of principle, standard, and implementation. It not only explains the contents of the H.264/AVC standard documents, but also helps the audience understand the principles of H.264 more deeply by developing an actual H.264 bitstream analysis/decoding program.

Classmate notes

  • snowwood 2021-01-23 09:41:13

    Source: Getting the location and validity of adjacent blocks

    The layout of the above code is a little verbose. Adjust it, as shown below:

  • baiyun0451c 2020-03-24 11:27:34

    Source: Implementing Exponential-Golomb coding

    Huffman coding: built from symbol probabilities, so the decoder needs to know the code table

    Exponential-Golomb (Exp-Golomb) coding

    [prefix] + 1 + [suffix]

    The prefix is a run of consecutive 0s. The number of 0s determines the number of bits in the suffix


    ue(v), unsigned: codeNum = 2 ** (number of prefix bits) - 1 + [suffix]

    se(v), signed: successive codeNum values map alternately to positive and negative numbers {0, 1, -1, 2, -2, ...}

    se = (-1) ** (k + 1) * ceil(k / 2), where k is the unsigned codeNum


    te(v), truncated: value range [0, x]. If x > 1, it is equivalent to ue(v); if x = 1, it is the inverse of the next bit

    me(v), mapped: uses table lookup



    Exp-Golomb codewords are variable-length, so the encoded values are no longer byte-aligned; this bit sequence corresponds to the RBSP layer
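
The ue(v) and se(v) rules in this note can be sketched as a small bit reader. This is a minimal illustration, not code from the course: the `BitReader` class and the function names `read_ue` and `read_se` are hypothetical, and bits are assumed to be read most-significant-bit first. The signed mapping is written with integer arithmetic, which gives the same values as the ceil formula in the note.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value


def read_ue(reader: BitReader) -> int:
    """ue(v): count the prefix zeros, skip the 1, then read that many suffix bits."""
    zeros = 0
    while reader.read_bit() == 0:
        zeros += 1
    return (1 << zeros) - 1 + reader.read_bits(zeros)


def read_se(reader: BitReader) -> int:
    """se(v): map codeNum k to 0, 1, -1, 2, -2, ..."""
    k = read_ue(reader)
    return (k + 1) // 2 if k % 2 else -(k // 2)


if __name__ == "__main__":
    # Bit pattern 1 | 010 | 011 | 00100, padded with zeros to two bytes.
    data = bytes([0xA6, 0x40])
    r = BitReader(data)
    print([read_ue(r) for _ in range(4)])
    r = BitReader(data)
    print([read_se(r) for _ in range(4)])
```

The same four codewords decode to different values depending on whether the syntax element is declared unsigned or signed, which is why the decoder must know the descriptor of each field in advance.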

  • baiyun0451c 2020-03-24 10:50:50

    Source: Interaction between people and the world, meaning of video information

    H.264


    Intra compression:

    1. Prediction is used. Macroblocks are partitioned as 16x16 or 4x4; 16x16 has 4 prediction modes and 4x4 has 9 prediction modes
    2. Select the best prediction mode, then store the quantized residual together with the prediction mode
    3. I_PCM can also be used (no prediction mode at all)
    4. The residual is DCT-transformed, and CABAC is used for lossless (entropy) compression


    Inter prediction:

    1. Prediction is used. Blocks can be partitioned as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, or 4x4
    2. Frame grouping (GOP): among several adjacent images, typically no more than 10% of the pixels differ, the luma difference changes by no more than 2%, and the chroma difference changes by no more than 1%. Such images can be placed in one group.
    3. In the preceding and following frames, search for the block that best matches the current block. As in intra coding, the residual is then DCT-transformed, quantized, and losslessly compressed



    The blocks (usually residuals) are DCT-transformed, with fine quantization at low frequencies and coarse quantization at high frequencies

    Motion compensation cuts the image into blocks, and then computes the residual between matching blocks in the preceding and following frames.
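
The block-matching step can be sketched as a full search that minimizes the sum of absolute differences (SAD). The notes do not say which cost metric or search strategy is used, so SAD, the search window, and the names `sad`, `best_motion_vector`, and `residual_block` are all assumptions made for illustration. Frames are plain lists of rows of sample values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in `cur` and a displaced block in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Full search over displacements in [-search, search]; returns (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def residual_block(cur, ref, bx, by, dx, dy, n=4):
    """Difference between the current block and its motion-compensated prediction."""
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]


if __name__ == "__main__":
    # Synthetic 8x8 reference frame; the current frame is the reference shifted by (2, 1).
    ref = [[3 * x * x + 7 * y * y + x * y for x in range(8)] for y in range(8)]
    cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
    cost, dx, dy = best_motion_vector(cur, ref, bx=2, by=2)
    print(cost, dx, dy)
    print(residual_block(cur, ref, 2, 2, dx, dy))
```

When the match is exact the residual is all zeros, which is the case that compresses best after transform and quantization.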


    In practice, the residual is DCT-transformed and quantized. The quantized result is entropy-coded for output, and it is also inverse-quantized and inverse-DCT-transformed, then added to the motion-compensated image to obtain a new prediction image.




    A NALU begins with the start code 0x00000001 (or 0x000001) in Annex B format, which acts as the delimiter between units. Some streams instead prefix each unit directly with its length (RTP) (very rare)
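
Splitting an Annex B byte stream on its start codes can be sketched as follows. The function name `split_annexb` is hypothetical. The sketch scans for the 3-byte pattern 00 00 01 and strips zero bytes left in front of the next start code, which also covers the 4-byte form 00 00 00 01.

```python
def split_annexb(stream: bytes) -> list:
    """Return the NAL units found between Annex B start codes, without the start codes."""
    nalus = []
    start = None  # index of the first byte of the NALU currently being collected
    i = 0
    n = len(stream)
    while i + 3 <= n:
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # Zero bytes before a start code belong to the delimiter, not the NALU.
                nalus.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nalus.append(stream[start:])
    return nalus


if __name__ == "__main__":
    stream = bytes.fromhex("00000001 6742 000001 68ce 00000001 6588")
    for nalu in split_annexb(stream):
        print(nalu.hex())
```

Each returned unit starts with the one-byte NALU header described next in the notes.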

    Then comes the NALU header:

          |0|1|2|3|4|5|6|7|
          |F|NRI|  Type   |

    F     forbidden bit
    NRI   importance
    Type  NALU type


    Meaning of typical header byte values:

    0x01        B frame
    0x61        P frame
    0x65        I frame
    0x67        SPS
    0x68        PPS
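
The header byte values listed in the notes can be unpacked with a few shifts and masks. This is a sketch with hypothetical names (`parse_nal_header`, `NAL_TYPE_NAMES`); the field names follow the H.264 syntax element names. Note that the header byte only carries the importance and the unit type: 0x01 and 0x61 both have type 1 (a non-IDR slice) and differ only in NRI, so whether a slice is P or B is signalled inside the slice header rather than here.

```python
# A few common nal_unit_type values; the table is intentionally incomplete.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
}


def parse_nal_header(byte: int) -> dict:
    """Split the one-byte NALU header into F (1 bit), NRI (2 bits), and Type (5 bits)."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,
        "nal_ref_idc": (byte >> 5) & 0x03,
        "nal_unit_type": byte & 0x1F,
    }


if __name__ == "__main__":
    for value in (0x01, 0x61, 0x65, 0x67, 0x68):
        header = parse_nal_header(value)
        name = NAL_TYPE_NAMES.get(header["nal_unit_type"], "other")
        print(hex(value), header, name)
```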



    Layer 1: EBSP     Encapsulated Byte Sequence Payload

    Wherever the payload contains 00 00 followed by 00, 01, 02, or 03, a 03 byte is inserted, giving 00 00 03 [00 / 01 / 02 / 03]

    Removing these inserted 03 bytes yields the RBSP
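
The insertion and removal of the 03 byte can be sketched as a pair of functions. The names `rbsp_to_ebsp` and `ebsp_to_rbsp` are hypothetical, and the input is assumed to be the payload only, without the start code or the NALU header byte.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert 0x03 after any 00 00 that is followed by 00, 01, 02, or 03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the inserted byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    rbsp = bytes.fromhex("000001250000000003")
    ebsp = rbsp_to_ebsp(rbsp)
    print(ebsp.hex())
    print(ebsp_to_rbsp(ebsp) == rbsp)
```

The point of the inserted byte is that the payload can never contain a sequence that looks like a start code, so a parser can split the stream on start codes without understanding the payload.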

    Layer 2: RBSP     Raw Byte Sequence Payload

    If the data bits do not fill the last byte, a 1 bit followed by as many 0 bits as needed is appended. Is this because the video data does not occupy a whole number of bytes?

    Removing these trailing bits from the last byte yields the SODB

    Layer 3: SODB     String Of Data Bits
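
The relationship between SODB and RBSP can be sketched with bit strings. The function names `add_trailing_bits` and `sodb_bit_length` are hypothetical. The sketch assumes the RBSP ends with a single 1 bit followed by zero padding, and it tolerates extra all-zero bytes after that, which is an assumption about the input rather than something stated in the notes.

```python
def add_trailing_bits(sodb_bits: str) -> bytes:
    """SODB -> RBSP: append a stop bit '1', then pad with '0' to a byte boundary."""
    bits = sodb_bits + "1"
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def sodb_bit_length(rbsp: bytes) -> int:
    """RBSP -> SODB: number of data bits in front of the final stop bit."""
    end = len(rbsp)
    while end > 0 and rbsp[end - 1] == 0:
        end -= 1  # skip bytes that contain no stop bit
    if end == 0:
        raise ValueError("no stop bit found")
    last = rbsp[end - 1]
    # Position of the lowest set bit, counted from the least significant bit.
    trailing_zeros = (last & -last).bit_length() - 1
    return (end - 1) * 8 + (7 - trailing_zeros)


if __name__ == "__main__":
    rbsp = add_trailing_bits("10110")
    print(rbsp.hex(), sodb_bit_length(rbsp))
```

Because a stop bit is always added, data that already ends on a byte boundary still gains one extra byte, which lets the decoder find the end of the data without a separate length field.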

