Old 13th November 2016, 18:52   #9  |  Link
LoRd_MuldeR
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Ataril
Yes, I've read about DCT and quantization. But what was bothering me is that I didn't really understand how an RGB hex color code is transformed into separate brightness and color values.
Let's say we have a 16×16 block of pixels.
Brightness is calculated first using the formula Y′ = 0.299 R′ + 0.587 G′ + 0.114 B′ (the sum of the red, green and blue components with the proper weighting coefficients: 0.299 for red, 0.587 for green and 0.114 for blue, according to the CCIR 601 standard). Its range is from 16 (black) to 235 (white), and we get a matrix of 256 values.
For the color blocks we take the color values in the range 0-255 for blue and for red, average them (if we use 4:2:0 or 4:2:2 subsampling) and construct two matrices using the following formulas for each chroma value:
Cb = 0.564 (B - Y)
Cr = 0.713 (R - Y)
Before being displayed on the screen, the picture has to be transformed from the YCbCr color space back into the familiar RGB color space.

Found the answer here:
https://en.wikipedia.org/wiki/Luma_(video)
I also re-read the second chapter of Richardson's book, which has quite detailed information.
Again, the input RGB picture is converted to YCbCr format, if it is not in YCbCr format already. Here you see (top to bottom) the original RGB image and how it is split into the Y, Cb and Cr channels:
https://upload.wikimedia.org/wikiped...separation.jpg
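To make this a bit more concrete, here is a small Python/NumPy sketch of the BT.601-style conversion quoted above. The function names are mine, and for simplicity the chroma values are kept full-range and centered around 128, rather than mapped to the studio range mentioned in the quote:

```python
import numpy as np

def rgb_to_ycbcr_bt601(rgb):
    """Convert an RGB image (float array, values 0..255) to Y'CbCr using the
    BT.601 weights. Chroma is centered around 128 (full range, simplified)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted sum of R, G, B
    cb = 0.564 * (b - y) + 128.0             # blue-difference chroma
    cr = 0.713 * (r - y) + 128.0             # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)

def subsample_420(chroma):
    """4:2:0 subsampling: average every 2x2 block of one chroma plane
    (assumes even width and height)."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```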

Next, each N×N block is transformed from the spatial domain into the frequency domain - separately for each channel.

In the spatial domain, each N×N block consists of N² brightness (luminance) or color (chrominance) values. In the frequency domain, the same information is represented by N² frequency coefficients.
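If you want to play with this yourself, here is a minimal Python sketch (using NumPy and SciPy, which I'm assuming are available; the helper names are mine) that turns one 8×8 pixel block into its 64 frequency coefficients and back:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """Forward 2-D DCT-II of an NxN block (orthonormal scaling)."""
    return dct(dct(block, norm='ortho', axis=0), norm='ortho', axis=1)

def idct2(coeffs):
    """Inverse 2-D DCT: back from the frequency domain to the spatial domain."""
    return idct(idct(coeffs, norm='ortho', axis=0), norm='ortho', axis=1)

block = np.random.randint(0, 256, (8, 8)).astype(float)  # 64 pixel values
coeffs = dct2(block - 128.0)   # 64 frequency coefficients (level-shifted, JPEG-style)
print(np.allclose(idct2(coeffs) + 128.0, block))  # True - the transform itself is lossless
```

The transform itself loses nothing; the compression gain comes later, when the coefficients are quantized.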

For example, when using the DCT transform with an 8×8 block size, each pixel block will be represented as a linear combination of the following 64 "patterns" (basis functions):
https://upload.wikimedia.org/wikiped...23/Dctjpeg.png
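The patterns themselves can be generated straight from the DCT-II definition. A short sketch (again NumPy; the helper name `dct_basis` is mine):

```python
import numpy as np

N = 8

def dct_basis(u, v, n=N):
    """The (u, v)-th 8x8 DCT base pattern, i.e. one tile of the chart linked above."""
    x = np.arange(n)
    cu = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
    cv = np.sqrt(1.0 / n) if v == 0 else np.sqrt(2.0 / n)
    vert = np.cos((2 * x + 1) * u * np.pi / (2 * n))   # vertical frequency u
    horz = np.cos((2 * x + 1) * v * np.pi / (2 * n))   # horizontal frequency v
    return cu * cv * np.outer(vert, horz)              # outer product = 2-D pattern

# All 64 patterns, from the flat block (u = v = 0) to the finest checkerboard (u = v = 7)
patterns = [[dct_basis(u, v) for v in range(N)] for u in range(N)]
```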

Think of it like this: Each of the 64 frequency coefficients belongs to one of the 64 patterns shown above. You can interpret the coefficient as the "intensity" of the corresponding pattern:
http://img.tomshardware.com/us/1999/...part_3/dct.gif

In order to reconstruct the original 8×8 pixel block, each pattern is multiplied by the corresponding coefficient (intensity) value, and the results are all added up.
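Continuing the two sketches above (which defined `block`, `coeffs`, `N` and `patterns`), that reconstruction is literally a weighted sum of the 64 patterns:

```python
# Each coefficient scales "its" base pattern; summing the scaled patterns
# gives back the (level-shifted) original 8x8 block.
reconstructed = np.zeros((N, N))
for u in range(N):
    for v in range(N):
        reconstructed += coeffs[u, v] * patterns[u][v]

print(np.allclose(reconstructed, block - 128.0))  # True - identical to the inverse DCT
```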


Quote:
Originally Posted by Ataril
And what about the codecs that are built into mobile devices? How do they operate? As far as I understand, they don't have a chance to evaluate the video before coding, because everything has to be done on the fly (unlike desktop codecs, which can evaluate the video one or more times to decide how best to distribute the bitrate among frames). How do they manage unpredictable video? Do they just use some default parameters?
Basically, "hardware" encoders don't work that much different from "software" encoders - only that they are implemented directly in silicium

However, while software encoders are usually highly tweakable, ranging from "ultra fast" encoding (i.e. "quick and dirty" optimization for maximum throughput) to "very slow" encoding (i.e. thorough optimization for maximum compression efficiency), hardware encoders tend to be targeted more towards "real-time" encoding than towards maximum compression efficiency. And hardware encoders most often provide few options, if any, to adjust the encoding behavior.

If you do "real-time" encoding, you can not use elaborate optimization techniques, such as "2-Pass" encoding. This applies to both, software and hardware, encoders. However, hardware encoders usually are bound to "real-time" encoding.

Last edited by LoRd_MuldeR; 13th November 2016 at 19:19.