I work regularly with HLS and from what I've seen, all TS segments on all live platforms always start with a key frame (I-frame).
Every segment starts with an I-frame
So, I was doing a little bit of reading and found that this requirement that every segment starts with a key frame is because when a viewer tunes in, the player can only start playing at a key frame because it has none of the previous frames to predict and decode if the segment were to start with a P-frame or a B-frame.
Input factors
And typically in a live stream input, a content producer usually decides 3 key factors
- Segment Duration (in seconds typically)
- Key Frame interval (in number of frames typically)
- Frame rate (in frames/second typically)
While delivering TS segments for all the requested renditions (Resolution, bit rate etc), the input has to be cut into segment durations of the requested size, honoring the key frame interval at the input frame rate. This seemed to be simple at first thought because all the encoder has to do is deliver one I-Frame every k frames with k being the key frame interval. But, the complexity comes into the picture when you bring in the fact that every segment has to start at with an I-frame.
Granted that the encoding process itself is quite complex, but what got me wondering was the math behind trying to honor the requested key frame interval given the restriction that every segment has to start with an I-frame which is what I'm going to discuss here.
Simple example
Let's say we had a request for the following.
- 6s segment duration
- 60 frame key frame interval
- 30 frames/s
Cutting segments would be very simple for the above example because all that has to be done is to introduce I-Frames every 2s and the new segment would just start with an I-frame.
This is because the number of frames in a 6s segment would be 180 at 30 fps
And 180 is divisible by 60. Hence, there would be 3 I-frames in every segment.
Slightly complex exmple
However, if the input were to be like the following
- 10s segment duration
- 60 frame key frame interval
- 25 frames/s
There are 10 * 25 = 250 frames in a segment.
But, 250 is not evenly divisible by 60. Hence, the requested inputs cannot be honored.
My questions at this point were basically the following
- What is the math that would figure out how to honor such a request?
- Will the math honor the complete request?
- If it doesn't honor the complete request which part of the request will it ignore?
I was able to find the answers to 2 and 3 pretty fast. Basically, the math will not honor the complete request and it basically rejects the requested segment duration of 10s and honors the other requests such as
- Key frame interval
- Every segment starts with a key frame (This is because of all the reasons I stated at the start of this post)
The math behind the segment cuts
The answer to the first question was also pretty simple. But, it took me a while to figure it out and write some code for it. In order to bring out the math, let us define some variables
- Segment duration (s)
- Key Frame Interval (k)
- Frame rate (f)
We are now trying to figure out the math in a case where we have realized that we cannot honor the requested segment duration. However, we could still honor stream time every x segments by creating a0 segments of segment duration t0 and a1 segments of segment duration t1. This is basically represented as
a0 * t0 + a1 * t1 = s * (a0 + a1) where a0 + a1 = x
So, basically this equations says that if you as a user requested 10s segment durations, we will create segments of duration t0 seconds and t1 seconds. By creating a0 and a1 of such segments respectively, even though we might not honor the request of s seconds, we will honor the request of s * x seconds for every x segments.
Also, we know that by the above mentioned conditions, the number of frames in every such segment should divisible by k, the key frame interval. This gives us
(f * t0) % k = 0 and (f * t1) % k = 0
Evaluating the math for the above example
We have
- s = 10
- k = 60
- f = 25
So, f * s = 250
In order to find t0 and t1, we need to find the closest whole numbers to 250, divisible by 60.
How we do that is basically find the following
f' % k = 0 where f' < (f * s) and
f" % k = 0 where f" > (f * s)
Basically that means f' = 240 and f" = 300 for the above example
Hence, t0 = f'/k = 240/25 = 9.6 and t1 = f"/k = 300/25 = 12.
Now, all we have to do is, find a0 and a1. For this we will start with setting a0 = 1 and finding a1.
If a1 is not a whole number, we increment a0 and find a1. We repeat this until a1 is a whole number. We are doing this until it is a whole number because we can't have a fractional segment of duration 9.6s. We'll need to have a whole segment of the said duration.
The above procedure will give us a0 = 5 and a1 = 1.
So, if you put it all together
a0 * t0 + a1 * t1 = 9.6 * 5 + 12 * 1 = 6 * 10
Conclusion
This basically means that we'll produce 60s of content every 6 segments even though for every 1 segment we basically only produce 9.6s to start with. It's basically an algorithm that helps segments catch up to the promised time within a window of segments.
I wrote some code for the above and has done pretty well with a few sample inputs. If you find something that I've inaccurately described, I would like to hear what you have to say.
Comments
Post a Comment