1 00:00:00,000 --> 00:00:26,000 Hey guys, welcome to the show. Today, if you haven't heard the news, there's a new quantization algorithm coming out, and it's called TurboQuant. It's from people at New York University and Google, and they've come up with a technique that's supposed to let us run full models in a quantized version at, like, 100% precision, or 99-point-whatever it is. 2 00:00:26,000 --> 00:00:35,000 So, I've already gone on the hero's journey on this one, you know, when you get really excited about something, and then, like, you end up at the stage of disillusionment. 3 00:00:35,000 --> 00:00:44,000 But I'm going to show you it running live, I'm going to show you all the different implementations if you want to play along at home, and I'll also show you the results that I've got. 4 00:00:44,000 --> 00:00:55,000 And you'll see for yourself if there is amazingness in this. Now, there's already some controversy around this technique. If you go into the feedback forums and look at their paper, 5 00:00:55,000 --> 00:00:58,000 they had a little dig at RaBitQ. 6 00:00:58,000 --> 00:01:00,000 I love the name of this situation. 7 00:01:00,000 --> 00:01:10,000 They had a dig at these guys, and then RaBitQ came along and said, "Actually, you guys were asking us how to implement our source code, and you didn't give us credit." 8 00:01:10,000 --> 00:01:16,000 So, it says: TurboQuant describes RaBitQ's guarantees as sub-optimal. Ooh, academic dig. 9 00:01:16,000 --> 00:01:25,000 Magid's January 2025 emails show that he had transferred our C++ implementation into Python and asked us to help debug it. 10 00:01:25,000 --> 00:01:30,000 So, there's a bit of a catfight going on here, but nonetheless, let's just check out the technique for itself.
11 00:01:30,000 --> 00:01:38,000 So, it's a two-pass situation, and it works with, like, four bits, and they go down to two bits, which is amazing if it works. 12 00:01:38,000 --> 00:01:46,000 And they use one of the bits to do quantized JL. JL is, I never remember the name, Johnson-Lindenstrauss. 13 00:01:46,000 --> 00:01:56,000 They quantize that, so one of the bits goes there, and the other bits that you give it, either one, two or three bits, go to MSE-optimal quantizers. 14 00:01:56,000 --> 00:02:07,000 And if you do want to play along, if you go into MLX-LM and look at the pull requests over here, there are actually two implementations of this. 15 00:02:07,000 --> 00:02:12,000 So, there's "Add turbo quant" and "Add turbo quant KV", and there's also a third one. 16 00:02:12,000 --> 00:02:16,000 If you go to MLX-VLM, there's a third one there. 17 00:02:16,000 --> 00:02:17,000 I tried them all out. 18 00:02:17,000 --> 00:02:22,000 Now, these implementations only do the first pass, but don't worry, we've also done the two-pass situation. 19 00:02:22,000 --> 00:02:24,000 We're going to show you all of that working. 20 00:02:24,000 --> 00:02:27,000 This one here, the experimental one, is the slow version. 21 00:02:27,000 --> 00:02:32,000 It does it all on the CPU, but it's Pythonic code, so it's easy to follow along and find out exactly what's going on. 22 00:02:32,000 --> 00:02:41,000 This version here starts to put it in the Metal shaders, except it actually uses more memory than it claims to, 23 00:02:41,000 --> 00:02:48,000 because it stores the full float16 results in memory as well, so you're not actually saving any memory. 24 00:02:48,000 --> 00:02:53,000 But it's still good to follow along, because you can actually see the pipeline between quantizing and the...
25 00:02:53,000 --> 00:02:57,000 the pipeline between quantizing and then de-quantizing and all that kind of stuff. 26 00:02:57,000 --> 00:03:00,000 So, if you want to play along, you can check out the pull requests here as well. 27 00:03:00,000 --> 00:03:05,000 And I've also got it implemented in my inference app over here, and I'll show you all the different results and everything. 28 00:03:05,000 --> 00:03:12,000 So, if you go into the settings tab, under model context precision you've got the normal 16-bit, which is unquantized, 29 00:03:12,000 --> 00:03:18,000 you've got 9-bit, which is near lossless, I'll show you exactly what that means, and it goes all the way down to 3.5-bit, 30 00:03:18,000 --> 00:03:23,000 and then we've got the 4-bit turbo, 3-bit turbo and 2-bit turbo, and you can choose a different quantization level. 31 00:03:23,000 --> 00:03:28,000 So, the first thing I'll do, since I've run all these experiments here, is show you the first version here. 32 00:03:28,000 --> 00:03:31,000 So, we're producing 1,000 tokens. 33 00:03:31,000 --> 00:03:34,000 So, I did an interesting test. 34 00:03:34,000 --> 00:03:39,000 I didn't just blindly say "write some code" or "produce 100 tokens" or anything like that. 35 00:03:39,000 --> 00:03:45,000 I actually gave it a chunky piece of source code that had already been previously generated, 36 00:03:45,000 --> 00:03:48,000 because I wanted to fill up the context window, and I asked it to make a change, 37 00:03:48,000 --> 00:03:51,000 asking it to add a spaceship with a first-person perspective. 38 00:03:51,000 --> 00:03:56,000 So, it's got to manage that context window, quantize it down, and hopefully we'll get some good needle-in-a-haystack retrieval, 39 00:03:56,000 --> 00:03:59,000 and hopefully it'll pick up the right functions. We're going to see what happens there.
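By the way, if you're wondering what "MSE-optimal quantizer" actually means, it's basically the textbook Lloyd-Max idea: pick codewords that minimize the mean squared error for your data. Here's a toy sketch of that in plain NumPy, including the de-quantize step back. To be clear, this is my own illustration, not TurboQuant's or MLX-LM's actual code; their quantizer and grouping will differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def lloyd_max(x, bits=4, iters=20):
    # Classic Lloyd-Max: alternate between assigning each value to its
    # nearest codeword and moving each codeword to the mean of its cell.
    levels = 2 ** bits
    # Initialize the codebook from quantiles of the data.
    codebook = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    idx = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                codebook[k] = x[idx == k].mean()
    return codebook, idx

x = rng.standard_normal(4096).astype(np.float32)
codebook, idx = lloyd_max(x, bits=4)   # quantize: 4 bits per value
x_hat = codebook[idx]                  # de-quantize: look the codeword back up
mse = float(((x - x_hat) ** 2).mean())
```

For Gaussian-ish data this lands noticeably below what a plain uniform grid at the same bit width gives you, which is the whole point of "MSE-optimal".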
40 00:03:59,000 --> 00:04:02,000 So, if you run it with 1,000 tokens unquantized, 41 00:04:02,000 --> 00:04:07,000 we're getting a requirement of 1.92 GiB of memory. 42 00:04:07,000 --> 00:04:15,000 When we quantize it with 9-bit precision, we halve that memory to about 1.1 GiB. 43 00:04:15,000 --> 00:04:21,000 When you quantize it down to 4.5 bits, we're now at 0.5 GiB, 44 00:04:21,000 --> 00:04:24,000 and in comparison, when you quantize it down to 4-bit turbo, 45 00:04:24,000 --> 00:04:26,000 you actually use slightly less, at 0.51. 46 00:04:26,000 --> 00:04:30,000 You can actually make it use even less, but I wanted to keep the precision, 47 00:04:30,000 --> 00:04:34,000 so I kept my floats at float32. You can actually make them float16, 48 00:04:34,000 --> 00:04:38,000 and there's also some int8 you can utilize in there if you want to play along at home. 49 00:04:38,000 --> 00:04:43,000 But yeah, you've got better memory savings than the 4.5-bit quant, 50 00:04:43,000 --> 00:04:48,000 and you can go down even further, to 3.5 bits, and that's 0.4 GiB, 51 00:04:48,000 --> 00:04:52,000 and you can go to 3-bit turbo, and that's 0.4 as well, so still slightly less, 52 00:04:52,000 --> 00:04:54,000 and you can even go down to 2-bit turbo. 53 00:04:54,000 --> 00:04:57,000 It produces code nonetheless, 54 00:04:57,000 --> 00:05:00,000 but let's just see what the code that gets produced actually looks like. 55 00:05:00,000 --> 00:05:06,000 So, if we run it at full precision over here, we've got the Earth, still coherent, 56 00:05:06,000 --> 00:05:09,000 we've now got a spaceship over here, we've got an asteroid impact, 57 00:05:09,000 --> 00:05:11,000 and it creates a ring in a random location. 58 00:05:11,000 --> 00:05:13,000 I'm running MiniMax over here in this example.
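If you want a rough feel for where those memory numbers come from, here's the back-of-the-envelope maths for a KV cache. The dimensions below are made up for illustration, they're not MiniMax's real config, and the per-group scale overhead is an assumption about how the scales get stored.

```python
def kv_cache_gib(tokens, n_layers, n_kv_heads, head_dim, bits,
                 group_size=32, scale_bytes=4):
    # K and V each store one vector per token, per layer, per KV head.
    values = 2 * n_layers * n_kv_heads * head_dim * tokens
    payload = values * bits / 8
    # Quantized formats also carry a scale and zero-point per group of
    # values; the unquantized 16-bit cache has no such overhead.
    overhead = 0 if bits >= 16 else (values / group_size) * 2 * scale_bytes
    return (payload + overhead) / 2**30

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 8k context.
full = kv_cache_gib(8000, 32, 8, 128, bits=16)
q4 = kv_cache_gib(8000, 32, 8, 128, bits=4)
```

The overhead term is also why "4-bit" quants come out at 4.5 effective bits, and why the 9-bit setting is really 8-bit payload plus scales.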
59 00:05:13,000 --> 00:05:16,000 When I do enter the spaceship, however, and this is the full precision, 60 00:05:16,000 --> 00:05:21,000 it gives me some baffling controls, so it's first-person 61 00:05:21,000 --> 00:05:26,000 and not third-person. This stuff happens just based on the random token selection, 62 00:05:26,000 --> 00:05:29,000 so random stuff can happen depending on the model that you choose, 63 00:05:29,000 --> 00:05:32,000 but at least you've got an idea of what the full version looks like. 64 00:05:32,000 --> 00:05:36,000 When we do 9-bit precision, you can still see the Earth, 65 00:05:36,000 --> 00:05:39,000 and if you want to get some context on these demonstrations, 66 00:05:39,000 --> 00:05:43,000 I actually ran these tests with context precision in an earlier video, 67 00:05:43,000 --> 00:05:47,000 so you can check that out to find out the loss and stuff and how we got to this stage. 68 00:05:47,000 --> 00:05:50,000 So this is kind of like a part 2, using the turbo quantization. 69 00:05:50,000 --> 00:05:53,000 So 9-bit is cool over here, and look at that, we've got a nice spaceship, 70 00:05:53,000 --> 00:05:56,000 we can fly around, so we've still got good context. 71 00:05:56,000 --> 00:05:58,000 We're going to go in and show you the actual fidelity 72 00:05:58,000 --> 00:06:01,000 and exactly the perplexity towards the end of this video, 73 00:06:01,000 --> 00:06:04,000 but let me just show you the real-life results that you're going to get. 74 00:06:04,000 --> 00:06:07,000 So that's 9-bit. At 8.5 you still get good results, 75 00:06:07,000 --> 00:06:11,000 triggering an asteroid still works, although it didn't happen there. 76 00:06:11,000 --> 00:06:13,000 Boom, we still get the ring, exact same situation. 77 00:06:13,000 --> 00:06:17,000 6.5, again, it still works, except the spaceship looks slightly different, 78 00:06:17,000 --> 00:06:22,000 but that could just be randomness.
79 00:06:22,000 --> 00:06:30,000 4.5 is still good, and then when we get down to 3.5 bits, 80 00:06:30,000 --> 00:06:35,000 that's when we start to run into runtime errors and it kind of breaks down. 81 00:06:35,000 --> 00:06:40,000 So switching over to the 4-bit turbo quant: again, you've got the beautiful Earth, 82 00:06:40,000 --> 00:06:44,000 so it hasn't lost that fidelity, and we can enter flight, 83 00:06:44,000 --> 00:06:48,000 and look at this, this is just a beautiful spaceship. 84 00:06:48,000 --> 00:06:51,000 I've got to say, it's a very, very nice-looking spaceship. 85 00:06:51,000 --> 00:06:54,000 Like, you can see the shadows, the lighting on it, 86 00:06:54,000 --> 00:06:57,000 it looks very beautiful. 87 00:06:57,000 --> 00:07:00,000 I've got to say this looks really, really good at 4-bit turbo quant. 88 00:07:00,000 --> 00:07:04,000 At 3-bit turbo quant, you've now lost context on the actual Earth. 89 00:07:04,000 --> 00:07:08,000 So previously, when we did 3.5, we started getting runtime errors. 90 00:07:08,000 --> 00:07:12,000 Here, we actually got compile errors, so it couldn't compile the Metal shader, 91 00:07:12,000 --> 00:07:14,000 and we lost context on the Earth. 92 00:07:14,000 --> 00:07:16,000 So there is a technique that I also threw in there, 93 00:07:16,000 --> 00:07:20,000 and it's also written up in the paper: mixed-precision quantization. 94 00:07:20,000 --> 00:07:25,000 What they do is, depending on the precision required for the layer, 95 00:07:25,000 --> 00:07:28,000 they increase or decrease the quantization. 96 00:07:28,000 --> 00:07:31,000 So I just did a very, very basic one, where I figure the first two layers 97 00:07:31,000 --> 00:07:33,000 and the last two layers are the most important, 98 00:07:33,000 --> 00:07:35,000 especially when you go into quantization and all that stuff.
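That mixed-precision idea is simple to sketch: pin the first and last couple of layers at a higher bit width and quantize the middle layers harder. This is my simplified take with a made-up function name, not the paper's actual per-layer policy.

```python
def layer_bits(n_layers, base_bits=3, guard_bits=4, n_guard=2):
    # Keep the first and last n_guard layers at guard_bits; quantize the
    # middle layers more aggressively at base_bits.
    return [guard_bits if i < n_guard or i >= n_layers - n_guard else base_bits
            for i in range(n_layers)]

# For a hypothetical 16-layer model: guard layers at 4 bits, middle at 3.
bits = layer_bits(16)
```

A real policy would pick the guarded layers by measuring per-layer sensitivity rather than just taking the ends, but even this crude version was enough to change my results.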
99 00:07:35,000 --> 00:07:40,000 I kept those always at 4-bit fidelity, and when I do that with just the standard 3.5-bit 100 00:07:40,000 --> 00:07:43,000 affine quantization, again, we still don't have the Earth, 101 00:07:43,000 --> 00:07:45,000 and it's still getting a runtime error. 102 00:07:45,000 --> 00:07:50,000 However, with 3-bit turbo quant, and this is just the one-pass solution at the moment, 103 00:07:50,000 --> 00:07:53,000 we now have the Earth, and we've now got a spaceship, 104 00:07:53,000 --> 00:07:56,000 and look, we can fly around in the spaceship. 105 00:07:56,000 --> 00:08:00,000 So unless we got lucky, well, you can see the results for yourself, 106 00:08:00,000 --> 00:08:03,000 we definitely did get some improvement, okay? 107 00:08:03,000 --> 00:08:07,000 And when it comes to 2-bit precision, it completely fails, and it's completely awful. 108 00:08:07,000 --> 00:08:10,000 So that is the situation as it is. 109 00:08:10,000 --> 00:08:14,000 Now, I'm going to show you it running, in case you're a bit unsure of what the situation is. 110 00:08:14,000 --> 00:08:17,000 Our 4-bit turbo, I'm just going to run that. 111 00:08:17,000 --> 00:08:19,000 So you can see here that it's producing tokens. 112 00:08:19,000 --> 00:08:21,000 It's getting around 13 tokens a second. 113 00:08:21,000 --> 00:08:24,000 Now, this is a big context window, around 8,000 tokens, 114 00:08:24,000 --> 00:08:29,000 and this version can be sped up. Right now we're de-quantizing at every single layer 115 00:08:29,000 --> 00:08:32,000 to be passed into the normal dot product. 116 00:08:32,000 --> 00:08:34,000 We could fuse that into a Metal shader and make it run faster, 117 00:08:34,000 --> 00:08:36,000 but that is the situation as it is today. 118 00:08:36,000 --> 00:08:40,000 And if we go into the token inspector, let's just get a bit of entropy here.
119 00:08:40,000 --> 00:08:45,000 We can see the output generated from the original version here, the fully unquantized version. 120 00:08:45,000 --> 00:08:48,000 We can see that it says, "Here is the updated code with controllable spaceship. 122 00:08:50,000 --> 00:08:54,000 I have added a spaceship model constructed from geometries." 123 00:08:54,000 --> 00:08:58,000 So "geometries" was the top token, you can see here. 124 00:08:58,000 --> 00:09:01,000 Originally, with the unquantized version, it was 41% likely to go there. 125 00:09:01,000 --> 00:09:05,000 We see that when we did the 4-bit quantization, "geometries" isn't the top token anymore. 126 00:09:05,000 --> 00:09:10,000 It's now the second top token at 32%, whereas previously it was 41%, and "geometric", 127 00:09:10,000 --> 00:09:14,000 that one is now at 41, so they've kind of swapped over there, 128 00:09:14,000 --> 00:09:19,000 just due to that precision loss. So you can see these kinds of errors. 129 00:09:19,000 --> 00:09:23,000 And it also depends on the seed as well, although I am running this greedy, 130 00:09:23,000 --> 00:09:28,000 with temperature set to zero, to try to make it as precise and reproducible as possible. 131 00:09:28,000 --> 00:09:33,000 But you can see that it starts to drift. You can see "controls" is one of the top lines here, 132 00:09:33,000 --> 00:09:36,000 whereas the HTML doctype is at the top there. 133 00:09:36,000 --> 00:09:41,000 So it starts writing the code when it has that full precision, but at turbo 4-bit, 134 00:09:41,000 --> 00:09:43,000 that's when you start getting differences. 135 00:09:43,000 --> 00:09:47,000 So to find out exactly what's going on, I'm tracking the precision loss between 136 00:09:47,000 --> 00:09:50,000 the quantized version and the full-precision one.
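The two numbers I'm tracking between the quantized cache and the full-precision one, cosine similarity and mean absolute error, are easy to reproduce yourself. Here's roughly what my monitoring boils down to; this is my own sketch with simulated noise standing in for quantization error, not the MLX internals.

```python
import numpy as np

def cache_fidelity(full_cache, quant_cache):
    # Cosine similarity: does the quantized cache still point the same way?
    # Mean absolute error: how far off are the individual values?
    f = full_cache.ravel().astype(np.float64)
    q = quant_cache.ravel().astype(np.float64)
    cos = float(f @ q / (np.linalg.norm(f) * np.linalg.norm(q)))
    mae = float(np.abs(f - q).mean())
    return cos, mae

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 1024)).astype(np.float32)   # pretend KV cache
noisy = kv + rng.normal(scale=0.1, size=kv.shape)        # pretend quantization noise
cos, mae = cache_fidelity(kv, noisy)
```

Cosine similarity stays near 1.0 even when the per-value error is noticeable, which is why I look at both: cosine tells you the directions survive, MAE tells you how much the raw bits drifted.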
137 00:09:50,000 --> 00:09:53,000 So I have a custom version where I'm monitoring the situation. 138 00:09:53,000 --> 00:09:56,000 And with 16-bit, there is no loss whatsoever. 139 00:09:56,000 --> 00:09:58,000 So it's 100%, and zero, zero. 140 00:09:58,000 --> 00:10:02,000 With 9-bit, and this is the highest, 8-bit quantization with group sizes of 32, 141 00:10:02,000 --> 00:10:08,000 again, we have the exact same cosine similarity, and the mean absolute error is 0.09. 142 00:10:08,000 --> 00:10:10,000 So that's very, very low. 143 00:10:10,000 --> 00:10:15,000 When we go down to 4.5-bit, the mean absolute error is 0.15. 144 00:10:15,000 --> 00:10:18,000 So that's pretty high compared to 0.09. 145 00:10:18,000 --> 00:10:20,000 And we saw the results are still very, very similar. 146 00:10:20,000 --> 00:10:26,000 Now, when it comes to 4-bit turbo, and this is just using the first pass, the MSE pass, 147 00:10:26,000 --> 00:10:31,000 we actually get a lower mean absolute error than the 4.5-bit. 148 00:10:31,000 --> 00:10:35,000 But it's still higher than 5.5-bit. 149 00:10:35,000 --> 00:10:39,000 So they say it's a near-lossless quantization. 150 00:10:39,000 --> 00:10:46,000 But if you're looking at the actual fidelity, the bits, the actual values inside the cache, 151 00:10:46,000 --> 00:10:52,000 you can see that it's not as lossless as you might imagine: 9-bit is 0.09, 152 00:10:52,000 --> 00:10:55,000 here, 4-bit turbo is 0.12. 153 00:10:55,000 --> 00:10:58,000 And with 3-bit, that's 0.23. 154 00:10:58,000 --> 00:11:02,000 So that's insanely high, and 2-bit is 0.5. 155 00:11:02,000 --> 00:11:05,000 That's just a massive, massive variance. 156 00:11:05,000 --> 00:11:09,000 But again, this measures the actual fidelity, what is in the actual bits, 157 00:11:09,000 --> 00:11:11,000 and the deviation between them. 158 00:11:11,000 --> 00:11:12,000 So we did one better.
159 00:11:12,000 --> 00:11:14,000 We looked at the actual perplexity of the end results. 160 00:11:14,000 --> 00:11:18,000 You know, when I was showing you over here that the top token changed, 161 00:11:18,000 --> 00:11:22,000 so the top token is now "geometric" instead of it being "geometries", 162 00:11:22,000 --> 00:11:24,000 that's the kind of thing perplexity picks up on. 163 00:11:24,000 --> 00:11:27,000 It measures the confidence the model has in the tokens it picks. 164 00:11:27,000 --> 00:11:30,000 And these are the results we got with 500 tokens. 165 00:11:30,000 --> 00:11:35,000 So with the full one, you get a perplexity of 1.07, and the top-token accuracy is 100%; it 166 00:11:35,000 --> 00:11:39,000 produced the exact same tokens as it should, because, you know, we've got greedy decoding, 167 00:11:39,000 --> 00:11:43,000 we've got the temperature set to 0 and the seed set to 0, so it's all perfect. 168 00:11:43,000 --> 00:11:49,000 With 9-bit quantization on the context, again, a top-token accuracy of 99.6%, 169 00:11:49,000 --> 00:11:52,000 very, very high. 170 00:11:52,000 --> 00:11:53,000 We only missed two tokens. 171 00:11:53,000 --> 00:11:55,000 The divergence was very, very low. 172 00:11:55,000 --> 00:11:58,000 Then traditional four-bit quantization, affine quantization. 173 00:11:58,000 --> 00:12:01,000 The perplexity is now 1.117, so it's gone higher. 174 00:12:01,000 --> 00:12:04,000 And we've dropped to 97.2% top-token accuracy. 175 00:12:04,000 --> 00:12:10,000 So 14 of the 500 tokens were picked wrong. 176 00:12:10,000 --> 00:12:13,000 And when the wrong ones were picked, they diverged by 30%. 177 00:12:13,000 --> 00:12:19,000 Now, when it comes to four-bit turbo, with the one-pass MSE, 178 00:12:19,000 --> 00:12:26,000 I was getting a perplexity pretty much slightly lower than the four-bit affine quantization. 179 00:12:26,000 --> 00:12:27,000 So slightly lower.
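If you want to reproduce these top-token accuracy and perplexity numbers yourself, the measurement boils down to something like this. Again, a sketch with made-up logits standing in for the two runs, not my exact harness:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def compare_runs(ref_logits, test_logits):
    # Top-token accuracy: how often the quantized run greedily picks the
    # same token as the full-precision run did.
    ref_ids = ref_logits.argmax(axis=-1)
    acc = float((ref_ids == test_logits.argmax(axis=-1)).mean())
    # Perplexity of the quantized run on the reference's chosen tokens.
    p = softmax(test_logits)[np.arange(len(ref_ids)), ref_ids]
    ppl = float(np.exp(-np.log(p).mean()))
    return acc, ppl

rng = np.random.default_rng(0)
ref = rng.standard_normal((500, 128))                   # pretend logits, 500 steps
quant = ref + rng.normal(scale=0.05, size=ref.shape)    # pretend quantization noise
acc, ppl = compare_runs(ref, quant)
```

With greedy decoding and temperature 0, the reference run always agrees with itself, so accuracy starts at 100% and perplexity at its floor; quantization noise then pushes both the wrong way.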
180 00:12:27,000 --> 00:12:31,000 So this is 1.1178, and this is 1.1171. 181 00:12:31,000 --> 00:12:36,000 Now, the model I'm running here is Llama 1B, because it runs really fast and gets me results 182 00:12:36,000 --> 00:12:38,000 without me having to wait forever. 183 00:12:38,000 --> 00:12:43,000 Maybe on a different model it might run better, but I'm just showing you what I've got. 184 00:12:43,000 --> 00:12:47,000 Now, I also modified it to actually include the second pass, 185 00:12:47,000 --> 00:12:49,000 the quantized Johnson-Lindenstrauss. 186 00:12:49,000 --> 00:12:50,000 I love that name. 187 00:12:50,000 --> 00:12:51,000 It's very, very good. 188 00:12:51,000 --> 00:12:56,000 Now, the thing with that one is, rather than giving all four bits to the MSE quantizer to get the highest 189 00:12:56,000 --> 00:13:01,000 precision, you sacrifice one of the bits for a second quantization 190 00:13:01,000 --> 00:13:06,000 at one bit, and that's how they've error-corrected and got their amazing results. 191 00:13:06,000 --> 00:13:12,000 But when I did that, the perplexity shot up to 1.5, and the top-token accuracy went down to 88.2%. 192 00:13:12,000 --> 00:13:13,000 So it's worse. 193 00:13:13,000 --> 00:13:15,000 It was worse for me. 194 00:13:15,000 --> 00:13:18,000 If anyone's interested, I can share the code that I'm running. 195 00:13:18,000 --> 00:13:21,000 But again, I've already shown you the source code if you want to check it out. 196 00:13:21,000 --> 00:13:25,000 So there are three versions you can look at, already open source. 197 00:13:25,000 --> 00:13:30,000 And yeah, so we'll see what happens when it comes to three bits. 198 00:13:30,000 --> 00:13:35,000 And this version is actually using the mixed precision that we talked about, with the first two 199 00:13:35,000 --> 00:13:40,000 and the last two layers at four bits, because that's the only one that actually yielded some results when we ran it.
200 00:13:40,000 --> 00:13:45,000 So this one, you know, 95% token accuracy, so that's good, with the one pass. 201 00:13:45,000 --> 00:13:50,000 And that is higher than the two-pass, but it's still lower than the one-pass four-bit, 202 00:13:50,000 --> 00:13:53,000 and it's still lower than the traditional affine quantization at four bits. 203 00:13:53,000 --> 00:13:58,000 And with two passes, the perplexity just shoots up to 15.38. 204 00:13:58,000 --> 00:14:04,000 So I'm hoping that, like, I'm currently at the bottom of the hill here. 205 00:14:04,000 --> 00:14:09,000 I've been battling this for two days, doing all sorts of experiments, just trying to get this nailed. 206 00:14:09,000 --> 00:14:15,000 And that's where I am. Let me know what you guys are finding with this quantization technique, 207 00:14:15,000 --> 00:14:19,000 because, yeah, I don't know how they're getting their results. 208 00:14:19,000 --> 00:14:27,000 And their results, just look at here: the full precision scores 99.7%, and they score 99.7%. 209 00:14:27,000 --> 00:14:34,000 And they say all of the other ones are rubbish, and theirs is the best. It is lossless, the same score. 210 00:14:34,000 --> 00:14:41,000 They're also running Llama 3.1 8B. That's in the needle-in-a-haystack test, so their results are spectacular. 211 00:14:41,000 --> 00:14:48,000 But with that being said, just remember that this generation that we made here was with the four-bit turbo quant. 212 00:14:48,000 --> 00:14:54,000 Let's just get it there. Look, the asteroid still worked, so it is exciting times over there, 213 00:14:54,000 --> 00:14:57,000 and it's going to be really fun to see what other quantization techniques happen, 214 00:14:57,000 --> 00:15:00,000 and a little bit more information about maybe the model that they used. 215 00:15:00,000 --> 00:15:05,000 This is MiniMax, and Llama 1B, those are the ones I tried it out with. Let me know what models you guys have tried at home.
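And for the curious, here's the rough shape of that second pass I was fighting with: a quantized Johnson-Lindenstrauss sketch that keeps only one bit, the sign, per random projection, plus a norm, to encode the residual cheaply. This is just me illustrating the concept; TurboQuant's actual construction and how they recombine it with the MSE pass will differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def qjl_sketch(residual, sketch_dim=64):
    # Random JL projection, then keep only the sign of each coordinate
    # (1 bit each) plus the residual's norm for crude reconstruction.
    d = residual.shape[-1]
    proj = rng.standard_normal((d, sketch_dim)) / np.sqrt(sketch_dim)
    signs = np.sign(residual @ proj)    # 1 bit per sketch coordinate
    norm = np.linalg.norm(residual, axis=-1, keepdims=True)
    return signs, norm, proj

def qjl_reconstruct(signs, norm, proj):
    est = signs @ proj.T                # back-project the sign bits
    est = est / (np.linalg.norm(est, axis=-1, keepdims=True) + 1e-9)
    return est * norm                   # restore the stored norm

x = rng.standard_normal((4, 128))       # pretend residuals after the MSE pass
signs, norm, proj = qjl_sketch(x)
x_hat = qjl_reconstruct(signs, norm, proj)
cos = (x * x_hat).sum(-1) / (np.linalg.norm(x, axis=-1)
                             * np.linalg.norm(x_hat, axis=-1))
```

The reconstruction only recovers the residual's direction approximately, which is the trade: one bit per projection is dirt cheap, but whether the correction helps or hurts depends on how it's folded back in, and in my runs it hurt.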
216 00:15:05,000 --> 00:15:09,000 And if you guys remember, if you want to play along, check it out, this is MLX-LM. 217 00:15:09,000 --> 00:15:15,000 You've got two versions over here: the Pythonic one, very, very easy to follow, and the turbo quant one over here. 218 00:15:15,000 --> 00:15:19,000 That one uses more memory, but you can still follow along, so it's very, very good. That is the first pass. 219 00:15:19,000 --> 00:15:29,000 And there's also, if you go to MLX-VLM, that's made by this genius called Prince Canuma. 220 00:15:29,000 --> 00:15:32,000 He's awesome, and he's also got a version for it, you can check it out. 221 00:15:32,000 --> 00:15:35,000 I tried it out, it didn't really work too well for me, so let me know. 222 00:15:35,000 --> 00:15:41,000 Is turbo quant going to be the hero of quantization? What I do actually love about this, I'm going to say it: 223 00:15:41,000 --> 00:15:46,000 looking at their paper, I'm looking forward to trying out the other methods as well, to find out exactly what's going on. 224 00:15:46,000 --> 00:15:50,000 So that's really cool. Let me know what you guys think. 225 00:15:50,000 --> 00:15:53,000 Hope you guys found this video useful and enjoyed the show. 226 00:15:59,000 --> 00:16:01,500 (baby crying)