1 00:00:00,000 --> 00:00:26,000 Hey guys, welcome to the show. Today, if you haven't heard the news, there's a new quantization algorithm coming out, and it's called TurboQuant. It's from people at New York University and Google, and they've come up with a technique that's supposed to let us run full models in a quantized version at, like, 100% precision, or 99-point-whatever it is. 2 00:00:26,000 --> 00:00:35,000 So, I've already gone on the hero's journey on this one, you know, when you get really excited about something, and then, like, you end up at the stage of disillusionment. 3 00:00:35,000 --> 00:00:44,000 But I'm going to show you it running live, I'm going to show you all the different implementations if you want to play along at home, and I'll also show you the results that I've got. 4 00:00:44,000 --> 00:00:55,000 And you'll see for yourself if there is amazingness in this. Now, there's already some controversy around this technique. If you go into the feedback forums and look at their paper, 5 00:00:55,000 --> 00:00:58,000 they had a little dig at RaBitQ. 6 00:00:58,000 --> 00:01:00,000 I love the name of this situation. 7 00:01:00,000 --> 00:01:10,000 They had a dig at these guys, and then RaBitQ came along and said, "Actually, you guys were asking us how to implement our source code, and you didn't give us credit." 8 00:01:10,000 --> 00:01:16,000 So, it says: TurboQuant describes RaBitQ's guarantees as sub-optimal. Ooh, academic dig. 9 00:01:16,000 --> 00:01:25,000 Magid's January 2025 emails show that he had transferred our C++ implementation into Python and asked us to help debug it. 10 00:01:25,000 --> 00:01:30,000 So, there's a bit of a catfight going on here, but nonetheless, let's just check out the technique for itself.
11 00:01:30,000 --> 00:01:38,000 So, it's a two-pass situation, and it works with, like, four bits, and they go down to two bits, which is amazing if it works. 12 00:01:38,000 --> 00:01:46,000 And they use one of the bits to do quantized JL. JL is, I never remember the name, Johnson-Lindenstrauss. 13 00:01:46,000 --> 00:01:56,000 They quantize that, so one of the bits goes there, and the other bits that you give it, either one, two or three bits, go to MSE-optimal quantizers. 14 00:01:56,000 --> 00:02:07,000 And if you do want to play along, if you go into MLX-LM and look at the pull requests over here, there are actually two implementations of this. 15 00:02:07,000 --> 00:02:12,000 So, there's "Add turbo quant" and "Add turbo quant KV", and there's also a third one. 16 00:02:12,000 --> 00:02:16,000 If you go to MLX-VLM, there's a third one there. 17 00:02:16,000 --> 00:02:17,000 I tried them all out. 18 00:02:17,000 --> 00:02:22,000 Now, these implementations only do the first pass, but don't worry, we've also done the two-pass situation. 19 00:02:22,000 --> 00:02:24,000 We're going to show you all of that working. 20 00:02:24,000 --> 00:02:27,000 This one here, the experimental one, is the slow version. 21 00:02:27,000 --> 00:02:32,000 It does it all on the CPU, but it's Pythonic code, so it's easy to follow along and find out exactly what's going on. 22 00:02:32,000 --> 00:02:41,000 This version here starts to put it in the Metal shaders, except it actually uses more memory than it claims to, 23 00:02:41,000 --> 00:02:48,000 because it stores the full float16 results in memory as well, so you're not actually saving any memory. 24 00:02:48,000 --> 00:02:53,000 But it's still good to follow along, because you can actually see the pipeline between quantizing and the...
25 00:02:53,000 --> 00:02:57,000 the pipeline between quantizing and then de-quantizing and all that kind of stuff. 26 00:02:57,000 --> 00:03:00,000 So, if you want to play along, you can check out the pull requests here as well. 27 00:03:00,000 --> 00:03:05,000 And I've also got it implemented in my inference app over here, and I'll show you all the different results and everything. 28 00:03:05,000 --> 00:03:12,000 So, if you go into the settings tab, under model context precision you've got the normal 16-bit, which is unquantized, 29 00:03:12,000 --> 00:03:18,000 you've got 9-bit, which is near lossless, I'll show you exactly what that means, and it goes all the way down to 3.5-bit, 30 00:03:18,000 --> 00:03:23,000 and then we've got the 4-bit turbo, 3-bit turbo and 2-bit turbo, and you can choose a different quantization level. 31 00:03:23,000 --> 00:03:28,000 So, the first thing I'll do, since I've run all these experiments here, is show you the first version here. 32 00:03:28,000 --> 00:03:31,000 So, we're producing 1,000 tokens. 33 00:03:31,000 --> 00:03:34,000 So, I did an interesting test. 34 00:03:34,000 --> 00:03:39,000 I didn't just blindly say "write some code" or "produce 100 tokens" or anything like that. 35 00:03:39,000 --> 00:03:45,000 I actually gave it a chunky piece of source code that had already been previously generated, 36 00:03:45,000 --> 00:03:48,000 because I wanted to fill up the context window, and I asked it to make a change, 37 00:03:48,000 --> 00:03:51,000 asking it to add a spaceship with a first-person perspective. 38 00:03:51,000 --> 00:03:56,000 So, it's got to manage that context window, quantize it down, and hopefully we'll get some good needle-in-a-haystack retrieval, 39 00:03:56,000 --> 00:03:59,000 and hopefully it'll pick up the right functions. We're going to see what happens there.
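By the way, if you're wondering what "MSE-optimal quantizer" actually means, it's basically the textbook Lloyd-Max idea: pick codewords that minimize the mean squared error for your data. Here's a toy sketch of that in plain NumPy, including the de-quantize step back. To be clear, this is my own illustration, not TurboQuant's or MLX-LM's actual code; their quantizer and grouping will differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def lloyd_max(x, bits=4, iters=20):
    # Classic Lloyd-Max: alternate between assigning each value to its
    # nearest codeword and moving each codeword to the mean of its cell.
    levels = 2 ** bits
    # Initialize the codebook from quantiles of the data.
    codebook = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    idx = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                codebook[k] = x[idx == k].mean()
    return codebook, idx

x = rng.standard_normal(4096).astype(np.float32)
codebook, idx = lloyd_max(x, bits=4)   # quantize: 4 bits per value
x_hat = codebook[idx]                  # de-quantize: look the codeword back up
mse = float(((x - x_hat) ** 2).mean())
```

For Gaussian-ish data this lands noticeably below what a plain uniform grid at the same bit width gives you, which is the whole point of "MSE-optimal".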
40 00:03:59,000 --> 00:04:02,000 So, if you run it with 1,000 tokens unquantized, 41 00:04:02,000 --> 00:04:07,000 we're getting a requirement of 1.92 GiB of memory. 42 00:04:07,000 --> 00:04:15,000 When we quantize it with 9-bit precision, we halve that memory to about 1.1 GiB. 43 00:04:15,000 --> 00:04:21,000 When you quantize it down to 4.5 bits, we're now at 0.5 GiB, 44 00:04:21,000 --> 00:04:24,000 and in comparison, when you quantize it down to 4-bit turbo, 45 00:04:24,000 --> 00:04:26,000 you actually use slightly less, at 0.51. 46 00:04:26,000 --> 00:04:30,000 You can actually make it use even less, but I wanted to keep the precision, 47 00:04:30,000 --> 00:04:34,000 so I kept my floats at float32. You can actually make them float16, 48 00:04:34,000 --> 00:04:38,000 and there's also some int8 you can utilize in there if you want to play along at home. 49 00:04:38,000 --> 00:04:43,000 But yeah, you've got better memory savings than the 4.5-bit quant, 50 00:04:43,000 --> 00:04:48,000 and you can go down even further, to 3.5 bits, and that's 0.4 GiB, 51 00:04:48,000 --> 00:04:52,000 and you can go to 3-bit turbo, and that's 0.4 as well, so still slightly less, 52 00:04:52,000 --> 00:04:54,000 and you can even go down to 2-bit turbo. 53 00:04:54,000 --> 00:04:57,000 It produces code nonetheless, 54 00:04:57,000 --> 00:05:00,000 but let's just see what the code that gets produced actually looks like. 55 00:05:00,000 --> 00:05:06,000 So, if we run it at full precision over here, we've got the Earth, still coherent, 56 00:05:06,000 --> 00:05:09,000 we've now got a spaceship over here, we've got an asteroid impact, 57 00:05:09,000 --> 00:05:11,000 and it creates a ring in a random location. 58 00:05:11,000 --> 00:05:13,000 I'm running MiniMax over here in this example.
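If you want a rough feel for where those memory numbers come from, here's the back-of-the-envelope maths for a KV cache. The dimensions below are made up for illustration, they're not MiniMax's real config, and the per-group scale overhead is an assumption about how the scales get stored.

```python
def kv_cache_gib(tokens, n_layers, n_kv_heads, head_dim, bits,
                 group_size=32, scale_bytes=4):
    # K and V each store one vector per token, per layer, per KV head.
    values = 2 * n_layers * n_kv_heads * head_dim * tokens
    payload = values * bits / 8
    # Quantized formats also carry a scale and zero-point per group of
    # values; the unquantized 16-bit cache has no such overhead.
    overhead = 0 if bits >= 16 else (values / group_size) * 2 * scale_bytes
    return (payload + overhead) / 2**30

# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 8k context.
full = kv_cache_gib(8000, 32, 8, 128, bits=16)
q4 = kv_cache_gib(8000, 32, 8, 128, bits=4)
```

The overhead term is also why "4-bit" quants come out at 4.5 effective bits, and why the 9-bit setting is really 8-bit payload plus scales.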
59 00:05:13,000 --> 00:05:16,000 When I do enter the spaceship, however, and this is the full precision, 60 00:05:16,000 --> 00:05:21,000 it gives me some baffling controls, so it's first-person 61 00:05:21,000 --> 00:05:26,000 and not third-person. This stuff happens just based on the random token selection, 62 00:05:26,000 --> 00:05:29,000 so random stuff can happen depending on the model that you choose, 63 00:05:29,000 --> 00:05:32,000 but at least you've got an idea of what the full version looks like. 64 00:05:32,000 --> 00:05:36,000 When we do 9-bit precision, you can still see the Earth, 65 00:05:36,000 --> 00:05:39,000 and if you want to get some context on these demonstrations, 66 00:05:39,000 --> 00:05:43,000 I actually ran these tests with context precision in an earlier video, 67 00:05:43,000 --> 00:05:47,000 so you can check that out to find out the loss and stuff and how we got to this stage. 68 00:05:47,000 --> 00:05:50,000 So this is kind of like a part 2, using the turbo quantization. 69 00:05:50,000 --> 00:05:53,000 So 9-bit is cool over here, and look at that, we've got a nice spaceship, 70 00:05:53,000 --> 00:05:56,000 we can fly around, so we've still got good context. 71 00:05:56,000 --> 00:05:58,000 We're going to go in and show you the actual fidelity 72 00:05:58,000 --> 00:06:01,000 and exactly the perplexity towards the end of this video, 73 00:06:01,000 --> 00:06:04,000 but let me just show you the real-life results that you're going to get. 74 00:06:04,000 --> 00:06:07,000 So that's 9-bit. At 8.5 you still get good results, 75 00:06:07,000 --> 00:06:11,000 triggering an asteroid still works, although it didn't happen there. 76 00:06:11,000 --> 00:06:13,000 Boom, we still get the ring, exact same situation. 77 00:06:13,000 --> 00:06:17,000 6.5, again, it still works, except the spaceship looks slightly different, 78 00:06:17,000 --> 00:06:22,000 but that could just be randomness.
79 00:06:22,000 --> 00:06:30,000 4.5 is still good, and then when we get down to 3.5 bits, 80 00:06:30,000 --> 00:06:35,000 that's when we start to run into runtime errors and it kind of breaks down. 81 00:06:35,000 --> 00:06:40,000 So switching over to the 4-bit turbo quant: again, you've got the beautiful Earth, 82 00:06:40,000 --> 00:06:44,000 so it hasn't lost that fidelity, and we can enter flight, 83 00:06:44,000 --> 00:06:48,000 and look at this, this is just a beautiful spaceship. 84 00:06:48,000 --> 00:06:51,000 I've got to say, it's a very, very nice-looking spaceship. 85 00:06:51,000 --> 00:06:54,000 Like, you can see the shadows, the lighting on it, 86 00:06:54,000 --> 00:06:57,000 it looks very beautiful. 87 00:06:57,000 --> 00:07:00,000 I've got to say this looks really, really good at 4-bit turbo quant. 88 00:07:00,000 --> 00:07:04,000 At 3-bit turbo quant, you've now lost context on the actual Earth. 89 00:07:04,000 --> 00:07:08,000 So previously, when we did 3.5, we started getting runtime errors. 90 00:07:08,000 --> 00:07:12,000 Here, we actually got compile errors, so it couldn't compile the Metal shader, 91 00:07:12,000 --> 00:07:14,000 and we lost context on the Earth. 92 00:07:14,000 --> 00:07:16,000 So there is a technique that I also threw in there, 93 00:07:16,000 --> 00:07:20,000 and it's also written up in the paper: mixed-precision quantization. 94 00:07:20,000 --> 00:07:25,000 What they do is, depending on the precision required for the layer, 95 00:07:25,000 --> 00:07:28,000 they increase or decrease the quantization. 96 00:07:28,000 --> 00:07:31,000 So I just did a very, very basic one, where I figure the first two layers 97 00:07:31,000 --> 00:07:33,000 and the last two layers are the most important, 98 00:07:33,000 --> 00:07:35,000 especially when you go into quantization and all that stuff.
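That mixed-precision idea is simple to sketch: pin the first and last couple of layers at a higher bit width and quantize the middle layers harder. This is my simplified take with a made-up function name, not the paper's actual per-layer policy.

```python
def layer_bits(n_layers, base_bits=3, guard_bits=4, n_guard=2):
    # Keep the first and last n_guard layers at guard_bits; quantize the
    # middle layers more aggressively at base_bits.
    return [guard_bits if i < n_guard or i >= n_layers - n_guard else base_bits
            for i in range(n_layers)]

# For a hypothetical 16-layer model: guard layers at 4 bits, middle at 3.
bits = layer_bits(16)
```

A real policy would pick the guarded layers by measuring per-layer sensitivity rather than just taking the ends, but even this crude version was enough to change my results.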
99 00:07:35,000 --> 00:07:40,000 I kept those always at 4-bit fidelity, and when I do that with just the standard 3.5-bit 100 00:07:40,000 --> 00:07:43,000 affine quantization, again, we still don't have the Earth, 101 00:07:43,000 --> 00:07:45,000 and it's still getting a runtime error. 102 00:07:45,000 --> 00:07:50,000 However, with 3-bit turbo quant, and this is just the one-pass solution at the moment, 103 00:07:50,000 --> 00:07:53,000 we now have the Earth, and we've now got a spaceship, 104 00:07:53,000 --> 00:07:56,000 and look, we can fly around in the spaceship. 105 00:07:56,000 --> 00:08:00,000 So unless we got lucky, well, you can see the results for yourself, 106 00:08:00,000 --> 00:08:03,000 we definitely did get some improvement, okay? 107 00:08:03,000 --> 00:08:07,000 And when it comes to 2-bit precision, it completely fails, and it's completely awful. 108 00:08:07,000 --> 00:08:10,000 So that is the situation as it is. 109 00:08:10,000 --> 00:08:14,000 Now, I'm going to show you it running, in case you're a bit unsure of what the situation is. 110 00:08:14,000 --> 00:08:17,000 Our 4-bit turbo, I'm just going to run that. 111 00:08:17,000 --> 00:08:19,000 So you can see here that it's producing tokens. 112 00:08:19,000 --> 00:08:21,000 It's getting around 13 tokens a second. 113 00:08:21,000 --> 00:08:24,000 Now, this is a big context window, around 8,000 tokens, 114 00:08:24,000 --> 00:08:29,000 and this version can be sped up. Right now we're de-quantizing at every single layer 115 00:08:29,000 --> 00:08:32,000 to be passed into the normal dot product. 116 00:08:32,000 --> 00:08:34,000 We could fuse that into a Metal shader and make it run faster, 117 00:08:34,000 --> 00:08:36,000 but that is the situation as it is today. 118 00:08:36,000 --> 00:08:40,000 And if we go into the token inspector, let's just get a bit of entropy here.
119 00:08:40,000 --> 00:08:45,000 We can see the output generated from the original version here, the fully unquantized version. 120 00:08:45,000 --> 00:08:48,000 We can see that it says, "Here is the updated code with controllable spaceship. 122 00:08:50,000 --> 00:08:54,000 I have added a spaceship model constructed from geometries." 123 00:08:54,000 --> 00:08:58,000 So "geometries" was the top token, you can see here. 124 00:08:58,000 --> 00:09:01,000 Originally, with the unquantized version, it was 41% likely to go there. 125 00:09:01,000 --> 00:09:05,000 We see that when we did the 4-bit quantization, "geometries" isn't the top token anymore. 126 00:09:05,000 --> 00:09:10,000 It's now the second top token at 32%, whereas previously it was 41%, and "geometric", 127 00:09:10,000 --> 00:09:14,000 that one is now at 41, so they've kind of swapped over there, 128 00:09:14,000 --> 00:09:19,000 just due to that precision loss. So you can see these kinds of errors. 129 00:09:19,000 --> 00:09:23,000 And it also depends on the seed as well, although I am running this greedy, 130 00:09:23,000 --> 00:09:28,000 with temperature set to zero, to try to make it as precise and reproducible as possible. 131 00:09:28,000 --> 00:09:33,000 But you can see that it starts to drift. You can see "controls" is one of the top lines here, 132 00:09:33,000 --> 00:09:36,000 whereas the HTML doctype is at the top there. 133 00:09:36,000 --> 00:09:41,000 So it starts writing the code when it has that full precision, but at turbo 4-bit, 134 00:09:41,000 --> 00:09:43,000 that's when you start getting differences. 135 00:09:43,000 --> 00:09:47,000 So to find out exactly what's going on, I'm tracking the precision loss between 136 00:09:47,000 --> 00:09:50,000 the quantized version and the full-precision one.
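The two numbers I'm tracking between the quantized cache and the full-precision one, cosine similarity and mean absolute error, are easy to reproduce yourself. Here's roughly what my monitoring boils down to; this is my own sketch with simulated noise standing in for quantization error, not the MLX internals.

```python
import numpy as np

def cache_fidelity(full_cache, quant_cache):
    # Cosine similarity: does the quantized cache still point the same way?
    # Mean absolute error: how far off are the individual values?
    f = full_cache.ravel().astype(np.float64)
    q = quant_cache.ravel().astype(np.float64)
    cos = float(f @ q / (np.linalg.norm(f) * np.linalg.norm(q)))
    mae = float(np.abs(f - q).mean())
    return cos, mae

rng = np.random.default_rng(0)
kv = rng.standard_normal((8, 1024)).astype(np.float32)   # pretend KV cache
noisy = kv + rng.normal(scale=0.1, size=kv.shape)        # pretend quantization noise
cos, mae = cache_fidelity(kv, noisy)
```

Cosine similarity stays near 1.0 even when the per-value error is noticeable, which is why I look at both: cosine tells you the directions survive, MAE tells you how much the raw bits drifted.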
137 00:09:50,000 --> 00:09:53,000 So I have a custom version where I'm monitoring the situation. 138 00:09:53,000 --> 00:09:56,000 And with 16-bit, there is no loss whatsoever. 139 00:09:56,000 --> 00:09:58,000 So it's 100%, and zero, zero. 140 00:09:58,000 --> 00:10:02,000 With 9-bit, and this is the highest, 8-bit quantization with group sizes of 32, 141 00:10:02,000 --> 00:10:08,000 again, we have the exact same cosine similarity, and the mean absolute error is 0.09. 142 00:10:08,000 --> 00:10:10,000 So that's very, very low. 143 00:10:10,000 --> 00:10:15,000 When we go down to 4.5-bit, the mean absolute error is 0.15. 144 00:10:15,000 --> 00:10:18,000 So that's pretty high compared to 0.09. 145 00:10:18,000 --> 00:10:20,000 And we saw the results are still very, very similar. 146 00:10:20,000 --> 00:10:26,000 Now, when it comes to 4-bit turbo, and this is just using the first pass, the MSE pass, 147 00:10:26,000 --> 00:10:31,000 we actually get a lower mean absolute error than the 4.5-bit. 148 00:10:31,000 --> 00:10:35,000 But it's still higher than 5.5-bit. 149 00:10:35,000 --> 00:10:39,000 So they say it's a near-lossless quantization. 150 00:10:39,000 --> 00:10:46,000 But if you're looking at the actual fidelity, the bits, the actual values inside the cache, 151 00:10:46,000 --> 00:10:52,000 you can see that it's not as lossless as you might imagine: 9-bit is 0.09, 152 00:10:52,000 --> 00:10:55,000 here, 4-bit turbo is 0.12. 153 00:10:55,000 --> 00:10:58,000 And with 3-bit, that's 0.23. 154 00:10:58,000 --> 00:11:02,000 So that's insanely high, and 2-bit is 0.5. 155 00:11:02,000 --> 00:11:05,000 That's just a massive, massive variance. 156 00:11:05,000 --> 00:11:09,000 But again, this measures the actual fidelity, what is in the actual bits, 157 00:11:09,000 --> 00:11:11,000 and the deviation between them. 158 00:11:11,000 --> 00:11:12,000 So we did one better.
159 00:11:12,000 --> 00:11:14,000 We looked at the actual perplexity of the end results. 160 00:11:14,000 --> 00:11:18,000 You know, when I was showing you over here that the top token changed, 161 00:11:18,000 --> 00:11:22,000 so the top token is now "geometric" instead of it being "geometries", 162 00:11:22,000 --> 00:11:24,000 that's the kind of thing perplexity picks up on. 163 00:11:24,000 --> 00:11:27,000 It measures the confidence the model has in the tokens it picks. 164 00:11:27,000 --> 00:11:30,000 And these are the results we got with 500 tokens. 165 00:11:30,000 --> 00:11:35,000 So with the full one, you get a perplexity of 1.07, and the top-token accuracy is 100%; it 166 00:11:35,000 --> 00:11:39,000 produced the exact same tokens as it should, because, you know, we've got greedy decoding, 167 00:11:39,000 --> 00:11:43,000 we've got the temperature set to 0 and the seed set to 0, so it's all perfect. 168 00:11:43,000 --> 00:11:49,000 With 9-bit quantization on the context, again, a top-token accuracy of 99.6%, 169 00:11:49,000 --> 00:11:52,000 very, very high. 170 00:11:52,000 --> 00:11:53,000 We only missed two tokens. 171 00:11:53,000 --> 00:11:55,000 The divergence was very, very low. 172 00:11:55,000 --> 00:11:58,000 Then traditional four-bit quantization, affine quantization. 173 00:11:58,000 --> 00:12:01,000 The perplexity is now 1.117, so it's gone higher. 174 00:12:01,000 --> 00:12:04,000 And we've dropped to 97.2% top-token accuracy. 175 00:12:04,000 --> 00:12:10,000 So 14 of the 500 tokens were picked wrong. 176 00:12:10,000 --> 00:12:13,000 And when the wrong ones were picked, they diverged by 30%. 177 00:12:13,000 --> 00:12:19,000 Now, when it comes to four-bit turbo, with the one-pass MSE, 178 00:12:19,000 --> 00:12:26,000 I was getting a perplexity pretty much slightly lower than the four-bit affine quantization. 179 00:12:26,000 --> 00:12:27,000 So slightly lower.
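If you want to reproduce these top-token accuracy and perplexity numbers yourself, the measurement boils down to something like this. Again, a sketch with made-up logits standing in for the two runs, not my exact harness:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def compare_runs(ref_logits, test_logits):
    # Top-token accuracy: how often the quantized run greedily picks the
    # same token as the full-precision run did.
    ref_ids = ref_logits.argmax(axis=-1)
    acc = float((ref_ids == test_logits.argmax(axis=-1)).mean())
    # Perplexity of the quantized run on the reference's chosen tokens.
    p = softmax(test_logits)[np.arange(len(ref_ids)), ref_ids]
    ppl = float(np.exp(-np.log(p).mean()))
    return acc, ppl

rng = np.random.default_rng(0)
ref = rng.standard_normal((500, 128))                   # pretend logits, 500 steps
quant = ref + rng.normal(scale=0.05, size=ref.shape)    # pretend quantization noise
acc, ppl = compare_runs(ref, quant)
```

With greedy decoding and temperature 0, the reference run always agrees with itself, so accuracy starts at 100% and perplexity at its floor; quantization noise then pushes both the wrong way.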
180 00:12:27,000 --> 00:12:31,000 So this is 1.1178, and this is 1.1171. 181 00:12:31,000 --> 00:12:36,000 Now, the model I'm running here is Llama 1B, because it runs really fast and gets me results 182 00:12:36,000 --> 00:12:38,000 without me having to wait forever. 183 00:12:38,000 --> 00:12:43,000 Maybe on a different model it might run better, but I'm just showing you what I've got. 184 00:12:43,000 --> 00:12:47,000 Now, I also modified it to actually include the second pass, 185 00:12:47,000 --> 00:12:49,000 the quantized Johnson-Lindenstrauss. 186 00:12:49,000 --> 00:12:50,000 I love that name. 187 00:12:50,000 --> 00:12:51,000 It's very, very good. 188 00:12:51,000 --> 00:12:56,000 Now, the thing with that one is, rather than giving all four bits to the MSE quantizer to get the highest 189 00:12:56,000 --> 00:13:01,000 precision, you sacrifice one of the bits for a second quantization 190 00:13:01,000 --> 00:13:06,000 at one bit, and that's how they've error-corrected and got their amazing results. 191 00:13:06,000 --> 00:13:12,000 But when I did that, the perplexity shot up to 1.5, and the top-token accuracy went down to 88.2%. 192 00:13:12,000 --> 00:13:13,000 So it's worse. 193 00:13:13,000 --> 00:13:15,000 It was worse for me. 194 00:13:15,000 --> 00:13:18,000 If anyone's interested, I can share the code that I'm running. 195 00:13:18,000 --> 00:13:21,000 But again, I've already shown you the source code if you want to check it out. 196 00:13:21,000 --> 00:13:25,000 So there are three versions you can look at, already open source. 197 00:13:25,000 --> 00:13:30,000 And yeah, so we'll see what happens when it comes to three bits. 198 00:13:30,000 --> 00:13:35,000 And this version is actually using the mixed precision that we talked about, with the first two 199 00:13:35,000 --> 00:13:40,000 and the last two layers at four bits, because that's the only one that actually yielded some results when we ran it.
200 00:13:40,000 --> 00:13:45,000 So this one, you know, 95% token accuracy, so that's good, with the one pass. 201 00:13:45,000 --> 00:13:50,000 And that is higher than the two-pass, but it's still lower than the one-pass four-bit, 202 00:13:50,000 --> 00:13:53,000 and it's still lower than the traditional affine quantization at four bits. 203 00:13:53,000 --> 00:13:58,000 And with two passes, the perplexity just shoots up to 15.38. 204 00:13:58,000 --> 00:14:04,000 So I'm hoping that, like, I'm currently at the bottom of the hill here. 205 00:14:04,000 --> 00:14:09,000 I've been battling this for two days, doing all sorts of experiments, just trying to get this nailed. 206 00:14:09,000 --> 00:14:15,000 And that's where I am. Let me know what you guys are finding with this quantization technique, 207 00:14:15,000 --> 00:14:19,000 because, yeah, I don't know how they're getting their results. 208 00:14:19,000 --> 00:14:27,000 And their results, just look at here: the full precision scores 99.7%, and they score 99.7%. 209 00:14:27,000 --> 00:14:34,000 And they say all of the other ones are rubbish, and theirs is the best. It is lossless, the same score. 210 00:14:34,000 --> 00:14:41,000 They're also running Llama 3.1 8B. That's in the needle-in-a-haystack test, so their results are spectacular. 211 00:14:41,000 --> 00:14:48,000 But with that being said, just remember that this generation that we made here was with the four-bit turbo quant. 212 00:14:48,000 --> 00:14:54,000 Let's just get it there. Look, the asteroid still worked, so it is exciting times over there, 213 00:14:54,000 --> 00:14:57,000 and it's going to be really fun to see what other quantization techniques happen, 214 00:14:57,000 --> 00:15:00,000 and a little bit more information about maybe the model that they used. 215 00:15:00,000 --> 00:15:05,000 This is MiniMax, and Llama 1B, those are the ones I tried it out with. Let me know what models you guys have tried at home.
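And for the curious, here's the rough shape of that second pass I was fighting with: a quantized Johnson-Lindenstrauss sketch that keeps only one bit, the sign, per random projection, plus a norm, to encode the residual cheaply. This is just me illustrating the concept; TurboQuant's actual construction and how they recombine it with the MSE pass will differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def qjl_sketch(residual, sketch_dim=64):
    # Random JL projection, then keep only the sign of each coordinate
    # (1 bit each) plus the residual's norm for crude reconstruction.
    d = residual.shape[-1]
    proj = rng.standard_normal((d, sketch_dim)) / np.sqrt(sketch_dim)
    signs = np.sign(residual @ proj)    # 1 bit per sketch coordinate
    norm = np.linalg.norm(residual, axis=-1, keepdims=True)
    return signs, norm, proj

def qjl_reconstruct(signs, norm, proj):
    est = signs @ proj.T                # back-project the sign bits
    est = est / (np.linalg.norm(est, axis=-1, keepdims=True) + 1e-9)
    return est * norm                   # restore the stored norm

x = rng.standard_normal((4, 128))       # pretend residuals after the MSE pass
signs, norm, proj = qjl_sketch(x)
x_hat = qjl_reconstruct(signs, norm, proj)
cos = (x * x_hat).sum(-1) / (np.linalg.norm(x, axis=-1)
                             * np.linalg.norm(x_hat, axis=-1))
```

The reconstruction only recovers the residual's direction approximately, which is the trade: one bit per projection is dirt cheap, but whether the correction helps or hurts depends on how it's folded back in, and in my runs it hurt.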
216 00:15:05,000 --> 00:15:09,000 And if you guys remember, if you want to play along, check it out, this is MLX-LM. 217 00:15:09,000 --> 00:15:15,000 You've got two versions over here: the Pythonic one, very, very easy to follow, and the turbo quant one over here. 218 00:15:15,000 --> 00:15:19,000 That one uses more memory, but you can still follow along, so it's very, very good. That is the first pass. 219 00:15:19,000 --> 00:15:29,000 And there's also, if you go to MLX-VLM, that's made by this genius called Prince Canuma. 220 00:15:29,000 --> 00:15:32,000 He's awesome, and he's also got a version for it, you can check it out. 221 00:15:32,000 --> 00:15:35,000 I tried it out, it didn't really work too well for me, so let me know. 222 00:15:35,000 --> 00:15:41,000 Is turbo quant going to be the hero of quantization? What I do actually love about this, I'm going to say it: 223 00:15:41,000 --> 00:15:46,000 looking at their paper, I'm looking forward to trying out the other methods as well, to find out exactly what's going on. 224 00:15:46,000 --> 00:15:50,000 So that's really cool. Let me know what you guys think. 225 00:15:50,000 --> 00:15:53,000 Hope you guys found this video useful and enjoyed the show. 226 00:15:59,000 --> 00:16:01,500 (baby crying)