Comments:
brilliant
so nice
You’re the GOAT
Q-Star Lizard Gang 2024
nice explanation deeplizard
I guess crickets make sound, so the lizard could take that as input as well when choosing its path
Very good content, I watched the videos in this playlist to prepare for my exam. Thank you 😊
Are you here after Reuters' article on OpenAI's Q*?
Hey, I loved your video. Thank you so much
Your work is simply incredible. Thank you!
she has the most annoying voice ever jesus christ
more B roll video crap without code or mathematics, pandering advanced topics to idiots. this is a waste of human hours.
I'll tell you what I really don't get: it seems the equation only updates the Q-table based on the current and next state. But the Bellman equation seems to imply that all future states are considered. Is there some recursion going on?
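The recursion is in the Bellman equation itself, and the one-step update inherits it by bootstrapping. Using the standard formulation (nothing beyond what the series covers), the Bellman optimality equation is already recursive,

    q_*(s,a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \right],

and the Q-learning update only pulls the current estimate toward a one-step sample of that right-hand side:

    Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R_{t+1} + \gamma \max_{a'} Q(s',a') - Q(s,a) \right].

Because \max_{a'} Q(s',a') is itself an estimate that later updates keep improving, reward information propagates backwards through the table over many visits, so all future states are accounted for indirectly rather than in a single update.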
Really nice video, thanks for the clear explanations!
Thanks so muuuuch !
O' the puns, ... exploit vs explore
Amazing video!
Thank you!
Won't a square become empty once the cricket(s) on it are eaten?
Link to the talk that appeared at the end of the video?
Why are you so clear in your explanations? I mean, why do others fail to deliver tutorials with the clarity you do? I don't know what's wrong with everyone. Omg, you are impressive.
I am too stupid to understand the video... My bad...
Your videos are awesome! Please correct the corresponding quiz, since the answer looks incorrect to me. Could you do a video explaining the first three steps and how the Q-table updates? This would really help with understanding how the update works. Thank you!
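To sketch what those first few updates look like, here is a minimal Python illustration; it is not the lizard/cricket environment from the video, and the states, rewards, learning rate, and discount are made-up example values.

import numpy as np

n_states, n_actions = 4, 2                  # tiny made-up environment
q_table = np.zeros((n_states, n_actions))   # all Q-values start at zero
alpha, gamma = 0.7, 0.99                    # example learning rate and discount

# three hypothetical (state, action, reward, next_state) steps
transitions = [(0, 1, -1, 1), (1, 1, 10, 2), (2, 0, -1, 3)]

for step, (s, a, r, s_next) in enumerate(transitions, start=1):
    # one-step Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (target - q_table[s, a])
    print(f"Q-table after step {step}:\n{q_table}\n")

Printing the table after each step shows exactly which single cell changes and by how much, which usually makes the update rule click.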
u r an AI, so nicely explained all these hard concepts so easily. thank u so much
How are you so good at explaining😍😍😍😍
Hey @deeplizard,
Many thanks for this video. I'm reading 'Reinforcement Learning: An Introduction, Second Edition' by Richard S. Sutton and Andrew G. Barto, and I'd like to know whether the Q-learning technique described here is the same as the dynamic programming explained in the book?
Is the exploration vs exploitation part only used during training, or does it also happen when actually using the learned Q-table? And can the policy be "take the action with the largest Q-value, and sometimes explore" (i.e., can that be an example of a policy here)? Since the policy is just the probability of taking some action in a state, can a policy simply be written as "take the action with the largest Q-value" (as an example of pure exploitation)?
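Roughly yes: that rule is usually called an epsilon-greedy policy. A minimal sketch, where the epsilon value and the Q-table layout are assumptions rather than anything taken from the video:

import numpy as np

rng = np.random.default_rng()

def choose_action(q_table, state, epsilon=0.1):
    # explore with probability epsilon, otherwise exploit the current estimates
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))   # random action
    return int(np.argmax(q_table[state]))            # greedy action

# usage: action = choose_action(q_table, current_state)

Exploration like this is normally only used during training, often with epsilon decayed over time; once you only use the learned table, you typically set epsilon to 0, so the deployed policy really is just "take the action with the largest Q-value".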
Good balance of exploration and exploitation will bring good results in life too! We are all lizards! :)
Can't wait for coming episodes, because this series is amazing! And they/you helped me a lot. Thank you so much! <3
PHENOMENAL! Your videos are THE BEST! Can you PLEASE PLEASE PLEASE do a series on Actor-Critic methods!!
This was an exciting video, finally, we are getting to the good stuff.
Awesome explanation. I like this
please remove the sound played with the logo at the start of the video. the sound is very bad, especially when one listens to it on headphones.
Really good work
Better than my engineering teacher.
This series is so, so informative!! I wish you could make videos on dynamic navigation techniques using DRL
thank you, thank you
hats off
AMAZING SERIES! Absolutely loved it!
Your voice is just amazing 😍😍😍😍😍
channel name is creepy but explanation is amazing...
you saved my life bro
Thanks for the good explanation and all your work. A little hint, if I may: don't define the terms 'exploitation' and 'exploration' using those same words themselves.
You say that Q-learning tries to find the best policy. However, I thought Q-learning is an off-policy algorithm. I also have trouble understanding the on/off-policy concept.
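One standard way to see the distinction, borrowing the SARSA comparison from the Sutton & Barto book mentioned above rather than anything specific to this video: the two algorithms differ only in the update target.

    Q-learning (off-policy):  Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
    SARSA (on-policy):        Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma Q(s',a') - Q(s,a) \right], with a' the action actually taken next

Q-learning is off-policy because its target uses the greedy max over next actions regardless of what the behaving (e.g., epsilon-greedy) policy actually did, so it learns about the best policy while following an exploratory one; SARSA evaluates the policy it is actually following.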
Luv u babe.
exploration of reinforcement learning is going fine!
What should I do in the case of continuous tasks? Like in Flappy Bird (if it's continuous, but anyway), I guess the Q-table would be infinite here, or would just have a big fixed size to save memory. Can you give some recommendations or explain, please? I want to start implementing, but I don't know how the Q-table should look in this case and how to interact with it correctly (and I hope there will be no other surprises lol)
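A common first answer is to discretize the continuous state into coarse bins so a finite, dictionary-backed Q-table still works. A minimal sketch; the state variables, bin sizes, and action count below are guesses for a Flappy Bird-like setup, not a tested implementation:

from collections import defaultdict
import numpy as np

N_ACTIONS = 2                                         # e.g. flap / do nothing
q_table = defaultdict(lambda: np.zeros(N_ACTIONS))    # rows are created on demand

def discretize(horizontal_dist, vertical_dist, velocity, bin_size=10.0):
    # round each continuous feature into a coarse bin so similar situations
    # share one Q-table row instead of needing infinitely many entries
    return (int(horizontal_dist // bin_size),
            int(vertical_dist // bin_size),
            int(round(velocity)))

state = discretize(87.3, -14.2, 5.6)
best_action = int(np.argmax(q_table[state]))          # look up / update as usual

If the binned table gets too large or learning is too slow, the usual next step is to replace the table with a function approximator such as a neural network, i.e., move from tabular Q-learning to deep Q-learning.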