Comments:
Thanks so much for the video! 2 questions: 1. Instead of having a global counter, can we just partition the permissible numbers into the different servers? 2. If we did not scale the write, how do we ensure high availability for the counter (if it is inside the server)?
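One way to picture question 1 is a static split of the ID space into contiguous ranges, one per server. This is only a minimal sketch: `partition_ranges` is a hypothetical helper, and it assumes the number of servers is fixed and known up front.

```python
def partition_ranges(total_ids: int, num_servers: int) -> list[range]:
    """Split [0, total_ids) into contiguous, non-overlapping ranges, one per server."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    base, extra = divmod(total_ids, num_servers)
    ranges = []
    start = 0
    for i in range(num_servers):
        # Spread any remainder over the first `extra` servers.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges
```

Each server can then hand out IDs from its own range without coordination. The trade-off is that changing the server count means re-splitting whatever IDs are still unused.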
For the custom alias, we can limit the number of characters to match what we allow for normal codes (5-7 characters). To make that work, we have to check whether the alias already exists on the particular shard where it could live, which should be fine since the search space is smaller. Search can be improved further by keeping the short URLs in each shard sorted.
Insert and search will then both have O(log n) complexity.
For database sharding, we can use hash-based sharding on the short URL (1...256).
That way it will be easier to scale, and since short URLs are randomly generated, the distribution will be fairly even.
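The hash-based sharding idea above could look roughly like this. It is a sketch, not a description of the video's design: `shard_for` is a hypothetical helper, and the shard count of 256 is taken from the "1...256" in the comment.

```python
import hashlib

NUM_SHARDS = 256  # assumption: taken from the "1...256" in the comment above


def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard number in 1..num_shards using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards + 1
```

A cryptographic digest is used here instead of Python's built-in `hash()` because `hash()` on strings is randomized per process, so it would not route the same code to the same shard across restarts.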
I literally had a guy who was arguing how the API should be named and how plurals should never be used - "You work with one resource at a time, not many, therefore use singular", I think he argued about that for a good 5 minutes.
If the person reading this identifies himself as this type, know that you are obnoxious and you do not know how to conduct interviews and you suck. (great video btw <3)
Love the idea of doing the math only when you need it to make a decision. I was so confused watching other mock interviews that did math just for the sake of math but never mentioned the result afterwards. I never understood why they needed those figures.
For the short code generation problem, I think we can also pre-generate a bunch of unique short URLs beforehand and store them somewhere (Redis, etc.) and just take one from that pool whenever we need.
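A rough sketch of the pre-generated pool idea, using an in-memory `deque` as a stand-in for the external store. The helper names (`generate_pool`, `take_code`) and the 7-character length are assumptions for illustration only.

```python
import secrets
import string
from collections import deque

ALPHABET = string.ascii_letters + string.digits  # 62 characters


def generate_pool(size: int, length: int = 7) -> deque[str]:
    """Pre-generate `size` distinct random codes of the given length."""
    codes: set[str] = set()
    while len(codes) < size:
        codes.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return deque(codes)


def take_code(pool: deque[str]) -> str:
    """Remove and return one unused code; raises IndexError when the pool is empty."""
    return pool.popleft()
```

With a shared store, the "take one" step has to be atomic so that two writers never receive the same code (for example, Redis `SPOP` on a set), and something has to refill the pool before it runs dry.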
The design you ended up with at the end is what I had in mind from the beginning. This means, I don't have to get to the RPS calculations, right? Bottom line is, we know the system should scale no matter the number of users/reads/writes. The calculations are obviously important if we want to know expected costs of scaling, which in the context of the interview is out of scope. I stand to be corrected.
It was the cleanest and easiest to understand tutorial on System Design. Thanks.
The deep dives part is the best part for in-depth knowledge
I've been prepping for a technical interview for over a year now, all pure Leetcode and no system design. I also just landed my first interview with Meta upcoming soon for an E4 position. Your channel/Hello Interview are absolutely phenomenal for learning system design. It is so exceptionally clear, but nonetheless still very difficult for someone coming from nothing.
I hate this design problem. It's so simple but every single time I have encountered it, the interviewer wants a different answer on how to generate the short URLs. Snowflake ID is the obvious correct choice, but I have seen interviewers waste my time to discuss alternative options - maybe pick a better design problem that has more interesting discussion
awesome !!
Absolutely fantastic. From now on this will always be my #1 recommendation for a first video on system design if someone is just learning or has started watching videos/reading articles and is feeling lost. As a frontend engineer a lot of this stuff has been handled by other people in most of my jobs and at times has felt overwhelming and too intimidating.
I believe that anyone can learn anything with the right teacher. But finding the right teacher for any given thing can take years or decades. Your pacing, not using too much jargon when it's unnecessary, and clear structure that anyone can learn and follow are fantastic and I wasn't at all surprised to jump over to the comments and see how many people feel the same.
Thank you for your commitment to making so much of your base content free!
I have a question about choosing availability over read-after-write consistency as the non-functional requirement: The user that generated the url will most likely click it to see if it’s working okay immediately. Doesn’t that mean we want strong read-after-write consistency? Additionally, if 2 users generate the same url at the same time unfortunately due to a collision, that would be a problem. So I was thinking that we actually would need some degree of consistency which is greater than just eventual consistency.
How are we able to support custom aliases when we are using the linear caching for the uniqueness?
What about Snowflake IDs and base62 encoding them?
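For anyone wondering what that would look like, here is a minimal sketch of a Snowflake-style ID (41-bit timestamp offset, 10-bit machine ID, 12-bit sequence) followed by base62 encoding. The epoch value and function names are arbitrary assumptions, and sequence management within a millisecond is left out.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
EPOCH_MS = 1_700_000_000_000  # assumption: an arbitrary custom epoch in milliseconds


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-symbol alphabet above."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def snowflake_id(timestamp_ms: int, machine_id: int, sequence: int,
                 epoch_ms: int = EPOCH_MS) -> int:
    """Pack a 41-bit timestamp offset, 10-bit machine ID and 12-bit sequence into one integer."""
    offset = timestamp_ms - epoch_ms
    if not 0 <= offset < (1 << 41):
        raise ValueError("timestamp out of range for 41 bits")
    if not 0 <= machine_id < (1 << 10):
        raise ValueError("machine_id must fit in 10 bits")
    if not 0 <= sequence < (1 << 12):
        raise ValueError("sequence must fit in 12 bits")
    return (offset << 22) | (machine_id << 12) | sequence
```

One thing to keep in mind: a full 63-bit value encodes to 11 base62 characters, so codes built this way come out longer than the 5-7 characters discussed for the counter approach.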
A couple points:
1. While we store the expiration date in the database, we do not store it in Redis. So even expired URLs would still be available in cache.
2. 10^5 seconds in a day doesn't sound right.
3. There wasn't any mention of how we would ensure that custom alias would not clash with the short URL and vice versa.
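On point 2, the exact figure is easy to check. Whether rounding it to 10^5 is acceptable depends on how rough the estimate is meant to be. A quick check:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # exact number of seconds in a 24-hour day
ROUNDED = 10**5                 # the round figure used for back-of-envelope math

# Relative error introduced by rounding the exact value up to 100,000.
overestimate = ROUNDED / SECONDS_PER_DAY - 1
print(SECONDS_PER_DAY, f"{overestimate:.1%}")
```

So the exact value is 86,400, and the round figure is about 16% high, which is usually within tolerance for an order-of-magnitude estimate.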
But bijective functions can also lead to predictability, right? Each number is mapped to exactly one other number, so if an attacker somehow learns which bijective function the system is using, doesn't that lead to the same problem again?
"I passed this interview. I crushed it and I know I've crushed it..."
Gets ghosted by company. Thanks 2022+...
All jokes aside, thanks for making these. They're extremely helpful.
I think we can also use Snowflake IDs with base62 encoding for short URLs
Evan you are the best. Like seriously! Loved this explanation !
Almost got me feeling in an interview and also saying "Who cares?" so confidently while defending my system.
Thank you for these.
Looking forward to how my mock is gonna go ;)
Can you please move this video to the top of the list?
It will be a great help for beginners like me.
Hi Evan. Humble request to please make a video on eCommerce system like Amazon and a service like doordash
Thanks, when I first started preparing for the system design interviews, I had no idea if I could even try to get started. I love how you made things systematic in such a simple but effective framework. I had a short time to prepare for my system design interviews, got one passed, one failed. I don't have any SD interviews lined up, but I don't want to lose what I studied, so I'll keep grinding from here. Really love your content and am actually enjoying learning these things! Thanks for the great content.
You still get logs from your CDN services, so you don't lose the analytics of which shortened urls are being requested. The main downside to using a CDN is the cost, although properly implemented, CDNs save you money on egress and content delivery. Obviously there is also the cost of complexity, but for high scale, globally distributed services, CDNs are a must-have.
The 301 vs 302 redirect only applies to repeated lookups of a shortened url by a single client host device. All end-users (host devices) will still need to hit your service to look up the long url the first time they want to use it. So when a shortened url is shared, either way you'll still get a count of users (really, devices) that hit that url.
Great guidance on system design. One nit to pick... I believe that "ensure uniqueness of short codes" is a functional requirement. Non-functional requirements are not specific to the application (qualities like security, performance, scalability, reliability, resiliency, legal compliance, etc.).
Excellent explanation. The best part of your system design interview videos is that you teach step by step and also pinpoint the common mistakes along the way. That's really helpful.
How would we shard the DB if we need to check the uniqueness of longUrl (if several threads are saving the same longUrl)?
Thanks for the valuable content!
I think there's an error in the calculation of the expected number of collisions. With 1B insertions into 56B buckets, there would be 17 Million expected, or 1 collision every 50 insertions (amortized), while the video states 880k (1 in 1300).
A collision every 50 inserts is definitely something to worry about
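The figures depend on what exactly is being counted. The sketch below computes two common quantities under stated assumptions: codes are uniformly random, and "56B buckets" is read as 62^6 (about 56.8 billion). It is not a reconstruction of the calculation in the video.

```python
import math


def expected_colliding_pairs(n: int, buckets: int) -> float:
    """Expected number of pairs of items that land in the same bucket."""
    return n * (n - 1) / (2 * buckets)


def expected_items_in_collision(n: int, buckets: int) -> float:
    """Expected number of items that share a bucket with at least one other item."""
    # P(a given item is alone) = (1 - 1/buckets)^(n - 1); expm1/log1p keep precision.
    return n * -math.expm1((n - 1) * math.log1p(-1 / buckets))


N_INSERTS = 1_000_000_000
N_BUCKETS = 62**6  # assumption: one reading of "56B buckets"

pairs = expected_colliding_pairs(N_INSERTS, N_BUCKETS)
items = expected_items_in_collision(N_INSERTS, N_BUCKETS)
print(f"pairs ~ {pairs:.3g}, items involved ~ {items:.3g}")
```

Under these assumptions the pair count comes out at roughly 8.8 million and the number of items involved in some collision at roughly 17 million, so the answer changes by about a factor of two depending on which quantity is meant.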
Early in the video you said this design would be AP in the CAP theorem, but you used a write-through cache and a SQL database. I would've labeled your design as CP. Am I wrong? (You guys are the best, thanks so much)
I do have a question, but I want to lead with saying this video (like all of the other content on this channel) is incredible!
I was surprised you went with the counter solution. I felt like the idea of writing the count to disk periodically would help with the single point of failure, but it'd still be a heck of a thing to account for. Also, I felt like it wasn't saving enough time compared to querying the database to make up for the complication it adds.
Am I missing something? Would the querying solution cause issues with race conditions or something else?
Thanks so much!!!
The video gives a "plan" how to go through the design interviews and teaches the main idea -- go top to bottom, expand on the requirements, and avoid a trap of getting stuck on low-level details too early.
However, this very approach results in what I would call a problem. Implementation of the "counter" idea is very questionable in the end. The original premise was that we needed a counter to avoid 1 extra read to the database to check uniqueness. But, in the end, the author creates a "global counter". Instead of an extra read to a database, we get an extra read to the cache. Ok, we reserve 1000 counts to avoid reading -- that solves the problem. But we can equally keep the counter in the database, not in Redis cache.
What happens if a Redis instance dies? Yes, Redis can sync its state to the disk in the background. But if Redis dies before syncing the last increment, then the "primary server" would issue 1000 short urls on a stale counter. And, on Redis restart, the primary server "reserves" the same 1000 urls. So we need a duplicate check to handle this -- back to square one.
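The failure scenario described above can be reproduced with a toy model. The `Counter` class below is a hypothetical in-memory stand-in, not a Redis client, and the "crash" is simulated by restarting from a snapshot taken before the increment.

```python
class Counter:
    """In-memory stand-in for the shared counter (hypothetical; not a Redis client)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def reserve_block(self, size: int) -> range:
        """Advance the counter by `size` and return the reserved block of IDs."""
        start = self.value
        self.value += size
        return range(start, start + size)


BLOCK = 1000
counter = Counter()
snapshot = counter.value  # state that was persisted before the increment
first = counter.reserve_block(BLOCK)

# Simulate a crash that loses the un-persisted increment, then a restart from the snapshot.
counter = Counter(snapshot)
second = counter.reserve_block(BLOCK)

overlap = set(first) & set(second)
print(len(overlap))  # number of IDs handed out twice
```

This only models the scenario as the comment states it. How much can actually be lost depends on how persistence and replication are configured for the counter store.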
which font are you using?
We have two services: the Redirect Service and the URL Shortening Service.
Can I say we need availability over consistency for the Redirect Service, but consistency over availability for the URL Shortening Service so that URLs don't collide? Or can a collision never happen because our counter will always be unique, so we don't need to worry about it?
I think we can add 5th approach for unique id generation to be like twitter snowflake approach mentioned in Alex Xu's first System design book. We are then guaranteed to generate a unique id for each request and then base 62 it to get shorter URL. WDYT ?
What whiteboard software is he using?
There is a fixed operational cost you pay for each service. So unless the write service is doing a lot of work, it will be very expensive to run that service let alone maintain it. So that's another valid argument in favor of just keeping a single service that would handle both requests.
The global counter could be avoided somewhat elegantly by simply using modulus. So, if you have two write services then one of them will assign even numbers, and another odd. For three services, one will assign numbers with remainder 0 when divided by 3, another remainder 1, the last remainder 2 etc.
Then, only a small consensus needs to be made when a new write service boots up or shuts down in order to assign the congruence classes to the write services, rather than requesting blocks of IDs etc.
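A minimal sketch of the congruence-class idea, assuming each write service already knows its index and the total number of services (`id_stream` is a hypothetical helper, and the consensus step that assigns indices is not shown):

```python
from itertools import count, islice
from typing import Iterator


def id_stream(worker_index: int, num_workers: int) -> Iterator[int]:
    """Yield the IDs congruent to worker_index modulo num_workers, in increasing order."""
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index must be in [0, num_workers)")
    return count(start=worker_index, step=num_workers)


# Three write services: each one only ever produces IDs from its own residue class.
streams = [id_stream(i, 3) for i in range(3)]
first_ids = [list(islice(s, 4)) for s in streams]
print(first_ids)
```

The streams never overlap as long as the index assignment stays stable. Changing `num_workers` while services are running changes the residue classes, which is where the consensus step mentioned above comes in.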
I really like these system design videos. For this one, there was no functional nor non-functional requirement that required the user entity. I would have liked a discussion on a non-functional requirement about abuse prevention (someone creating billions of urls, how we should rate throttle a guest user).
your framework helped me get my first job as a SWE at a top SV startup. brilliant content, I owe you a coffee ☕️
Hey, Evan! Thanks for such an informative video. I had just one question: for generating the short code, why can we not follow a hybrid approach where we combine the counter with a hash of the URL (a prefix of it actually)? We can introduce more symbols (so go from say base62 to base100), and that will allow us to represent a higher space with fewer digits. As in, we can represent the same counter with fewer symbols, and we can use the rest to prepend or append the hash of the URL. This way, we get the best of the both worlds.
Hello from Sammamish/Issaquah. These videos are great.
I am also a Cornell alum (majored in ECE).
Thank you again.
Great video. However, my problem with these interviews is, how the fuck (or why) would I know the math behind the number generation for 1B iterations of a 7-digit code, taking into account the collision rate?
I guess it's a problem of having so much competition for a position; somebody will know that stuff. It's absolutely 0% required to do the job. Once you're actually doing the job you can take the time to either research this, or involve somebody who has done so.
That's domain knowledge. It would only be relevant if it's the same product that the company you are trying to get into is selling.
Great video. Isn't the cache "Cache-aside" rather than "Read-through" as application is responsible for writing into the cache in case of cache miss?
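For reference, the distinction is about who loads the data on a miss. In cache-aside the application reads the database and fills the cache itself, while in read-through the cache layer does the loading. A minimal cache-aside sketch, with a plain `dict` standing in for the cache and a hypothetical `load_from_db` callable standing in for the database:

```python
from typing import Callable, Optional


def get_long_url(
    short_code: str,
    cache: dict[str, str],
    load_from_db: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Cache-aside lookup: check the cache, and on a miss read the DB and fill the cache."""
    cached = cache.get(short_code)
    if cached is not None:
        return cached
    value = load_from_db(short_code)
    if value is not None:
        # The application, not the cache layer, is responsible for populating the cache.
        cache[short_code] = value
    return value
```

If the redirect path in the video looks like this function, then "cache-aside" is the more precise label.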
What happens if the global counter Redis server goes down?