A tech lead with 5+ years of experience said they sharded a database to scale, but then he failed to answer this question
Never talk about something you do not fully understand during an interview
Another interview story.
Q: “Introduce yourself and the recent project you are working on, in as much detail as possible.”
A: “Sure, I’m …, working at an internet company on a public-facing platform that serves millions of requests per minute … blabla … my team owns the search module, and we provide search services to other teams.”
Q: “OK, what technology does your team use, what is your scope, what challenges have you been facing, and how have you solved them?”
A: “blabla… By the way, we scaled our database (MySQL) by sharding. It is fast and scalable.”
Q: “Interesting, why go for sharding?”
A: “It is a big table. Data keeps growing in a single table, and search becomes slow as it grows, so we need to split it across different machines. That way the search is faster and scales horizontally.”
Q: “Okay. How do you shard it, range-based or key-based? If key-based, what is the sharding key?”
A: “We use hash, and the sharding key is xx.”
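Q: “Could you tell me how to query the 501st to 1000th most recently updated records, ordered by modified date?”
A: “…” (after 3 minutes)
Q: “You have to query every shard, order by modified date (which must be well indexed), take the top 1000 rows from each, then merge in memory and keep rows 501 to 1000, right?” I tried to continue the conversation. (A per-shard LIMIT 500 OFFSET 500 would not work, because any single shard could hold most of the newest rows.)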
A: “No, we have our own algorithm to do it more efficiently…”
He drew something on the whiteboard: Id [0, 5000] on machine A, Id [5001, 10000] on machine B, … That is range-based sharding, so I reminded him, “You told me you shard by hash, right 😄?” He stopped, spent a few minutes thinking, and didn’t come up with any further answer.
Then we discussed something else, and the interview finished in another 3 minutes.
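To make the scatter-gather answer concrete, here is a minimal Python sketch. It is not the candidate's system or any real schema: the four in-memory shards, the `id` and `modified` fields, and the helper names `shard_for`, `query_shard`, and `page_across_shards` are all assumptions for illustration, and `query_shard` only stands in for a real SQL query against one shard.

```python
import heapq
import zlib
from datetime import datetime, timedelta
from itertools import islice

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(record_id: int) -> int:
    """Hash-based routing: neighbouring ids land on unrelated shards."""
    return zlib.crc32(str(record_id).encode()) % NUM_SHARDS


def query_shard(rows, limit):
    """Stand-in for `SELECT ... ORDER BY modified DESC LIMIT :limit` on one shard."""
    return sorted(rows, key=lambda r: r["modified"], reverse=True)[:limit]


def page_across_shards(shards, offset, limit):
    """Return rows offset+1 .. offset+limit of the global newest-first order."""
    # Any shard could hold all of the newest rows, so each one must return
    # its top (offset + limit) rows; a per-shard OFFSET would drop candidates.
    per_shard = [query_shard(rows, offset + limit) for rows in shards]
    # k-way merge of lists that are already sorted newest-first.
    merged = heapq.merge(*per_shard, key=lambda r: r["modified"], reverse=True)
    return list(islice(merged, offset, offset + limit))


# Hypothetical data: 5000 records with unique, increasing timestamps.
base = datetime(2024, 1, 1)
shards = [[] for _ in range(NUM_SHARDS)]
for i in range(5000):
    shards[shard_for(i)].append({"id": i, "modified": base + timedelta(seconds=i)})

page = page_across_shards(shards, offset=500, limit=500)
print(len(page), page[0]["id"], page[-1]["id"])  # prints: 500 4499 4000
```

The cost grows with both the offset and the number of shards, which is why deep pagination on a sharded table is painful. A common alternative is keyset pagination, where each request filters on the last `modified` value already seen instead of using an offset.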
On this topic, I had much more difficult questions in mind:
- When you add a new machine, do you need to re-shard? How do you handle it?
- How do you handle schema changes?
- How do you join tables? Does your code really have no sub-queries? What if you have to use one? Is there an alternative approach?
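The first question is harder than it looks. As a rough sketch, assuming the shard is picked with a plain `hash(key) % N` (the interview never established the actual scheme), the snippet below counts how many keys change shards when a fifth machine joins four. The `user-<n>` keys and the function names are made up for illustration.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Modulo hashing: the shard is the key's hash modulo the shard count."""
    return zlib.crc32(key.encode()) % num_shards


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when going from old_n to new_n shards."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


# Hypothetical keys; any reasonably uniform key set behaves the same way.
keys = [f"user-{i}" for i in range(100_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of keys would have to move")
```

With a uniform hash, a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which is about one key in five, so roughly 80% of the data has to be migrated. That is why systems that expect to grow often use consistent hashing or a fixed number of logical shards mapped onto machines instead of a bare modulo.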