A tech lead with 5+ years of experience said his team sharded its database to scale, but then he failed to answer this question

Apr 2, 2023

Never talk about something you do not fully understand during an interview

Another interview story.

Q: “Introduce yourself and the project you are currently working on, in as much detail as possible.”

A: “Sure, I’m …, and I work at an internet company on a public-facing platform that serves millions of requests per minute… blabla… My team owns the search module, and we provide search services to other teams.”

Q: “OK. What technologies does your team use, what is your scope, what challenges have you faced, and how did you solve them?”

A: “blabla… By the way, we scaled our database (MySQL) by sharding it. It is fast and scalable.”

Q: “Interesting, why go for sharding?”

A: “It is a big table. Data keeps growing in that single table, and search gets slower as it grows, so we needed to split it across different machines. That way search is faster and scales horizontally.”

Q: “Okay. How do you shard it, range-based or key-based? If key-based, what is the sharding key?”

A: “We use hash-based sharding; the sharding key is xx.”
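For readers less familiar with the two options in that question, here is a minimal Python sketch of both routing schemes. The function names, the use of MD5 as a stable hash, and the shard boundaries are illustrative assumptions; the candidate's real sharding key was never disclosed beyond “xx”.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based sharding: hash the key, then take a modulo."""
    # MD5 is used only as a stable hash; Python's built-in hash() of a str
    # is randomized per process, so it cannot be used for routing.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(record_id: int, upper_bounds: list[int]) -> int:
    """Range-based sharding: upper_bounds[i] is the inclusive max id on shard i."""
    shard = bisect.bisect_left(upper_bounds, record_id)
    if shard == len(upper_bounds):
        raise ValueError(f"id {record_id} is beyond the last range")
    return shard


if __name__ == "__main__":
    bounds = [5000, 10000]  # hypothetical: ids 0-5000 on shard 0, 5001-10000 on shard 1
    # Range sharding keeps neighbouring ids together on one shard.
    print([range_shard(i, bounds) for i in (1, 2, 3, 5001)])  # [0, 0, 0, 1]
    # Hash sharding spreads neighbouring ids pseudo-randomly across shards,
    # so "ids 0-5000" no longer corresponds to any single machine.
    print([hash_shard(str(i), 4) for i in (1, 2, 3, 4)])
```

With hash sharding, any query that is not filtered by the sharding key, such as one ordered by modified date, cannot be answered by a single shard.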

Q: “Could you tell me how to query the 500th to 1000th most recently updated records, ordered by modified date?”

A: “….” # after 3 minutes

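Q: “So you have to query every shard ordered by modified date (which must be well indexed), take the top 1000 rows from each (LIMIT 1000, not LIMIT 500 OFFSET 500, since all the rows you want could sit on one shard), merge them in memory, and then skip the first 500, right?” I asked, trying to keep the conversation going.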
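Here is a minimal Python sketch of that scatter-gather query. Shards are simulated as in-memory lists, and `query_shard` stands in for a per-shard `SELECT ... ORDER BY modified DESC LIMIT n`; both, along with the `records` table name, are hypothetical. Each shard returns its newest `offset + limit` rows (not `LIMIT 500 OFFSET 500`, because the requested rows may all live on one shard), and a k-way merge of the already-sorted lists yields the global page.

```python
import heapq
from datetime import datetime, timedelta
from itertools import islice

Row = tuple[int, datetime]  # (record_id, modified_at)


def query_shard(rows: list[Row], fetch: int) -> list[Row]:
    """Hypothetical stand-in for, on one shard:
    SELECT id, modified FROM records ORDER BY modified DESC LIMIT <fetch>
    """
    return sorted(rows, key=lambda r: r[1], reverse=True)[:fetch]


def global_page(shards: list[list[Row]], offset: int, limit: int) -> list[Row]:
    """Scatter-gather: newest-first rows [offset, offset + limit) across all shards."""
    # Every shard must return its top offset + limit rows: the requested page
    # could come entirely from a single shard.
    fetch = offset + limit
    per_shard = [query_shard(rows, fetch) for rows in shards]
    # Each list is already sorted newest-first, so a k-way merge is enough.
    merged = heapq.merge(*per_shard, key=lambda r: r[1], reverse=True)
    return list(islice(merged, offset, offset + limit))


if __name__ == "__main__":
    base = datetime(2023, 4, 2)
    # 3,000 made-up records, spread over 3 shards by id % 3.
    shards: list[list[Row]] = [[], [], []]
    for i in range(3000):
        shards[i % 3].append((i, base + timedelta(seconds=i)))
    page = global_page(shards, offset=500, limit=500)
    print(len(page), page[0][0])  # 500 2499
```

In a real system the per-shard queries would run in parallel, but the amount fetched still grows with the offset on every shard, which is one reason keyset (seek) pagination on `modified` is a common alternative for deep pages.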

A: “No, we have our own algorithm that does it more efficiently…”

He drew something on the whiteboard: Id [0, 5000] → machine A, Id [5001, 10000] → machine B, …, which is range-based sharding. Then I reminded him, “You told me your sharding key is a hash, right 😄?” He stopped, spent a few minutes thinking, and couldn’t come up with any further answer.

Then we discussed something else, and the interview ended three minutes later.

On this topic, I had much harder questions lined up:

  • When you add a new machine, do you need to re-shard? How do you handle it?
  • How do you handle schema changes?
  • How do you join tables across shards? Does your code avoid subqueries entirely? What if you have to use one? Is there an alternative approach?
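The re-sharding question is where hash-modulo sharding hurts most: with `hash(key) % N`, changing N remaps most keys, so most rows would have to move. Consistent hashing is one common mitigation; the candidate never mentioned it, so this is an illustration, not his setup. The Python sketch below assumes MD5 as a stable hash, hypothetical node names `db0` to `db4`, and 100 virtual nodes per machine, and measures how many keys change shards when a fifth machine is added.

```python
import bisect
import hashlib
from collections.abc import Callable


def stable_hash(key: str) -> int:
    # MD5 as a stable, process-independent hash (built-in hash() is randomized).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    return stable_hash(key) % num_shards


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each physical node owns `vnodes` points on the ring.
        self._ring = sorted(
            (stable_hash(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys: list[str], before: Callable[[str], object],
                   after: Callable[[str], object]) -> float:
    return sum(before(k) != after(k) for k in keys) / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    # Modulo: going from 4 to 5 shards remaps roughly 80% of keys.
    print(moved_fraction(keys, lambda k: modulo_shard(k, 4), lambda k: modulo_shard(k, 5)))
    old = HashRing(["db0", "db1", "db2", "db3"])
    new = HashRing(["db0", "db1", "db2", "db3", "db4"])
    # Consistent hashing: only keys taken over by db4 move, roughly 20%.
    print(moved_fraction(keys, old.node_for, new.node_for))
```

The ring only decides where rows should live; actually copying them, keeping writes consistent during the copy, and cutting traffic over without downtime is the harder part that question is probing.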



