I learned this data structure from a data scientist during an interview process
K-D tree
The interview
These few weeks I went through a couple of interviews, and surprisingly many are data scientists who wanted to join as a Software Engineer (and I don’t know why) role.
So one of the candidates was a data scientist who is very strong in algorithms. we went through 2 leet code Medium (DFS, UF, and graph) then talked about the search improvement he did, from database full-text search (we both agree should NOT use it :), pls drop your comment if disagree), inverted index in ES, and also vector database.
Here is something new to me.
“vector search, I haven’t used it before, could you help me understand how it works and how it helped in your product? thank you.” I asked.
“Yes sure, it is basically to find nearest neighbors, input is a vector, and return a list of similar datasets (closest vectors)”. he explained clearly.
“Ok, thanks. As I understand from the brute force KNN algorithm should be very slow when the dataset goes huge, it computes every 2 points distance and sorting, even if you put min-heap it should not help much it still O(N²)” I said.
“Yes, so we improved it, first using k-d tree and then ANN”. he answered.
“Well, this is something I don’t know, is it a data structure used very often in Machine Learning?”