
The hard questions in DS/ML interviews I've received over the years aren't the theory questions (which I rarely get asked), but the trick SQL questions that often depend on obscure syntax and/or dialect-specific features, or "implement binary search" when I'm not in the mindset for that as that isn't what DS/ML is in the real world.


I think they're fine as long as you know the format and have an opportunity to prepare or just get in the right mindset for it. And some things (like binary search) should be easy to write anyway.

The SQL questions can also be a symptom of the type of job - Facebook's first data science round focuses a lot on SQL but that's because it's a very product/analytics/decision-making focused role without that much coding or ML. With data science you have to be more careful about these things when searching for a job; you can't just use the job title as a descriptor.


> And some things (like binary search) should be easy to write anyway.

It's a different story when a) your mind is set on statistics/linear algebra b) you've never had to actually implement binary search by hand since college and c) even if you do implement the algorithm and demonstrate that you have a general understanding, it must work perfectly and pass test cases otherwise it doesn't count.

FWIW I was rarely asked about algorithmic complexity, which is more relevant in DS/ML; when it did come up, it was usually in the context of whiteboarding another algorithm, with the interviewer mocking me for doing it in O(n) instead of O(log n).


Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program. To a significant extent, though, you can cheat by studying binary search itself, which is a surprisingly beautiful thing.

I like this formulation for finding the first index in a half-open range where p is true, assuming p stays true thereafter:

    bsearch p i j :=
      i                   if i == j else
      bsearch p i m       if p m    else
      bsearch p (m + 1) j
      where m := i + (j - i)//2
Or in Python:

    def bsearch(p, i, j):
        m = i + (j - i) // 2
        return (i if i == j
                else bsearch(p, i, m) if p(m)
                else bsearch(p, m+1, j))
The only tricky thing about this formulation is that m < j if i < j, thus the asymmetric +1 in only one case to ensure progress. If invoked with a p such as a[m] >= k it gives the usual binary search on an array without early termination. The i + (j - i) // 2 formulation is not needed in modern Python, but historically an overflowing (i + j) // 2 was a bug in lots of binary search library functions, notably in Java and C.

(Correction: I said a[m] <= k. This formulation is less tricky than the usual ones, but it's still tricky!)
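As a usage sketch of that formulation (the `leftmost` helper and the example list are illustrative additions, not from the comment above; `bsearch` is repeated so the snippet is self-contained):

```python
def bsearch(p, i, j):
    # Same formulation as above: first index in the half-open range
    # [i, j) where p is true (j if p is never true), assuming p stays
    # true once it becomes true.
    if i == j:
        return i
    m = i + (j - i) // 2
    return bsearch(p, i, m) if p(m) else bsearch(p, m + 1, j)

def leftmost(a, k):
    # First index where a[m] >= k: the usual lower-bound search on a
    # sorted list, without early termination.
    return bsearch(lambda m: a[m] >= k, 0, len(a))
```

So `leftmost([1, 2, 4, 4, 7], 4)` returns 2 (the first of the duplicates), and a key absent from the list yields its insertion point.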


> Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program.

That's the problem. There are many other ways to do that without risking false negatives and annoying potential candidates. (For example, I would not reapply to places that rejected me out of skepticism about my programming abilities based on tests blatantly irrelevant to the day-to-day work, because that's a bad sign about the engineering culture.)

Even FizzBuzz is better at accomplishing that task.


FizzBuzz (or equivalent) is actually great IMO. It weeds out the people who lied on their resume, without punishing the people who never learned CS because they were too busy learning things that were actually useful to DS, like statistics or data visualization.


I've actually been given fizzbuzz in a DS interview! Up to that point I thought that fizzbuzz was just a meme because it's obviously too easy.


I tried to write FizzBuzz on paper when I first heard of it, and my version had a bug: it printed fizzbuzzfizzbuzz on 15.

If you want a correct program written without a compiler/computer, I don't think anything is too easy. Maybe something like "write a function returning the sum of two float parameters".
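The doubled output is the classic branch-ordering bug: if n % 3 and n % 5 are tested independently, 15 triggers both. A minimal sketch of the fixed logic (illustrative, not the commenter's actual code):

```python
def fizzbuzz(n):
    # Check the combined case first; testing % 3 and % 5 in separate
    # non-exclusive branches is the bug that prints both words at 15.
    if n % 15 == 0:
        return "fizzbuzz"
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return str(n)
```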


That would just test syntax, though. Fizzbuzz tests logic. Your bug was a logic bug.

To a certain extent you can dispense with mental logic by using a compiler. But the feedback loop is much slower. Thinking your logic through before feeding it to a compiler is like looking at a map when you're driving a car; you can cut off whole branches of exploration.

Binary search is a particularly tricky logic problem in part because it's so deceptively simple. In a continuous domain it's easy to get right, but the discrete domain introduces three or four boundary cases you can easily get wrong.
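A few of those discrete boundary cases, written as checks against Python's bisect.bisect_left as a reference oracle (an illustrative sketch, not from the comment):

```python
import bisect

# Boundary cases that commonly trip up hand-rolled binary searches:
assert bisect.bisect_left([], 5) == 0          # empty input
assert bisect.bisect_left([3], 5) == 1         # key above everything
assert bisect.bisect_left([3], 1) == 0         # key below everything
assert bisect.bisect_left([2, 2, 2], 2) == 0   # duplicates: leftmost index
```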

But the great-great-grandparent is surely correct that many programming jobs don't require that level of thinking about program logic. Many that do, it's because the codebase is shitty, not because they're working in an inherently mentally challenging domain.


Yes, I meant running it and then correcting the error.

Concerning binary search, I actually implemented that in an ECU for message sorting. It took about a whole day, including getting the outer boundaries off by one (too big) in the first test run. Funnily enough, the vehicle ran fine anyway.

I would never pull that algorithm off correctly in an interview without training for it, I think.


Take a look at the downvoted-to-0 formulation I gave upthread, then see if you can program it that way tomorrow without looking, and then think it through to see if it could possibly be wrong, and once you're satisfied it's correct, try testing it. Probably you'll never need to implement binary search yourself again, but it's a good exercise for thinking through algorithms. You can probably get it working that way in under an hour instead of a whole day.


There are levels of not knowing how to program that go beyond FizzBuzz. But sure, many programming jobs don't require them.


If that's the case for the DS/ML domain, then a short take-home exam should provide a better measure of practical coding ability (the common counterargument that "take-home exams can be gamed" is a strawman; a gameable exam is more the interviewer's fault for designing it poorly).

In my case, I typically got the "implement binary search" questions in a technical interview after I passed a take-home exam, which just makes me extra annoyed.


Agreed.

If you're gaming the take-home exam by looking up the answer on Stack Overflow, you could game the same exam in person by reading books of interview questions ahead of time, and the interviewer can avoid that by making up new questions. (OTOH if you're gaming the take-home exam by paying someone else to solve the problem for you, that might be harder to tell.)


Why is that gaming the exam? What sort of professional doesn't look up the solutions to potential problems online, even if it is just to verify that you're correct? Outside of incredibly trivial things, I would expect this of everyone.


I suppose it depends on whether the purpose of the exam is to see if you know how to write working code to solve new problems or how to look up known solutions to well-known problems. Both are valuable skills, but they are definitely not the same skill. Perhaps telling the difference is one reason interviews frequently include in-person programming challenges rather than using take-home exams.

In most cases the right way to do a binary search is not to copy and paste a binary search implementation from Stack Overflow (https://stackoverflow.com/a/41956372 is similar to the formulation I gave above), but to call a binary-search library function. If calling a library function isn't the solution the interviewer is looking for, probably they wouldn't be satisfied with you searching for it on Stack Overflow either.
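In Python, for instance, the library route is the bisect module (the list and key here are just an illustrative example):

```python
import bisect

a = [1, 2, 4, 4, 7]
# bisect_left returns the leftmost index where the key could be inserted
# while keeping a sorted, i.e. the first index i with a[i] >= key.
i = bisect.bisect_left(a, 4)
# Membership test on top of the insertion point:
present = i < len(a) and a[i] == 4
```

This is a handful of lines, already debugged, and it sidesteps every boundary case discussed upthread.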


Facebook Product Data Science has always been a Product Analyst role more than anything else. I did the interviews a while back, and it was a pretty fun experience, but it's not what a lot of people call data science.


> but it's not what a lot of people call data science

I think that's changed a bit over time and the term has expanded to mean more things. In addition to Facebook, another great example is this article from Lyft in 2018 where they say that they're renaming all their data analysts to data scientists and all their data scientists to research scientists - https://medium.com/@chamandy/whats-in-a-name-ce42f419d16c


This is called title inflation.

The hilarious thing about Facebook and data science is that the term was invented there, yet they needed to retitle all of their analysts (e.g. as Product Data Scientists) because they couldn't hire any analytical people with an "analyst" title in SV (or so I have been told).

Data science was defined back in the day as a social-science PhD who could run experiments and write MapReduce jobs. I'm pretty sure most people would disagree with that definition these days.


Yes. My point is that the person running experiments and mapreduce jobs is still called a data scientist. But so is the product analytics person (btw product data scientists run experiments too). And there are some other data scientist job profiles too (more focus on research, more focus on engineering etc). So it's not really a complete redefinition of the term, it's more of an expansion of the types of jobs it covers.


In my experience, it varied greatly from team to team.


I had an "implement binary search" interview once. I came away feeling like I was being interviewed for the wrong role. I don't understand how anyone could think that's an appropriate interview task for a DS position.


I'm an MLE and I get asked much harder questions than that. Implement a binary search seems ... fine?


Implementing anything even a little tricky under pressure can be tough. Unless you've practiced with bit or pointer twiddling regularly, you are mostly validating whether candidates did interview prep or not. That probably selects for more serious candidates, so it probably works. But I was tripped up by a simple binary search problem the other day, even after I'd just solved several harder problems quite quickly. It's just the nature of algorithmic problem solving: until you've done a lot of prep, it's dicey whether a novel problem will take five minutes or five hours to solve.


But it makes sense for MLE! IMO you should ask a stats or probability question in a DS interview.


The distinction between the two roles isn't that clear. Some data science jobs are very focused on engineering.


Agreed. MLE in very ML-heavy companies tends to mean SWE who work on ML systems, and sometimes, that can mean as much working on stuff like infrastructure as modeling.



