
In this example, he's saying [0, 999] is the possible space, but the values are not distributed uniformly within that space.


Exactly, and if I make a mistake and assume that the possible space is 0 to 3999, it still works! I'll just need a bigger sample to estimate the number of videos with the same precision. (The method does fail if I exclude valid values, e.g. assume a space of 0 to 499).
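To make that concrete, here is a rough simulation sketch. Everything in it is made up for illustration: the 100 "valid IDs", their clustering, and the estimate_count helper are hypothetical and have nothing to do with real YouTube data. It only shows that uniform sampling over a too-large space still recovers the count, while a too-small space undercounts.

```python
import random

def estimate_count(valid_ids, space_size, samples, rng):
    """Probe `samples` uniform random IDs in [0, space_size) and scale the hit rate."""
    hits = sum(1 for _ in range(samples) if rng.randrange(space_size) in valid_ids)
    return hits / samples * space_size

rng = random.Random(0)

# Hypothetical ground truth: 100 valid IDs inside [0, 999], clustered unevenly
# (none below 200, 60 in [200, 499], 40 in [500, 999]).
valid_ids = set(rng.sample(range(200, 500), 60)) | set(rng.sample(range(500, 1000), 40))

est_exact = estimate_count(valid_ids, 1000, 200_000, rng)   # correct space
est_wide = estimate_count(valid_ids, 4000, 200_000, rng)    # space assumed too large
est_narrow = estimate_count(valid_ids, 500, 200_000, rng)   # space excludes valid IDs

print(f"exact space:  {est_exact:.1f}")
print(f"wide space:   {est_wide:.1f}")
print(f"narrow space: {est_narrow:.1f}")
```

With the same number of probes, the wide-space estimate is noisier than the exact-space one because fewer probes land on valid IDs, which is the "bigger sample" cost. The narrow-space estimate converges on the wrong answer no matter how many probes you add, since it can never see the IDs above 499.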


We know what the possible space is for YouTube URLs.


We assume we know the possible space for YouTube URLs, but it might not be a fair assumption.

Take phone numbers as an analogy. Without foreknowledge or a careful analysis of how phone numbers are distributed, you might assume all numbers are valid, but in fact numbers in the 555-xxxx range are largely reserved or unassigned within each area code [0]. Each set of reserved numbers makes our address space that much smaller, which can skew the statistics we gather from it if we don't exclude them from our original calculations.

It may be that YouTube reserves certain parts of the address space (e.g. maybe an ID can't start with a 0, or maybe two visually similar characters such as I and 1 can't be next to each other), which may make this sampling method slightly less accurate than it might otherwise appear.

[0] https://en.wikipedia.org/wiki/555_(telephone_number)



