Author |
Topic: CRC32 interview challenge (Read 1756 times) |
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
CRC32 interview challenge
« on: Jun 13th, 2003, 5:38pm » |
Quote Modify
|
A candidate is given access to a computer that has a compiler/interpreter of the candidate's favorite programming language, an implementation of CRC32 in that language, no network access, no data files (in particular, no word lists like /usr/share/dict/words on Unux systems) and is asked to produce two different, short, easy to remember [to a lay person] strings that have the same CRC32 in approx. 30 minutes. What should the candidate do?
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: CRC32 interview challenge
« Reply #1 on: Jun 15th, 2003, 7:43am » |
Quote Modify
|
short? well just try all 6-8 letter strings should be doable in half an hour. Maybe a hash to easy find a double. And perhaps a constraint on the string form (like consonant, vowel, consonant, consonant, vowel, consonant)
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
Re: CRC32 interview challenge
« Reply #2 on: Jun 15th, 2003, 8:40am » |
Quote Modify
|
There are two problems with this: first, I wouldn't say that pseudo-words are easily remembered by laypeople, let alone arbitrary letter sequences and second, the number of 8-letter strings with a constraint (e.g. of the form CWCCWCWC) is less than 12% of 232, the number of possible values of CRC32. Hint 1 (or rather a clarification): a string is easy to remember when it consists of known words, preferably forming a phrase. Hint 2: At the end of 30 minutes, it is not a good idea to ask the interwiewer to wait until the program finishes running for more than a few seconds. Therefore, the program must terminate, say, within one minute. And, if it does not succeed in finding a pair, an estimation of its chances of not finding a pair would be in order. Hint 3: needless to say, I have a few such pairs.
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: CRC32 interview challenge
« Reply #3 on: Jun 15th, 2003, 2:16pm » |
Quote Modify
|
hmm... I read the problem wrong. Also I don't know what crc32 is..
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
Re: CRC32 interview challenge
« Reply #4 on: Jun 15th, 2003, 3:19pm » |
Quote Modify
|
CRC32 is a 32-bit Cyclic Redundancy Check, computed by dividing a polynomial represented by the given string, by a particular (I don't remember offhand what its value is exactly) irreducible polynomial of power 32. It is used as an integrity check in data compressors, e.g. ZIP (WinZip, GZip, etc.).
|
|
IP Logged |
|
|
|
BNC
Uberpuzzler
Gender:
Posts: 1732
|
|
Re: CRC32 interview challenge
« Reply #5 on: Jun 15th, 2003, 11:41pm » |
Quote Modify
|
CRC stands for Cyclic Redundancy Check. The 32 comes from the fact it calculates a 32-bit checksum. CRC32 is an algorhytm for calculating a unique identifier for a file. It's used in programs like PKZIP to identify files and make sure that they are original. (there's 1 in 2^32 chance of two files getting the same CRC32 checksum and being mistaken as being the same file). CRC's have the interesting property that they will detect ALL single burst error events of their size or less regardless of location. http://www.codeproject.com/cpp/crc32.asp http://forth.sourceforge.net/algorithm/crc32/index.html
|
|
IP Logged |
How about supercalifragilisticexpialidociouspuzzler [Towr, 2007]
|
|
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
Re: CRC32 interview challenge
« Reply #6 on: Jun 17th, 2003, 4:41pm » |
Quote Modify
|
A hint: Birthday Paradox.
|
|
IP Logged |
|
|
|
xlh
Guest
|
|
Re: CRC32 interview challenge
« Reply #7 on: Jun 24th, 2003, 8:52pm » |
Quote Modify
Remove
|
I think towr's answer is pretty good. If we count 21 consonants and 5 vowels we have over 222 6 letter words of the form (CVCCVC) and the words would be fairly easy to remember because they would be just 2 pronouncable syllables each. I guess the answer you're gunning for would be that you type in small sets of words that form a sentence like [firstname] [lastname] got a [color][item] for [holiday]. Type in 16 entries for each category and you've got 220 sentences. By Sterling's approximation, the odds of failing on the phrases case are approximately 2.5E-56, and much lower still for the 6 letter strings.
|
|
IP Logged |
|
|
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
Re: CRC32 interview challenge
« Reply #8 on: Jun 24th, 2003, 10:42pm » |
Quote Modify
|
Quote:I think towr's answer is pretty good. If we count 21 consonants and 5 vowels we have over 222 6 letter words of the form (CVCCVC) and the words would be fairly easy to remember because they would be just 2 pronouncable syllables each. |
| This will not work because the CRC32 of most pairs of such strings are not independent for the birthday paradox to work. If I remember correctly, CRC32 is guaranteed not to match on strings that differ by less than 4 characters. I am not sure if 16 entries in each of 5 categories provide enough randomness to appeal to the birthday paradox, but 300 hundred nouns and 300 adjectives seem enough. To wit: "electric vest" and "salty bus", "motorcycle space" and "pie Sunday", "file bread" and "son law".
|
|
IP Logged |
|
|
|
Grimbal
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 7527
|
|
Re: CRC32 interview challenge
« Reply #9 on: May 29th, 2004, 2:28pm » |
Quote Modify
|
Don't forget that you have no access to a dictionnary. I have a completely different approach, that I tried and works well. In some other thread, there was the question how to detect loops in linked lists. This gave me an idea. I create a function to encode a 32bit number into a string. For example, in the pattern cvccvcvnnn. Then I start with a randon string str0. I compute crc0 = crc(str0) and str1 = encode(crc0), crc1 = crc(str1) and str2 = encode(crc1), etc. Then I use the clever algorithm to detect loops without the need to store all elements. This happens when a string in the sequence happens to have the same CRC as a previous string. Then I repeat the method to have multiple pairs to choose of. Using this method, I found: crc(rikteja44) = crc(josloza41) crc(pykkety81) = crc(rogwopo21) or with pattern cvcv...nn crc(jeromim83) = crc(wyfuny73) The rest is a matter of designing an encoding function that gives more meaningful words.
|
|
IP Logged |
|
|
|
Leo Broukhis
Senior Riddler
Gender:
Posts: 459
|
|
Re: CRC32 interview challenge
« Reply #10 on: May 29th, 2004, 5:00pm » |
Quote Modify
|
on May 29th, 2004, 2:28pm, grimbal wrote:Don't forget that you have no access to a dictionnary. |
| I don't. How long would it take to come with a list af a few hundred common words? A few minutes at most. Quote: I have a completely different approach, that I tried and works well. In some other thread, there was the question how to detect loops in linked lists. This gave me an idea. |
| www.md5crk.com could be of interest to you. They use the Pollard rho method as well. Quote: The rest is a matter of designing an encoding function that gives more meaningful words. |
| True, but it might be hard within the time limit. Although a method used in the some authentication schemes where a common word encodes each byte would work.
|
|
IP Logged |
|
|
|
Grimbal
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 7527
|
|
Re: CRC32 interview challenge
« Reply #11 on: May 31st, 2004, 7:39am » |
Quote Modify
|
on May 29th, 2004, 5:00pm, Leonid Broukhis wrote: I don't. How long would it take to come with a list af a few hundred common words? A few minutes at most. www.md5crk.com could be of interest to you. They use the Pollard rho method as well. |
| I see. Then, inded, using the birthday paradox and storing the results will bring shorter strings than with loop detection. If I want to do that with loop detection, it is necessary that the string encodes completely the CRC. That means that with a 256-word dictionnary, I still need 4 words. The loop detection method is just faster and easier to code because no storage and lookup is necessary.
|
|
IP Logged |
|
|
|
|