wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> Deleting last character...
(Message started by: TN on Feb 5th, 2004, 2:31pm)

Title: Deleting last character...
Post by TN on Feb 5th, 2004, 2:31pm
A bad programmer has decided to pick the following encoding for the company's text editor:
- If the character is a standard character, it will use 1 byte with the value from 0-127 (7 used bits and 1 unused bit).
- If the character is an extended one, it will use 2 bytes with the value of the first byte from 128-255, and the value of the second byte from 0-255.
Now receiving a backspace, he has a hard time to determine how many bytes (one or two bytes?) he should delete from the end of the input stream. Please help your poor programmer keep his job as he got dozen mouths to feed at home by giving him an algorithm to determine how many bytes he should delete from the end of the input stream when a backspace is hit.

Title: Re: Deleting last character...
Post by towr on Feb 5th, 2004, 2:47pm
::[hide]remove the last byte, and the byte before itt iff the value is greater than 127 [/hide]::

Title: Re: Deleting last character...
Post by John_Gaughan on Feb 5th, 2004, 7:51pm
This problem is very similar to UTF-8 encoding: RFC 3629 (http://www.faqs.org/rfcs/rfc3629.html)

Title: Re: Deleting last character...
Post by TN on Feb 5th, 2004, 8:32pm
If the last value is greater than 127, it won't be a problem. The problem occurs when the last byte has the value in the range of 0-127. How can you help the programmer determine how many byte(s) he should delete?

Title: Re: Deleting last character...
Post by TN on Feb 5th, 2004, 8:42pm
No, this problem is nothing like UTF-8 encoding. If UTF-8 encoding were remotely close to the aforementioned encoding, it would be trashed before it even got its name.

Title: Re: Deleting last character...
Post by TN on Feb 5th, 2004, 8:59pm
Gentlemen, I guess I didn't make it clear that the input stream has intermixed standard and extended characters. Discuss ....

Title: Re: Deleting last character...
Post by towr on Feb 6th, 2004, 1:22am
hmmm.. you're right, this is more difficult then I first thought..
It's seems you have to backtrack to the last character < 128.
Because if the before last character > 127 it might be paired with the one before it, which if > 127 might actually be paired with the one before it, etc.
So you need to delete either one or two characters, depending on if the string of characters after the last <128 one is of odd or even length respectively.

Title: Re: Deleting last character...
Post by John_Gaughan on Feb 6th, 2004, 6:19am

on 02/05/04 at 20:42:51, TN wrote:
No, this problem is nothing like UTF-8 encoding.

UTF encoding is a variable-length encoding, be it UTF-8 or UTF-16. This has everything to do with this problem since this problem also describes a variable-length encoding scheme. The UTF varieties indicate character length a little differently that in this problem, but the basic idea is the same.



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board