wu :: forums
« wu :: forums - RegEx: ab, ba »

Welcome, Guest. Please Login or Register.
Apr 24th, 2024, 11:55am

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: towr, Grimbal, SMQ, william wu, Eigenray, ThudnBlunder, Icarus)
   RegEx: ab, ba
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: RegEx: ab, ba  (Read 7024 times)
william wu
wu::riddles Administrator
*****





   
WWW

Gender: male
Posts: 1291
RegEx: ab, ba  
« on: Feb 24th, 2003, 2:39pm »
Quote Quote Modify Modify

Write a regular expression (or equivalently, construct a deterministic finite automaton) that accepts strings over the alphabet { a, b } in which the number of occurrences of 'ab' = number of occurrences of 'ba'.  
 


 
Note 1: What's a regular expression? Basically, it matches character strings that fit a certain pattern. You are restricted to the operators OR, CONCATENATE, and Kleene Star. And you can use all the parentheses you want. http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=regul ar+expressions
 
Note 2: This problem often trips people up because regular expressions are not supposed to have unbounded counting ability. There's a one-to-one correspondence between a regular expression and finite state machines, and the ability to count arbitrarily high requires an unbounded amount of state space. It follows that you cannot make a regular expression that matches all 0-1 strings of the form 0n1n, where n is an integer, and the notation 0n refers to the string of n consecutive zeroes. However, in this case there's something slightly cunning you can observe which will allow you to realize a Regular Expression.
« Last Edit: Feb 24th, 2003, 2:41pm by william wu » IP Logged


[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: RegEx: ab, ba  
« Reply #1 on: Feb 24th, 2003, 7:26pm »
Quote Quote Modify Modify

I am not particularly knowledgable about CS topics so I usually leave this forum alone, but this one touches on something I use all the time!
 
A couple of questions though:
 
Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)?
 
When you say that only the Kleene star, OR and CONCATENATE operators are allowed, are you also disallowing anchors? Or is the RE always required to match the entire pattern, not just some part of it?
 
If I can force the RE to match the entire pattern, then I know the solution.
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
Jeremiah Smith
Full Member
***



Beep!

   


Posts: 172
Re: RegEx: ab, ba  
« Reply #2 on: Feb 24th, 2003, 8:46pm »
Quote Quote Modify Modify

on Feb 24th, 2003, 7:26pm, Icarus wrote:
Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)?

 
The Perl one (matches the preceding pattern any number of times).
IP Logged
william wu
wu::riddles Administrator
*****





   
WWW

Gender: male
Posts: 1291
Re: RegEx: ab, ba  
« Reply #3 on: Feb 24th, 2003, 10:21pm »
Quote Quote Modify Modify

I should probably add that by definition, a string is a finite sequence of characters.  
 
on Feb 24th, 2003, 7:26pm, Icarus wrote:

Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)?
 
When you say that only the Kleene star, OR and CONCATENATE operators are allowed, are you also disallowing anchors? Or is the RE always required to match the entire pattern, not just some part of it?

 
> Kleene star = Matches the preceding pattern any number of times.
> No anchors are needed ... I don't see how they apply here. I presume by anchors you mean the operators ^ $ \b \B, for beginning and ends of strings, and word boundaries. Since the problem statement says the alphabet consists only of the letters a and b, there are no space characters to create word boundaries.  
 
So basically the universe of strings you have to worry about are strings of a's and b's. If the number of occurrences of ab in the whole string is equal to the number of occurrences of ba, your RegEx should accept that string.
« Last Edit: Feb 24th, 2003, 10:21pm by william wu » IP Logged


[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
william wu
wu::riddles Administrator
*****





   
WWW

Gender: male
Posts: 1291
Re: RegEx: ab, ba  
« Reply #4 on: Feb 24th, 2003, 10:25pm »
Quote Quote Modify Modify

Follow up Question:
 
Prove that it is impossible to write a regular expression (or equivalently, construct a deterministic finite automaton) that accepts strings over the alphabet { a, b, c } in which the number of occurrences of 'ab' = number of occurrences of 'ba'.  
 
 
Note: This might be too hard to do if you haven't heard of something in CS theory called the pumping lemma, but it'd be interesting to see if an unacquainted mind ends up rediscovering it while thinking about this problem.
IP Logged


[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
James Fingas
Uberpuzzler
*****





   
Email

Gender: male
Posts: 949
Re: RegEx: ab, ba  
« Reply #5 on: Feb 25th, 2003, 9:01am »
Quote Quote Modify Modify

Original Problem:
The only reason you can solve the problem when the alphabet is {a,b} is that the occurrances of ab and the occurrances of ba must alternate. The regexp is simple:
 
a*a or b*b

 
However, in the case where the language is {a,b,c}, then your automaton must be able to count indefinitely high (and no finite automaton can do this).
 
Proof:
Consider the string abcabcabc ... cbacbacba, where there are m repetitions of "abc" and n repetitions of "cba". When the automaton has received all the "abc"s, it must be able to wait for m repetitions of "cba", accept the string if it ends there, or reject the string if it doesn't. As each of these m "cba"s passes, it must enter a different state (this should be obvious). However, there is no limit to how high m can be, so there is no limit to the number of states that such a successful automaton would need (ie. a successful automaton would not be finite).
IP Logged

Doc, I'm addicted to advice! What should I do?
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: RegEx: ab, ba  
« Reply #6 on: Feb 25th, 2003, 3:50pm »
Quote Quote Modify Modify

Given the definition of the Kleene Star, the solution would have to be (a(a|b)*a)|(b(a|b)*b)    ( | = OR ).
 
As for the need for anchors: most implementations I know of, this will still match "abab" since there is nothing to require it to match the whole string. It will match the "aba" portion and ignore the rest. I don't see how this can be solved without some means of forcing a full string match. Either by (1) having your REs always required to match the whole string, or (2) allowing anchors: ^(a(a|b)*a)|(b(a|b)*b)$
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
TimMann
Senior Riddler
****






   
WWW

Gender: male
Posts: 330
Re: RegEx: ab, ba  
« Reply #7 on: Feb 26th, 2003, 12:18am »
Quote Quote Modify Modify

In the theory of regular expressions, the language defined by a regular expression is the set of strings that match the regular expression. That is, the whole string matches the regular expression, not a substring. William was assuming people knew that for this problem.
 
Grep's regular expressions are an extension of the kind usually used in the theory. One way to look at the "anchoring" issue is that grep gives you an implicit ".*" at the beginning and end of the pattern unless you start it with ^ and/or end it with $. Here "." itself is a grep extension, meaning of course the disjunction of all characters in the alphabet; that is, (a|b|...).
IP Logged

http://tim-mann.org/
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: RegEx: ab, ba  
« Reply #8 on: Feb 26th, 2003, 12:36am »
Quote Quote Modify Modify

on Feb 25th, 2003, 3:50pm, Icarus wrote:
Given the definition of the Kleene Star, the solution would have to be (a(a|b)*a)|(b(a|b)*b)    ( | = OR ).

This doesn't match "a", or "b" or "" for that matter..
 
so how about: (a (b+ a+)* | b (a+ b+)* |)
 
here's an interesting link if you like regular expressions:
http://odur.let.rug.nl/~vannoord/fsademo/
watch the syntax though, ab => [a,b] and a|b => {a,b} and the empty string is {}
It also functions as tranducer, in case you want to translate..
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
william wu
wu::riddles Administrator
*****





   
WWW

Gender: male
Posts: 1291
Re: RegEx: ab, ba  
« Reply #9 on: Feb 26th, 2003, 12:40am »
Quote Quote Modify Modify

Regarding Icarus's solution: Almost ... forgetting some special cases. For instance, the string "a" has zero occurrences of both 'ab' and 'ba', and thus should be in our language. Likewise, the empty string should be accepted. The empty string is usually referred to in RegEx theory as epsilon, and is another symbol available to us which I forgot to mention.
 
The intuition is right though -- the key observation is every string in the language must start with the same letter it ends with.
« Last Edit: Feb 26th, 2003, 12:46am by william wu » IP Logged


[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
william wu
wu::riddles Administrator
*****





   
WWW

Gender: male
Posts: 1291
Re: RegEx: ab, ba  
« Reply #10 on: Feb 26th, 2003, 12:44am »
Quote Quote Modify Modify

Oops. towr sneaked in while I was posting Smiley
IP Logged


[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: RegEx: ab, ba  
« Reply #11 on: Feb 26th, 2003, 8:03pm »
Quote Quote Modify Modify

on Feb 26th, 2003, 12:18am, TimMann wrote:
In the theory of regular expressions, the language defined by a regular expression is the set of strings that match the regular expression. That is, the whole string matches the regular expression, not a substring. William was assuming people knew that for this problem.

 
As I said, I don't know much CS theory. As far as actual RE implementations go, grep is hardly alone. About the only things I've seen that do require matching the whole string are shells.
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
rmsgrey
Uberpuzzler
*****





134688278 134688278   rmsgrey   rmsgrey


Gender: male
Posts: 2872
Re: RegEx: ab, ba  
« Reply #12 on: May 15th, 2003, 9:36pm »
Quote Quote Modify Modify

on Feb 26th, 2003, 12:36am, towr wrote:

(a (b+ a+)* | b (a+ b+)* |)

doesn't work - rejects aaba  
 
quick fix:
(a+(b*a+)*|b+(a*b+)*|e)
using (e) to denote empty string and noting (a+) is shorthand for (aa*) (the original problem ruled out using anything but the three basic operations)
IP Logged
S. Owen
Full Member
***





   


Gender: male
Posts: 221
Re: RegEx: ab, ba  
« Reply #13 on: May 26th, 2003, 7:41am »
Quote Quote Modify Modify

I think that answer is right... I worked out the state machine for this and came up with this alternate regular expression, which may be a little simpler:
 
(   e   |   a ( a | b+ a )*   |   b  ( b | a+ b )*   )
IP Logged
rmsgrey
Uberpuzzler
*****





134688278 134688278   rmsgrey   rmsgrey


Gender: male
Posts: 2872
Re: RegEx: ab, ba  
« Reply #14 on: Aug 14th, 2003, 3:32pm »
Quote Quote Modify Modify

on May 26th, 2003, 7:41am, S. Owen wrote:
I think that answer is right... I worked out the state machine for this and came up with this alternate regular expression, which may be a little simpler:
*'hide'd portion removed*

 
::
(e|a(b*a)*|b(a*b)*)

::
seems simpler still. In fact, I'm going to go out on a limb and say I reckon it's the simplest possible - you need to cover the three cases (e, a..., b...) and in the latter two, you need to ensure that each run of the wrong letter is followed by one of the right one.
IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board