wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> Regex and substring
(Message started by: Barukh on Oct 18th, 2015, 8:53am)

Title: Regex and substring
Post by Barukh on Oct 18th, 2015, 8:53am
A friend of mine asked me the following intriguing question:

Given a regular expression R and a string S.

Devise an algorithm to determine whether there exists another string S' s.t.

1. S is a substring of S'.
2. S' matches R.

Title: Re: Regex and substring
Post by towr on Oct 18th, 2015, 1:11pm
[hide]Build the finite state automaton corresponding to R
Try each state as starting state to check if S can be a substring of any string matching R
Work backwards/forwards from start/end of S to find a fitting S'.[/hide]

Title: Re: Regex and substring
Post by Barukh on Oct 19th, 2015, 9:12am
Nice! But...

So, it all [hide]boils down to efficiently building an FA for the regex[/hide]?

Can it be efficient always?

Title: Re: Regex and substring
Post by playful on Apr 24th, 2016, 3:03am
Here is how I would get started, taking an example.

Bracing myself for a semantic discussion on the meaning of "regular expression". :) The title of the puzzle does say "regex".

Example
1. regex pattern R: \d+@hello
2. string S: hello

Idea for Algorithm
For each position p in the pattern R
...Rp = subpattern from p to end of R
...Temp regex Rt = ''
...For each token t in Rp
......Rt += t
......If Rt matches S
.........Return success
Return failure

Tracing this example:
- Position p=0
-- Try to match "hello" with \d+ => Fail
-- Try to match "hello" with \d+@  => Fail
-- Try to match "hello" with \d+@h  => Fail
...
-- Try to match "hello" with \d+@hello  => Fail

- Position p=1
-- Try to match "hello" with @ => Fail
-- Try to match "hello" with @h  => Fail
...
-- Try to match "hello" with @hello  => Fail

- Position p=2
-- Try to match "hello" with h  => Fail
-- Try to match "hello" with he  => Fail
...
-- Try to match "hello" with hello  => Succeed

Example string S': 123@hello


Where this breaks down
regex R: ^(?!.*hello)\w+
string S: hello

The algorithm returns a success because \w+ matches hello, but R will never match a string containing S.

Title: Re: Regex and substring
Post by Grimbal on Apr 25th, 2016, 6:40am
We could just check whether S matches R+".+" or ".+"+R.
The only problem I see is with ^ and $ anchors.  the ".+" should be inserted after an initial "^" and before a final "$".


PS: oops, possibly correct answer to the wrong question.

Title: Re: Regex and substring
Post by playful on Apr 25th, 2016, 9:45am

on 04/25/16 at 06:40:54, Grimbal wrote:
We could just check whether S matches R+".+" or ".+"+R.


I think the problem statement is the other way around.

S is a substring of S', not the opposite.

In my example above, with S as "hello", manipulating R as you suggest will never match S.



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board