wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> Strings permutation
(Message started by: priyank on May 14th, 2004, 7:00am)

Title: Strings permutation
Post by priyank on May 14th, 2004, 7:00am
This one goes like this:
In this problem, we seek to locate permutations of one string which appear as sub-strings of another string. A permutation of a string is a string of the same length which can be obtained by rearranging the characters of the original string. Of course, I don’thave to tell you that the number of permutations of a long string can be enormous, and that there is a time limit on programs being judged.
link::
www.csupomona.edu/~carich/programming_contests/199601/restrung.pdf


I tried to search if it already exists on the forum, but couldn't find it.

Title: Re: Strings permutation
Post by towr on May 14th, 2004, 7:37am
I don't think this problem was on the forum yet, but I do remember it from my string algorithm course a few years back where we had to do it as homework.

hint/comment ::[hide]If both strings are from a fixed/known size alphabet (which is generally the case), and the search string has length m and the text string length n, then you can solve it in O(n + m) time, and O(1) space
Importantly, the number of permutations the search string can have doesn't factor into it

It would be marginally slower if the size of the alphabet isn't fixed (and even then, only the alphabet size of the search string really matters)

note, if you want to output the substring (instead of only its position), complexity naturally increases by a factor m in the worst case.[/hide]::

Title: Re: Strings permutation
Post by TenaliRaman on May 14th, 2004, 11:00am
towr,
i have an algorithm in my mind which would take O(mn) time and O(m) space, but it can print the permutation substring. So i think what i have in my mind is a totally different idea than yours.

Any pointers to your method?

Title: Re: Strings permutation
Post by towr on May 14th, 2004, 11:33am
It's hard to think of another hint that wouldn't give the whole thing away.
::[hide]Consider how you can check if one string is a permutation of another of the same length n, in O(n) time and O(|alphabet|) space
[/hide]::

Title: Re: Strings permutation
Post by TenaliRaman on May 14th, 2004, 1:25pm
i think i got it!
Just tell me if i am right

::[hide]
for simplicity,
i will consider my initial set to be
A={1,2,3,4,5,....,n}
let B = some permutation of set A then
flag = 1;
counter=0;
while(counter<#A)
{
   if(A[counter]==B[counter] || A[B[counter]-1]==B[counter])
       counter++;
   else
   {
       flag=0;break;
   }
}
if(flag)
print("yes permutation");
else
print("no not a permutation");

/*note to readers : this is not the actual program for the given question but yes there is a way to convert this code to get the required result.*/
[/hide]::

Title: Re: Strings permutation
Post by towr on May 14th, 2004, 2:11pm
I don't think that would work, if B = {1,1,1,...,1} it would still say it's a permutation of {1,2,3,..,n}
(An aside, it is inacurate to talk about sets here, as order doesn't matter to sets in the first place. Which is actually a sort of hint.)

Title: Re: Strings permutation
Post by priyank on May 15th, 2004, 3:08am
One method would be to make all the permutations of the search string and then search for it in the another string.But, this doesn't seems to be an efficient one.Is there any another method to do it?Can someone provide some algo for it?
-thanks

Title: Re: Strings permutation
Post by priyank on May 15th, 2004, 3:40am
I think i got it:
http://homepage.mac.com/hteric/LT/SearchAlg/

Title: Re: Strings permutation
Post by TenaliRaman on May 15th, 2004, 5:49am
towr,
thx for pointing that out.
but i don't think i would mind a few spurious results if this method works efficiently enough. :D I am still thinking on your method though.

priyank,
the sorting method pointed to in your link is exactly the method i had in my mind.(Note that max(mn,mnlogm)=mn so time complexity is around O(mn) and to store the sorted value the space taken is O(m)).Obviously this is the most easiest method to be approached.

i haven't checked the hashing formula they have proposed.I will check into it later, and if it works then its pretty darn neat too).

but i am still overawed by the complexity proposed by towr O(m+n) time and O(1) space  :o

Title: Re: Strings permutation
Post by Barukh on May 15th, 2004, 9:30am
Nice problem, priyank!  :D

I think the following code meets the towr’s requirements (for the fixed size alphabet). I’ve made an assumption that ‘?’ sign is is not a part of the alphabet.

I am not going to comment on this algorithm for a while…   ;D


Code:
#include <stdio.h>
#include <stdlib.h>

#define symbol(s, i) (((s[i] == ex) || (count[s[i]] == m+1)) ? ex : s[i])

char c, ex = '?';

main(int argc, char** argv)
{
 char* s = argv[1];
 char* S = argv[2];

 int m = strlen(s);
 int n = strlen(S);

 int i, C = 0;
 int count[256];  /* O(1) space */

 /* O(m) time */
 for (i = 0; i < 256; i++) count[i] = m+1;
 for (i = 0; i < m; i++)
 {
   if (count[s[i]] == m+1) { C++; count[s[i]] = 0; }
   count[s[i]]++;
 }

 /* O(n) time */
 for (i = 0; i < n; i++)
 {
   if (i - m >= 0) { c = symbol(S, i-m); if (! count[c]++) C++; if (! count[c]) C--; }

   c = symbol(S, i); if (! count[c]--) C++; if (! count[c]) C--;

   if (C == 0) printf("%d\n", i-m+1);
 }
}


Title: Re: Strings permutation
Post by towr on May 15th, 2004, 1:27pm
yep, I think that's it.. (It differs on a few details from what I had, but the essence seems to be the same)

Title: Re: Strings permutation
Post by TenaliRaman on May 16th, 2004, 2:03am
Wonderful!! it works!! and i don't know why it works!!! i will have to work out the details step by step now so that i can see what is happening!!  :o

(*_*)o0( O(m+n) time and O(1) space.... wth!)

Title: Re: Strings permutation
Post by towr on May 16th, 2004, 4:51am
isn't
#define symbol(s, i) (((s[i] == ex) || (count[s[i] == m+1)) ? ex : s[i])
equivalent to
#define symbol(s, i) ((count[s[i] == m+1) ? ex : s[i])
that seems a bit simpler  ::)

I'm still trying to figure out what ex is used for in the first place..

I'm not sure if my code is any easier to understand though. (And it doesn't exactly help that it's two years old, I'm not quite sure why this specific variation of the algorithm works anymore..)

Code:
#include <iostream>

int main(int argc, char **argv)
{
 char const *search_string = argc > 1 ? argv[1] : "aabc";
 char const *text          = argc > 2 ? argv[2] : "abahgacabababahcacabahaab";

 int length, count[256] = {0};
 for(length=0; search_string[length]; length++)
   count[search_string[length]]--; // amount of each character we need to find

 unsigned left=0, right=0;
 for(count[text[right]]++; text[right]; count[text[++right]]++)
   while (count[text[right]] > 0)
   {
     if (right-left == length)
       std::cout << (left+1) << std::endl;
     count[text[left++]]--;
   }        // despite apparantly two loops, still only O(n)
 if (right-left >= length)
   std::cout << (right - length+1) << std::endl;
}

Title: Re: Strings permutation
Post by TenaliRaman on May 16th, 2004, 5:38am
That is just great barukh and towr!!
i just got the logic of the entire code....

if i had implemented it i would have still done it in O(mn) since i am sure i would have reset the count array manually instead of using the look-back technique used by u guys!!

Title: Re: Strings permutation
Post by Barukh on May 17th, 2004, 8:58am

on 05/16/04 at 04:51:18, towr wrote:
isn't
#define symbol(s, i) (((s[i] == ex) || (count[s[i] == m+1)) ? ex : s[i])
equivalent to
#define symbol(s, i) ((count[s[i] == m+1) ? ex : s[i])
that seems a bit simpler  ::)

Yes, of  course! Moreover, I believe my code may be improved a lot more…


Quote:
I'm still trying to figure out what ex is used for in the first place..

I use ex as a single representative of all the symbols that are in the S ("search string"), but not in s ("text"). For the fixed size alphabet it's not really helpful, but in the general case it would (IMHO).

As there was an interest in explaining the algorithm, here's it goes:

The first part (time O(m)) forms the array count[], where count[c] is the number of occurrences of symbol c in the string s. It also calculates C, the number of different symbols in s. If c is not in s, its count is set to m+1.

The second part (time O(n)) scans the string S. Once the symbol enters the "window" of m symbols, its count is decreased. When it leaves the "window" (i.e. is m symbols away from the current symbol), its count is increased. Every time count[c] changes from [pm]1 to 0, C is decreased; every time count[c] changes from 0 to [pm]1, C is increased.

When C becomes 0, that means the current window is a permutation of s.

Title: Re: Strings permutation
Post by priyank on May 18th, 2004, 11:59pm
I think we can implement a much simpler algo:
-get the sum S of the ascii code of all the character of s(the first string,length m)
-compute the sum-M[i] of ascii of the first m chars of second search string.
-this summing loop(i) would go from M[0] till M[n-m](n-length of the second string)
-just check if S==M[i]

-Also, we need not do summation M[i] in a loop as M[i+1]=M[i]+B[i+m]-B[i], where B[i] is the ith element of second string.Only for first M[0], we need the loop.

This algo can be found at :http://homepage.mac.com/hteric/LT/SearchAlg/

Title: Re: Strings permutation
Post by towr on May 19th, 2004, 1:12am

on 05/18/04 at 23:59:07, priyank wrote:
-get the sum S of the ascii code of all the character of s(the first string,length m)
I don't think this will work
'b'+'b' = 'a'+'c', you loose critical information.

If you use hashing, you will have to check wether you have an actual substring-match each time you have a hash-match, and in the worst case this will be O(n2). Only when the searchstring is known to be very infrequent may this be a good idea.
And even then it probably isn't :P

[e]hmm the algorithm on that site is worse than linear even..
Their 'CheckPerm()' function is allready O(m log(m) ), and then they still have to multiply it by n when they go through the text[/e]

Title: Re: Strings permutation
Post by priyank on May 19th, 2004, 1:25am
oh!! i missed that..but, this points to another question..can we have a hashing that takes care of 2b!=a+c, so that we dont have to actually check for the substring?i mean some sort of hashing algo..

P.S:i'm not using any checkPerm() method in my last update.

Title: Re: Strings permutation
Post by towr on May 19th, 2004, 1:53am
hashes are usually lossy
of course an integer is much larger than a character, so perhaps we have enough space. There is another problem though. The hash of a string must be the equal to the hash of any permutation of it, so it must be some commutative function on the characters (or functions depending on one character)

Title: Re: Strings permutation
Post by towr on May 19th, 2004, 2:06am
How about this, we associate a prime number with each character, and multiply them together. Of course there's a problem with integer division, so for the textstring we'd put the character-hash in another integer, and simply compare them each step.
If we only look at letters and ignore case we can in the worst case use 4 letters in our searchstring :P
3 if we also have case and numbers  ::)
Anymore might work but is no longer guaranteed to give a unique hash

hmm.. why is this worse than just putting the characters (1 byte each) into the integer (4 bytes)?



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board