wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> largest common subtree
(Message started by: pirate768 on Jun 29th, 2007, 2:57am)

Title: largest common subtree
Post by pirate768 on Jun 29th, 2007, 2:57am
A subtree of a rooted, ordered binary tree T consists of a node and all its descendants. Write a method to determine the largest common subtree (structurally) of two given binary trees T1 and T2, that is, the largest subtree of T1 that is structurally identical to a subtree in T2. The contents of the nodes are irrelevant; we are only interested in matching the underlying structure of the trees.

Title: Re: largest common subtree
Post by towr on Jun 29th, 2007, 3:11am
[hide]You can reduce it to a largest common 'substring' problem.[/hide]

Title: Re: largest common subtree
Post by pirate768 on Jun 29th, 2007, 4:10am
yeah u r right ...

but the fundtion signature is like this :
struct node{
int number;
node* left;
       node* right;
};


node* find_largest_common_subtree (node* root1, node* root2)

i guess there is some simpler way ?

Title: Re: largest common subtree
Post by labrin on Jun 29th, 2007, 7:40am
from this signature also u can find as towr said:
node* find_largest_common_subtree (node* root1, node* root2)
{         1. Do traversal ( say preorder) using stack and store 'L'      and 'R' for left and right child in two arrays for two trees.
          2. apply any std largest common substring algo.
}

Title: Re: largest common subtree
Post by randome on Jul 13th, 2007, 11:18pm
I think simple recursion can work here...

Code:
bool isEqualTree(root1, root2)
{
 // base case: (we reach end of tree or are given two null nodes)
 
if(root1 ==NULL && root2 ==NULL) return true;
if(root1->left ==NULL && root2->left==NULL && root1->right ==NULL && root2->right==NULL)
  return true;

  if(!root1->left && root2->left ||  root1->left && !root2->left ) return false;

  if(!root1->right && root2->right ||  root1->right && !root2->right ) return false;

return (isEqualTree(root1->right,root2->right) &&isEqualTree(root1->left,root2->left))


}

Title: Re: largest common subtree
Post by sk on Sep 6th, 2007, 3:27pm
simple recursion will incur more time.
the previous one works better.

convert the tree into a pre-order string reprsentation

e.g.
a simple tree with a 3 nodes can be imagined as
         +
      a    b
can be reprsented as (+ab)
So the tree in the above example can be represented as
1st tree: +(+(+ab)b)(+(+a)+b)))
2nd tree: +(a)(+(+(+a)(b)))(+(+ab)b)

Note that +(+ab )b match.
to find this u can now use a prefix tree

Title: Re: largest common subtree
Post by wangzhen on Sep 6th, 2007, 9:37pm
One option:

inorder tranverse the two trees, and calculate number of nodes in each subtree.

Sort these numbers in descending order.

only the subtrees with same numbers have the chance to match.

Title: Re: largest common subtree
Post by goforit on Sep 8th, 2007, 12:32am
What about doing a BFS and at each node checking for equal of the trees...

But each time find the size of the tree and always maintain the pointers for the largest subtree

Title: Re: largest common subtree
Post by johny_cage on Sep 26th, 2007, 6:07pm
@randome
your recursive algorithm assumes both nodes to be at same level for structurally same. but in the example given by pirate768, structure can be same at different levels.

So, the condition u r checking i.e.
if(!root1->left && root2->left ||  root1->left && !root2->left ) return false;
has no useful meaning. Moreover it is wrong to say this. The contradict example is already provided by pirate786.

Correct me if I am wrong.

Thanks.....

Title: Re: largest common subtree
Post by towr on Sep 27th, 2007, 12:58am
I think his code is only meant to do what the function name says: check that two (sub)trees are identical.
So to solve the problem he'd still have to traverse both trees, and then compare every subtree of one with those of the other one.

Title: Re: largest common subtree
Post by m_aakash on Mar 17th, 2008, 11:13am

on 06/29/07 at 03:11:54, towr wrote:
[hide]You can reduce it to a largest common 'substring' problem.[/hide]


what mechanism should be used to convert tree into string(given tree is unlabelled)? how do u label the nodes of binary tree sothat we can apply Longest Common Substring algorithm?

Title: Re: largest common subtree
Post by towr on Mar 17th, 2008, 11:35am

on 03/17/08 at 11:13:16, m_aakash wrote:
what mechanism should be used to convert tree into string(given tree is unlabelled)? how do u label the nodes of binary tree sothat we can apply Longest Common Substring algorithm?
I would use parenthesis to denote which nodes are part of the same subtree.

Like ((1)2((3)4(5))) for the tree

    2  
   / \
  1   4
    / \
   3   5

If the value of nodes doesn't matter, just don't write anything, and only keep the parenthesis structure. (()(()()))

Title: Re: largest common subtree
Post by m_aakash on Mar 17th, 2008, 12:05pm

on 03/17/08 at 11:35:08, towr wrote:
I would use parenthesis to denote which nodes are part of the same subtree.

Like ((1)2((3)4(5))) for the tree

    2  
   / \
  1   4
    / \
   3   5

If the value of nodes doesn't matter, just don't write anything, and only keep the parenthesis structure. (()(()()))


good one. how did i miss this idea?

Title: Re: largest common subtree
Post by Hippo on Mar 17th, 2008, 1:39pm

on 03/17/08 at 11:35:08, towr wrote:
(()(()()))


I am not sure ... probably the longest substring neednot match the biggest subtree.
You must look only for well closed expressions ...

Title: Re: largest common subtree
Post by m_aakash on Mar 17th, 2008, 10:55pm

on 03/17/08 at 13:39:35, Hippo wrote:
I am not sure ... probably the longest substring neednot match the biggest subtree.
You must look only for well closed expressions ...


can you give a counter example or prove that it doesnot work?

Title: Re: largest common subtree
Post by totobogy on Mar 18th, 2008, 4:12am
One thing that needs to be taken care of in using paranthesis to represent structure is the position of single child - whether it is the left or the right child. A node with a single left child is structurally different from a node with a single right child.

A solution would be to explicitly represent the missing children using a flag, say 'B'.

thus ((),B) can be identified as diff from (B,())


Title: Re: largest common subtree
Post by Hippo on Mar 18th, 2008, 7:12am

on 03/18/08 at 04:12:19, totobogy wrote:
One thing that needs to be taken care of in using paranthesis to represent structure is the position of single child - whether it is the left or the right child. A node with a single left child is structurally different from a node with a single right child.

A solution would be to explicitly represent the missing children using a flag, say 'B'.

thus ((),B) can be identified as diff from (B,())


You can use parenteses around pointers instead ... so nil pointer will be represented as ().
So ((()()),()) is different from ((),(()())).


on 03/17/08 at 22:55:49, m_aakash wrote:
can you give a counter example or prove that it doesnot work?


Later if noone else gives ... ;)

Title: Re: largest common subtree
Post by johny_cage on Mar 19th, 2008, 3:21am
whats the time complexity of the above mentioned algorthms, i.e the one with string matching patters?

Can't we lower down it???

Title: Re: largest common subtree
Post by towr on Mar 19th, 2008, 3:58am

on 03/19/08 at 03:21:20, johny_cage wrote:
whats the time complexity of the above mentioned algorthms, i.e the one with string matching patters?
If we can use a "longest common substring" algorithm, then we can do at least O(nm).


Quote:
Can't we lower down it???
We could use an incremental hash, and use that to hash up every subtree in the first tree, then check for every subtree in the other tree whether it has an entry in the hashtable. With a good hash function, it should be O(n+m). In the worst case it'll still be quadratic, though.

Title: Re: largest common subtree
Post by alexeigor on Mar 19th, 2008, 12:32pm

on 03/19/08 at 03:58:11, towr wrote:
If we can use a "longest common substring" algorithm, then we can do at least O(nm).



We can use suffix tree for longest common substring problem and get O(n+m) time.

Title: Re: largest common subtree
Post by Hippo on Mar 19th, 2008, 3:10pm
OK, explicitly:
((()((((()())(()()))(()()))()))((()((()())(()())))()))
and
((()(((((()(()()))(()()))((()())()))()))((()((()(()()))(()())))()))

what is the longest common substring?
Is it ()))()))((()((()? And what is the larest common subtree?
Is it (()())?

(Sorry for not trying to create the smallest counter example ;))

Actually I didn't think much about the algorithm, but my would process "by induction" increasing tree heights and finding equivalence classes of such trees.
Parents of those who are not in the equivalence class shared by both trees neednot be processed so the process can be stopped early. The equivalence class with the most vertices found during the process is the winner.

So start with the DFS and partition (lists) vertices of both trees by their height (length of the longest path to the leaf).
Process the vertices in order of increasing height. Vertices on level 0 are of the same equivalence class 1. Class 0 corresponds to null pointer.

So far algorithm numbered k different equivalence classes and we start processing http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/ell.gifth level.
Hash key created by pair (L,R) where L is the equivalence class number of the left son and R is the equivalence class number of the right son. If the pair is already hashed vertex belongs to already existing class (set info that corresponding tree contains vertex in this equivalence class and if both trees contain a vertex in the class, remember that for current level there is common subtree and check the size of the subtree with the current maximal comon subtree size). If the pair haven't been in the hash increment k the number of equivalence classes, store k as the name of the class, size obtained by sum of subtree sizes plus 1 and set the info that the corresponding tree contains vertex in the class). When all http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/ell.giflevel vertices were processed and no common subtree was found on this level, process can be terminated. Even without this condition the algorithm hashes O(n) vertices in expected O(n) time. Preprocessing took O(n) time as well.

Actually as the number of leaves is higher then the number of other vertices there is no reason for "premature termination".
Easier algorithm just performs DFS on the first tree and in postorder updates hash records similarly as in the previous algorithm. Now the equivalence classes will not be numbered by levels but this is no problem.
When traversing 2nd tree each pair which is hashed and was hashed by the first tree is candidate for the maximal common subtree. This will cost O(n) as well and does not require preprocessing.

Title: Re: largest common subtree
Post by totobogy on Mar 19th, 2008, 10:35pm
W.r.t your example above, you need to look for Longest Common 'Well-Formed' Substring and not merely Longest substring. (Well-formed here means properly nested parenthesis).

Title: Re: largest common subtree
Post by m_aakash on Mar 19th, 2008, 11:17pm

on 03/17/08 at 11:13:16, m_aakash wrote:
what mechanism should be used to convert tree into string(given tree is unlabelled)? how do u label the nodes of binary tree sothat we can apply Longest Common Substring algorithm?


one way to convert tree into string is to assign a label to each edge in particular assign 0 to left edge and 1 to right edge in case of binary tree. for a null edge assign a different value.

Title: Re: largest common subtree
Post by Hippo on Mar 20th, 2008, 12:31am

on 03/17/08 at 13:39:35, Hippo wrote:
I am not sure ... probably the longest substring neednot match the biggest subtree.
You must look only for well closed expressions ...



on 03/19/08 at 22:35:04, totobogy wrote:
W.r.t your example above, you need to look for Longest Common 'Well-Formed' Substring and not merely Longest substring. (Well-formed here means properly nested parenthesis).



on 03/20/08 at 00:38:43, totobogy wrote:
ohh.. sorry didn't see your comment

No problem ;)

Title: Re: largest common subtree
Post by totobogy on Mar 20th, 2008, 12:38am
ohh.. sorry didn't see your comment



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board