```

wu :: forums
(http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)

riddles >> hard >> Worm Propagation
(Message started by: william wu on Feb 3rd, 2003, 12:04am)

```

Title: Worm Propagation
Post by william wu on Feb 3rd, 2003, 12:04am
WORM PROPAGATION

A hacker is attacking a computer network. Each computer on the network is connected to various other computers. The hacker releases a worm on a source computer. When a worm infects a computer, that computer can propagate a copy of the worm to a connected computer at a rate of once per second. Suppose the hacker knows the layout of the network; i.e., he knows which computers are connected to which. How should the worm proceed to infect the whole network as quickly as possible?

Also, if possible, describe an algorithm for computing the minimum time till total network infection. (Admittedly a computer science background will help here, but it is not absolutely necessary.)

Note: Writing credits to Yosen Lin. Inspiration was the Sapphire worm that hit the Net last week.

Title: Re: Worm Propagation
Post by towr on Feb 3rd, 2003, 1:30am
...
[hide]Order all connected computers as a tree, with the source computer as the root, and make sure the tree is as shallow as possible. Model it so an infected computer 'reinfects' itself, and another computer, making a binary tree, where each level is one second further in time.
And of course al connection in the tree have to be there 'irl' between the computers.[/hide]...

Title: Re: Worm Propagation
Post by Chronos on Feb 3rd, 2003, 5:58pm
towr:  Where you have [hide]and make sure the tree is as shallow as possible.[/hide], isn't that sort of begging the question?  Does there exist a standard algorithm for doing that, which any computer scientist would know already?  Or did you just intend it as a re-wording of the problem?

Title: Re: Worm Propagation
Post by towr on Feb 4th, 2003, 12:02am

on 02/03/03 at 17:58:02, Chronos wrote:
 Does there exist a standard algorithm for doing that, which any computer scientist would know already?  Or did you just intend it as a re-wording of the problem?
I'm not sure if there is a standard solutuion for doing it in a smart way, since some links just aren't possible and other have to be there..
There is however the way of making every possible tree, and choosing the one that befits the criterium.. Brute-force is usually my first solution, smart solutions follow when it's apparant brute-force would take too long.

There has to be a better way, but I'm not really sure yet what it is..

Title: Re: Worm Propagation
Post by william wu on Feb 4th, 2003, 2:27am
Breadth-first search (BFS) can give you the shallowest possible tree. This basically organizes the vertices according to how many steps away they are from the source vertex.

Yes, there's a cheaper way to do this than expand all possibilities ... Somewhere you need to describe how to choose the next vertex the worm should infect in order to minimize total infection time. This next vertex should satisfy some metric. Drawing some simple example networks should help.

Title: Re: Worm Propagation
Post by wolfgang on Feb 4th, 2003, 8:21am
Coming from a very amateur programmer:
[hide]Create the tree by starting with the host computer and assigning an ID to every computer it's attached to. Move to those computers and assign an ID to each one they're attached to that doesn't already have an ID. Make it a 2 part number such as 2.3 or 2'3 to identify both parent and child. Continue till all computers have an ID. (towards the end I suppose it would speed things up to start with the computers that haven't been IDed and work backwards).
Now to infect. Instruct each computer to search for the longest ID that starts with its own ID, to reach the most distant computers quickly. Continue until all computers it's been assigned have been infected.
Then move to other computers it wasn't assigned, on the assumption that some computers will be off-line or virus protected and they'll break the chain. If you make each infected computer start by sending a message to its parent to let it know the infection was successful, then the next step would be to either retry those that didn't respond or start infecting their children. Those that reach the end of the tree should start by trying to infect their cousins or uncles. If they ever find one that wasn't infected, then move up to its parent and grandparent...
The total time to total infection will be one second for each step to the longest branch, or a couple seconds longer if there are computers off-line or protected.[/hide]

Title: Re: Worm Propagation
Post by wolfgang on Feb 4th, 2003, 8:32am
Obviously, I made one very silly mistake. I really should register so I can undo those mistakes before anybody else sees them, but I keep thinking this will be my very last mistake. Anyway [hide] Please ignore the suggestion that a computer infect its grandchildren in the event the computer it tried to infect doesn't respond. Obviously it has no direct connection to its grandchildren or they wouldn't be grandchildren. Maybe the tree should have double-redundancy or it should start sending out lists of the computers it missed so other computers can see whether they can make the bypass, [/hide]

Title: Re: Worm Propagation
Post by wolfgang on Feb 4th, 2003, 8:53am
Wrong again, but at least I'm registered, so this will be my last post.
[hide]The total time will be the maximum sum of all the address parts for one of the computers at the top of the tree. If a tree is very broad, the worm will spread sideways much more slowly than it spreads up. So the fastest algorithm would be to weigh each branch. Start with the branches that will take the longest, not just the tallest branch. And those branches will have to be sorted in reverse order, sorting the top of the tree first and working your way down. [/hide]

I should have known this would be more complicated than it looked. It's too hard for me so I'll shut up after one more comment. [hide]My tree-creating algorithm is no where near optimal. As soon as A infects B, A and B should be equally regarded to decide which will infect C if it's connected to both. And that decision will be based both on equalizing A's and B's workloads (If A connects 100 computers and B connects 10, then you don't want A wasting time infecting computers that B could handle), as well as the importance or weight of C's vertex. But that means the whole tree's shape may change, so a top-down weight sorting algorithm won't work as the way we figure out C's importance. [/hide]

Title: Re: Worm Propagation
Post by william wu on Feb 25th, 2003, 12:31am

on 02/04/03 at 08:53:39, wolfgang wrote:
 I should have known this would be more complicated than it looked. [/hide]

You're not alone; it turns out our solution was flawed. We're stuck too.

Title: Re: Worm Propagation
Post by william wu on Mar 9th, 2003, 2:13am
I eventually resorted to googling. We have rediscovered what is typically known as the gossip problem. However, my searches were unsuccessful toward finding a general algorithm for minimal gossip times on a graph. One paper I saw found minimal gossip times only for particular graph structures, such as cliques, trees, and lines; however, I only gave it a cursory survey. Anyways, this problem seems far harder than I expected, and is probably NP-hard because of the tradeoff between depth and bushiness.

Title: Re: Worm Propagation
Post by william wu on May 12th, 2003, 12:21am
An interesting e-mail I got from an Edward Rustin regarding this problem. Intuitively it seems right to head for the center of mass, but of course that's not a proof.

Hi,

This isn't exactly a conventional method for solving the network
propagation problem but it does work. Unfortunatly I doubt that an
algorithm or a strictly formal proof can be derived from this. All you
need is a lot of string and a bit of time.

Since our hacker knows the layout of the network already then he can do
this. Bascily we create a network diagram using string. Every computer is
represented by a knot in our string. Whenever two parts of the network
come togeather (say at a router or a gateway machine) we tie two pieces of
string togeather. Make sure that the knots that represent hubs or other
'transparent' pieces of hardware are easy to tell apart from the knots
that represent computers. The only tricky part is to ensure that the
spaceing between knots is consistent.

Once we have our 'string map' of the network its easy to find the 'center'
of the network. pick up the string at any of its loose ends (the loose
ends being computers on the edge of the network) and let the string hang
naturally. The lowest point is the computer 'furthest' away from the
computer that you are holding in your fingers. If you then hold that
lowest knot and pull the two so that the string becomes tight then we've
just mapped out the shortest path between those two computers. Its a
trivial exercise to count the number of computers between these two points
(and, should we wish, map it onto our network diagram). From this we can
work out which computer is centered between those two points.

We now need to perform a sort of 'string recursion' algorithm. Still
holding our bottom knot let go of the top one. This will now show us the
point furthest away from our previous bottom. Take hold of this point and
drop the top one. This, once again, gives us the furthest point away from
that previous point. Repeat untill we oscillate between two points, each
point being the furthest point away from each other. This is the longest
path on our network.

An easier way to visualise this is as follows:

Pick a point on the network - P1
Let the network hang from that point, lowest point is P2
While (Pn != Pn+2) AND (Pn+1 != Pn+3)
Hang network from Pn
Lowpoint is Pn+1

Repeat with Pn+1

Pn, Pn+1 is now the longest path.

This makes it trivial to determine where the network's 'center' is, being
the computer that is in the middle of these two points. This is therefore
the best computer to infect first. The propagation time is thus the
ceiling of Distance(P1, P2)/2.

This makes it trivial to determin where the network's 'center' is, being
the computer that is in the middle of these two points. This is therefore
the best computer to infect first. The propagation time is thus the
ceiling of Distance(P1, P2)/2.

The best propagation tactic for the worm is for each computer to propagate
first to the computers on this line between the two points and then on
subsequent ticks to propagate to any other uninffected computer that it is
connected to. I think that's so at least.

Even if that isn't true, my string computer should go someway to solving
this problem.

Edward Rustin

Title: Re: Worm Propagation
Post by towr on May 12th, 2003, 1:06am
I'm not sure it will work, because the 'distance' (time) between two computers depends on the previous steps taken by the worm.
From one computer the worm can only infect one other computer per second, so all computers connected to that computer are at a different distance which you can't determine before you determine the order in which you infect them.

Title: Re: Worm Propagation
Post by TenaliRaman on Nov 2nd, 2003, 8:57am
::[hide]
Why not simply apply Djikstra Algorithm (from source node to every node) and then construct a tree whose root node is the source node.
[/hide]::

Title: Re: Worm Propagation
Post by towr on Nov 2nd, 2003, 9:48am
Could you elaborate on how that would work?

Title: Re: Worm Propagation
Post by TenaliRaman on Nov 2nd, 2003, 11:50am
Pretty simple (i think!)

Construct the graph of the network. Assign each edge of the graph with weight 1 (we simply are interested in minimising the hopcounts here). Now apply Djikstra algorithm from source node to every other node and build a tree.

Consider this example as an illustration,

Title: Re: Worm Propagation
Post by towr on Nov 2nd, 2003, 12:19pm
Ah..
So basicly it won't work here, imo.. The distance between nodes depends on the order childnodes are chosen.. (because the worm can only be send to one node every second)

Title: Re: Worm Propagation
Post by TenaliRaman on Nov 2nd, 2003, 12:32pm
In the simple graph i proposed above, the time taken for worm propagation is not much significant.

However if my idea is to produce a n-level tree with m childnodes for each node on an average then it can achieve significant time minimisation.

Title: Re: Worm Propagation
Post by mindcrime on Jul 28th, 2004, 10:28am
Well guys, I am new to this field but let me exercise my feeble mind over this topic.

The hacker could create a minimal spanning tree from the network and start the attack from the root. The minimal spanning tree could be the traditional one as we all know, but it should also be "minimal" in another sense wherever possible. The other sense is that as far as possible , the weights of the edges should be nondecreasing. Clearly, by choosing the appropriate vertex which satisfies this property in any spanning tree as the root, this is possible. So the solution can be very crisply stated as follows...

"Find the minimal spanning tree and start from the vertex from which every edge is ordered to the leaves in non-decreasing order of their weights".  ::)

Title: Re: Worm Propagation
Post by John_Gaughan on Sep 2nd, 2004, 10:49am
Minimal spanning tree may be a good start but will not, by itself, yield a solution. Keep in mind that time is a factor, not just location on the network. If a node has ten children, it will infect them, one a second, until it is done. This introduces complexity to the algorithm that, as William said, probably makes it NP-hard. That and the fact that we do not know the network layout ahead of time, so we have to generalize it enough to work in all cases.

All of my CS education pertaining to trees makes assumptions that go against this program. I believe the answer lies not in traditional tree/graph theory but in some new angle, a new way of approaching the problem that flies in the face of conventional thought.

One parameter not described in the original problem is whether a copy of the worm has knowledge about which nodes are already infected. Does a worm infect its children until done, then quit? Does it know about nodes not connected to itself? E.g. if the root infects a node, which infects another node, does that node know about its "cousin" nodes? Does the algorithm take that into account? Is infection random? How much a priori knowledge do we have?

Title: Re: Worm Propagation
Post by towr on Sep 2nd, 2004, 11:40am

on 09/02/04 at 10:49:55, John_Gaughan wrote:
 That and the fact that we do not know the network layout ahead of time, so we have to generalize it enough to work in all cases.
The puzzle states we do know the layout of the network (not necessarily a tree) ahead of time.

Title: Re: Worm Propagation
Post by rmsgrey on Oct 24th, 2004, 10:03am
Looking back over some of the old problems...

I'm curious as to whether this problem can even be shown to be in NP - I'm a little rusty on the theory, but don't NP problems have to be able to evaluate prospective solutions in polynomial time? How would you go about recognising a genuine minimum time infection route? Certainly a naive brute force method requires comparison of all possible attacks to determine the optimum one.

Title: Re: Worm Propagation
Post by towr on Oct 24th, 2004, 1:47pm
I'm pretty sure it has to be NP. Because as it isn't polynomial, it's non-polynomial.. There's only two classes.
You might be thinking of NP-complete (a subclass of NP-problem)

Title: Re: Worm Propagation
Post by rmsgrey on Oct 25th, 2004, 6:21am
I was taught that NP was Non-deterministic Polynomial - in other words solvable by a non-deterministic device in polynomial time, unlike P which are solvable by deterministic devices in polynomial time, L which are solvable using logarithmic space, P-SPACE, NP-SPACE which are solvable using polynomial space by deterministic and non-deterministic devices respectively.

The simplest type of NP algorithm is one which non-deterministically generates a possible solution, and then tests it in polynomial time - the basic idea of a non-deterministic machine is that, whenever it hits a "choose" instruction, it makes all possible choices simultaneously, and then evaluates the outcome of each choice in parallel. The machine fails if all choices fail, and succeeds if any parallel version succeeds.

I guess you could create a machine that, for a given infection time, checks the non-deterministically generated (in time O(n2)) infection scheme in time O(n) to see if it takes the appropriate amount of time to infect the network. Having that machine, you could then create another machine that called it repeatedly with times 1 to n until a success gets returned. Assuming I haven't made any egregious errors, the total running time would be O(n3), putting the problem in NP.

I guess I've managed to answer my own question then...

Title: Re: Worm Propagation
Post by towr on Oct 25th, 2004, 8:23am

on 10/25/04 at 06:21:20, rmsgrey wrote:
 I was taught that NP was Non-deterministic Polynomial
hmm.. you're right (http://en.wikipedia.org/wiki/NP_%28complexity%29)
I must be more affected by my cold than I reallize.. At least I got the 'non' and 'polynomial' parts right ;)

Title: Re: Worm Propagation
Post by Aryabhatta on Oct 25th, 2004, 6:22pm
This problem seems to be the same as the Minimum Broadcast Time problem, which is claimed to be NP-Complete (no proof, but a reference) in the first few pages of the following paper (http://www.crab.rutgers.edu/~guyk/pub/broad/nabs.ps).

Now that we 'know' that it is NP-Complete, it would be interesting to come up with a proof of that.

Title: Re: Worm Propagation
Post by Aryabhatta on Oct 26th, 2004, 7:56pm
Supposedly, 3 dimensional matching is reducible in a straight forward way to the minimum broadcast problem.

3 Dimensional Matching: Given a set S of 3n elements and a family F of subsets of S, such that each element of F is of cardinality 3, does there exist a subset H of F of cardinality n such that the union of elements of H is S.

(basically given some m > n, triples from S, can you find n triples which cover S. Since S is of size 3n, those n triples must be mutually disjoint, it is also known as the Exact Three Cover.)

3 Dimensional Matching is known to be NP-Complete.

Title: Re: Worm Propagation
Post by Sjoerd Job Postmus on Mar 10th, 2005, 2:56pm
I'm an infected computer, I will take these steps:
1. Enumerate the computers I can infect.
2. Choose uninfected computer that only I can infect, with the largest amount of links.
3. If that fails - all computers are somehow attached to other infects - choose one with as many links as possible, with the least amount of infected links, with the largest total of infect busyness.
4. If that fails, I can no longer infect anything, so I'm done :(!

Title: Re: Worm Propagation
Post by DewiMorgan on Jun 2nd, 2006, 7:38pm
Although it's NP-hard, the fun comes from constructing an effective, nearoptimal algorithm for a blind virus to spread most economically.

Longest-path and center-of-mass and most-grandchildren solutions all provide good general answers, but fail in some situations.

Consider a situation where you are attacking "target", which lies between a hub, connected in parallel to X machines, none of which are interconnected, and to a chain of Y machines.

X=Hub-target-A-B-C-D-E-...-Y

Clearly, if X>Y, then it's better to infect the hub machine first, so it can start infecting its children as soon as possible. This is despite the fact that it's not the "center" of the network, and it's not the longest path.

If X<Y, then it's best to infect the chain first, even though the first link only has a single outside connection, because that link leads to the majority of the network, which we want to infect.

It rapidly becomes more complicated when you start linking the leaves together, of course.

Any solution should be recursive (that is, each infected child should be able to run it too). And unless the infected machines broadcast which others they infect, and the machines can deliberate amongst eachother about who should infect whom, I's assume all machines are working blind.

As much as possible, though, each machine needs to mark already-infected machines in its internal map, remove the links between them, remove any subsequently isolated machines, and repeat the process each second (after it had connected to all its own children, it would consider itself an isolated node, and remove itself from the network, stopping its own action).

But guessing which others (other than its parent and the ones it infected itself) were infected, could be tricky.

This is because I think randomness may need to come into play. The chance of it selecting a target node could be divided by the number of available nodes, and multiplied by the suspected number of non-infected links from that node, for instance.

Also want to include a factor to compensate for the probability of another known infected node selecting that item. Possibly subtract 1/(number of nodes available to infected node) from the probability of selecting that item, for each infected node that shares it, capping at just over zero, but not marking it as actually infected unless the probability was 1/1.

Then scale all probabilities so the total probability of selecting SOME node is 1.

This algorithm would mean that nodes right in the center of a mass of infected machines might not get infected until one of the infected machines had exhausted all other probabilities, but that's probably not a bad thing.

I think the above is pretty close to optimal, though it  would be suboptimal for the first node in the Y>X state above.

It would also fail sometimes for:

x-x-infected-y

...since it would sometimes choose y first. Probably the choices should only be random where there are conflicts with other infected hosts, and should be deterministic for unshared nodes.

which brings us back to how to best choose deterministically.

THe weighting should include a measure of the netsork distance from other infected notes (other than passing through the current host), too. And probably a measure of the size of the tree behind it divided by the number of known-infected hosts on that tree.

Title: Re: Worm Propagation
Post by Shiladie on Nov 26th, 2006, 6:10pm
Very simple, most probably flawed logic while looking at this

after confusing ever myself with bad math here is an example for 7 seconds with infinit computers available infecting the maximum computers

0: x
1: x, 1x1
2: x, 2x1,1x2
3: x, 3x1,3x2,1x3
4: x, 4x1 6x2,4x3,1x4
5: x, 5x1,10x2,10x3,5x4,1x5
6: x, 6x1,15x2,20x3,15x4,6x5,1x6
7: x, 7x1,21x2,35x3,35x4,21x5,7x6,1x7

after 7 seconds you infect 127 computers
and now its so obvious, the equation for what i am now sure is probably the completely wrong problem is:
s = seconds assumeing you start at 0
x = number of computers infected

2^s-1 = x

this is assumeing perfect network layout and that the virus is smart enough to not infect the same computer more then once. which bwe have to assume to get any sort of real answer for this riddle

without more information about how the network is layed out the rest is theorycraft about 'weighting the larger branches of the network more heavily' and such

and now im ready for the flame about how i'm wrong and missed some big note, which is probably true because this just is too simply to have so many people thinking...

Title: Re: Worm Propagation
Post by balakrishnan on Jan 10th, 2007, 8:04pm
Is this solved yet?

Title: Re: Worm Propagation
Post by towr on Jan 11th, 2007, 1:51am

on 01/10/07 at 20:04:42, balakrishnan wrote:
 Is this solved yet?
Depends on what you consider solved. To do it perfectly is NP hard, and the brute force approach to do that was posted early on.
However a near optimal solution may be better in practice (especially if you don't know the entire layout beforehand). And there's nearly always room for improvement at that end.

Title: Re: Worm Propagation
Post by CowsRUs on Feb 12th, 2007, 9:19am
I would infect other computers then infect more then infect more than start spamming emails and hope some idiot clicks on the spam with the virus

Title: Re: Worm Propagation
Post by bosssssss3 on Mar 21st, 2007, 10:02am

Title: Re: Worm Propagation
Post by Hippo on Apr 30th, 2007, 4:26am
Sorry I had loooked for on all pages missing the title one ... this belongs here...

I will show that minimal gosip problem (worm propagation) is NP hard.

I will show that deciding whether mingosip=k providing you know mingosip in k,k+1 is NP complete.

I will show a reduction from the SAT CNF problem.

You have m clauses using n different variables alltogether.

Let k be chosen later.

I will construct graph with mingosip in k,k+1 having mingosip = k iff given formula is satisfable:

Construction will be started from starting vertex 0 for the gosip problem.

Let vertex 0 and newly created vertices xi,neg xi creates triangle.

Let vertex i is connected to both xi and neg xi.

And let us create path of length k-i-2 edges starting in vertex i.

Last vertex of the path being blotted before time k+1 forces that vertex i is blotted at time i+1.

In that case either xi or neg xi is blotted in time i.

Furthermore for i-th literal vertex (xi or neg xi vertex) we create hypercube sceletton connected to it such that under each literal there is at least m leaves in the last level.

This is the only bound for k (and it is lower bound).

And the last level is blottable in time k-2 iff the literal vertex is selected and in time k
iff it is not selected and the last vertex of path under vertex i is reachable in time k.

For l-th clause create and connect vertices cl,dl and for each literal y connect cl to l-th leave of the y tree.

Vertex cl must be blotable in time k-1 to
be able to blot dl before time k.

...
WQD