wu :: forums - similar pages implementation.

Topic: similar pages implementation. (Read 949 times)

puzzlecracker
Senior Riddler

Men have become the tools of their tools

Gender: male

Posts: 319

similar pages implementation.
« on: Jan 30th, 2005, 10:43am »

Does anyone have any idea how Google's "similar pages" feature is implemented? Any suggestions you might have in mind?

While we are postponing, life speeds by

towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender: male

Posts: 13730

Re: similar pages implementation.
« Reply #1 on: Jan 30th, 2005, 10:49am »

You could look at the author, keywords, and images (some websites simply copy each other's images, but I suppose that's not frequent enough to be useful).
And, I suppose most notably, linkage: if a lot of pages link to the same two pages, those two are probably similar or related.
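
The linkage idea can be sketched as a co-citation count: for every pair of pages, count how many pages link to both. This is only a minimal illustration, not a description of what Google actually does; the function name `cocitation_counts` and the toy link graph are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_counts(outlinks):
    """Count, for each pair of pages, how many pages link to both of them.

    outlinks maps a source page to the list of pages it links to.
    """
    counts = defaultdict(int)
    for source, targets in outlinks.items():
        # Each unordered pair of distinct targets on one page is co-cited once.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical toy link graph: three hub pages linking to pages a, b, c.
links = {
    "hub1": ["a", "b", "c"],
    "hub2": ["a", "b"],
    "hub3": ["b", "c"],
}
counts = cocitation_counts(links)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs with a high count relative to how often each page is linked at all would be the candidates for "similar pages".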

Wikipedia, Google, Mathworld, Integer sequence DB

Grimbal
wu::riddles Moderator
Uberpuzzler

Gender:

Posts: 7527

Re: similar pages implementation.
« Reply #2 on: Jan 30th, 2005, 4:31pm »

I think Google simply gets the keywords it knows for the reference page and looks for other pages with the same keywords. Rare keywords are probably valued more or even much more. Linking to the same pages also would indicate similarity.
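
One common way to make "rare keywords are valued more" concrete is TF-IDF weighting with cosine similarity. The sketch below assumes that approach; whether Google uses it is a guess, and the page names and texts are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps a page name to its list of words; returns TF-IDF vectors."""
    n = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # document frequency: count each word once per page
    vectors = {}
    for name, words in docs.items():
        tf = Counter(words)
        # A word that appears on every page gets weight log(1) = 0.
        vectors[name] = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical pages, already split into words.
pages = {
    "p1": "the riddle about prime numbers".split(),
    "p2": "the puzzle about prime factors".split(),
    "p3": "the recipe about apple pie".split(),
}
vecs = tfidf_vectors(pages)
sim_12 = cosine(vecs["p1"], vecs["p2"])
sim_13 = cosine(vecs["p1"], vecs["p3"])
print(sim_12, sim_13)
```

Here p1 and p3 share only the words that occur on every page, so their similarity is zero, while p1 and p2 share the rarer word "prime" and score above zero.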

puzzlecracker
Senior Riddler

Men have become the tools of their tools

Gender: male

Posts: 319

Re: similar pages implementation.
« Reply #3 on: Jan 30th, 2005, 7:58pm »

I want to extend towr's idea. One way it might be implemented is by comparing the in-links to and the out-links from each page: similar pages 'usually' (a more mathematical term is probably called for) have the same sites pointing to them, and likewise they link out to comparable pages.
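
One candidate for the "more mathematical" wording is the Jaccard coefficient (size of the intersection divided by size of the union) applied separately to the two pages' in-link sets and out-link sets. The sketch below assumes that choice; `link_similarity`, the equal weighting, and the toy graph are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def link_similarity(page_x, page_y, outlinks, weight_in=0.5):
    """Blend in-link overlap and out-link overlap for two pages.

    outlinks maps each page to the pages it links to. weight_in is an
    arbitrary assumption about how much the in-link overlap should count.
    """
    # Invert the graph to get the set of pages linking to each page.
    inlinks = {}
    for source, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, set()).add(source)
    sim_in = jaccard(inlinks.get(page_x, set()), inlinks.get(page_y, set()))
    sim_out = jaccard(outlinks.get(page_x, ()), outlinks.get(page_y, ()))
    return weight_in * sim_in + (1 - weight_in) * sim_out

# Hypothetical toy graph.
graph = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
    "a": ["x", "y"],
    "b": ["y", "z"],
}
score = link_similarity("a", "b", graph)
print(score)
```

In this graph a and b share two of three in-linking pages and one of three out-link targets, so the blended score comes out at one half.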

any thoughts?

While we are postponing, life speeds by

eviltoylet
Guest

Re: similar pages implementation.
« Reply #4 on: Jan 31st, 2005, 12:46am »

This is a pretty interesting question. My guess is that Google spiders the web: upon arriving at some arbitrary web page X, it records all the links on that page and assumes that the linked pages could be related to each other. It then spiders those pages and extracts keywords, or even metadata, and if they are similar, it keys the pages as similar.

Perhaps a way to find out for sure is for us to make a few websites and link them, some with different keywords and some with the same keywords.

towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender: male

Posts: 13730

Re: similar pages implementation.
« Reply #5 on: Jan 31st, 2005, 1:08am »

Google also looks at the text that pages use to link to each other. That's how googlebombing works.
So if the link text pointing to two pages is the same, they are probably also related.
So math and more math probably make Google think these pages are similar. (And of course it helps that I'm linking to them from the same page, and in proximity to each other on this page.)
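
A rough sketch of the link-text idea: collect the words of every link pointing at a page into a profile, then compare the profiles of two pages. The triple format, the function names, and the site names are assumptions made for the example, not anything known about Google's implementation.

```python
from collections import defaultdict

def anchor_profiles(links):
    """Build, for each target page, the set of words used in links to it.

    links is an iterable of (source, link_text, target) triples.
    """
    profiles = defaultdict(set)
    for _source, text, target in links:
        profiles[target].update(text.lower().split())
    return profiles

def anchor_similarity(profiles, x, y):
    """Jaccard overlap between the link-text words of two pages."""
    a, b = profiles.get(x, set()), profiles.get(y, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical links: two pages linked with math-related text, one without.
links = [
    ("post", "math", "site_a"),
    ("post", "more math", "site_b"),
    ("blog", "cooking recipes", "site_c"),
]
profiles = anchor_profiles(links)
sim_ab = anchor_similarity(profiles, "site_a", "site_b")
sim_ac = anchor_similarity(profiles, "site_a", "site_c")
print(sim_ab, sim_ac)
```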

Wikipedia, Google, Mathworld, Integer sequence DB

amichail
Senior Riddler

Posts: 450

Re: similar pages implementation.
« Reply #6 on: Jan 31st, 2005, 1:51am »

I think this paper on SimRank goes further than what Google does:

http://ideas.web.cse.unsw.edu.au/index.php?module=articles&func=display&ptid=1&aid=20

My guess is that Google uses a simple cocitation algorithm. SimRank takes this idea further in much the same way that PageRank takes an indegree link count further.
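
To show how SimRank goes beyond plain co-citation, here is a naive iterative sketch assuming the usual formulation: a page is fully similar to itself, and two distinct pages are similar to the extent that the pages linking to them are similar, damped by a decay factor. The decay value, iteration count, and toy graph are assumptions for the example.

```python
def simrank(inlinks, decay=0.8, iterations=10):
    """Naive SimRank. inlinks maps every page to the list of pages linking to it.

    Every page must appear as a key, even if its in-link list is empty.
    """
    nodes = list(inlinks)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not inlinks[a] or not inlinks[b]:
                    new[a][b] = 0.0  # a page with no in-links is similar to nothing else
                else:
                    # Average similarity over all pairs of in-linking pages.
                    total = sum(sim[i][j] for i in inlinks[a] for j in inlinks[b])
                    new[a][b] = decay * total / (len(inlinks[a]) * len(inlinks[b]))
        sim = new
    return sim

# Hypothetical graph: hub links to a and b; a links to c; b links to d.
graph = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["a"], "d": ["b"]}
scores = simrank(graph)
print(scores["a"]["b"], scores["c"]["d"])
```

Pages c and d have no in-linking page in common, so a plain co-citation count rates them as unrelated, while SimRank still gives them a positive score because the pages linking to them (a and b) are themselves similar.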

« Last Edit: Jan 31st, 2005, 1:53am by amichail »

DropZap - a new kind of block elimination game

Terps.Go
Newbie

Gender:

Posts: 13

Re: similar pages implementation.
« Reply #7 on: Feb 3rd, 2005, 5:22pm »

Hehe,
When Dr. Broder worked for Digital (later acquired by Compaq), he developed a method for comparing two web pages using a randomized algorithm. The method is based on min-wise independent permutations. It was used in AltaVista, and I think Google uses it too. Try googling "min-wise independent".
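
A minimal sketch of the min-wise hashing idea: under a random permutation of all possible shingles, the probability that two sets share the same minimum element equals their Jaccard similarity, so comparing a fixed number of minima gives an estimate. The code below uses salted SHA-1 hashes as a stand-in for truly min-wise independent permutations; the shingle size, signature length, and sample texts are assumptions.

```python
import hashlib

def shingles(text, k=3):
    """Set of k-word shingles (overlapping word sequences) of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def minhash_signature(items, num_hashes=128):
    """For each of num_hashes salted hash functions, keep the minimum hash value."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()[:8], "big")
            for item in items))
    return signature

def estimate_similarity(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

# Hypothetical page texts.
page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the lazy cat near the river bank"
page_c = "completely unrelated words about cooking pasta with tomato sauce"

sig_a = minhash_signature(shingles(page_a))
sig_b = minhash_signature(shingles(page_b))
sig_c = minhash_signature(shingles(page_c))
near = estimate_similarity(sig_a, sig_b)
far = estimate_similarity(sig_a, sig_c)
print(near, far)
```

The appeal of the method is that each page is reduced to a short fixed-size signature, so pages can be compared without keeping their full text around.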
