Author |
Topic: similar pages implementation. (Read 949 times) |
|
puzzlecracker
Senior Riddler
Men have become the tools of their tools
Posts: 319
|
|
similar pages implementation.
« on: Jan 30th, 2005, 10:43am » |
|
Does anyone have any ideas about how Google implements its "similar pages" feature? Any suggestions you might have in mind?
|
|
While we are postponing, life speeds by
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Posts: 13730
|
|
Re: similar pages implementation.
« Reply #1 on: Jan 30th, 2005, 10:49am » |
|
You could look at author, keywords, and images (some websites do simply copy each other's images, but I suppose that's not frequent enough to be useful). And, I suppose most notably, linkage: if there are a lot of pages that link to the same two pages, those two are probably similar/related.
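The linkage idea can be sketched as a co-citation count: for every pair of pages, count how many other pages link to both. This is only a minimal illustration of the suggestion above, not a description of what Google does; the function name `cocitation_counts` and the toy `links` data are made up for the example.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(outlinks):
    """For every pair of target pages, count how many source pages link to both.

    outlinks maps a source page to the list of pages it links to (hypothetical data).
    """
    counts = Counter()
    for targets in outlinks.values():
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts

# Toy link graph: three pages that each link to some of A, B, C.
links = {
    "hub1": ["A", "B", "C"],
    "hub2": ["A", "B"],
    "hub3": ["B", "C"],
}
pairs = cocitation_counts(links)
print(pairs.most_common())
```

Pairs with a high count would be the candidates for "similar"; a real system would presumably normalize by how popular each page is.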
|
|
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Grimbal
wu::riddles Moderator Uberpuzzler
Posts: 7527
|
|
Re: similar pages implementation.
« Reply #2 on: Jan 30th, 2005, 4:31pm » |
|
I think Google simply takes the keywords it knows for the reference page and looks for other pages with the same keywords. Rare keywords are probably weighted more, or even much more. Linking to the same pages would also indicate similarity.
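One common way to make rare keywords count for more is inverse document frequency (IDF) weighting combined with cosine similarity. The sketch below assumes that scheme; it is a guess at how the idea could be realized, not Google's actual method, and the helper names and the toy `pages` data are hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Weight each keyword by log(N / number of pages containing it)."""
    n = len(docs)
    df = Counter(word for words in docs.values() for word in set(words))
    return {word: math.log(n / count) for word, count in df.items()}

def weighted_vector(words, idf):
    """Term frequency times IDF, so rare keywords get larger weights."""
    tf = Counter(words)
    return {word: tf[word] * idf.get(word, 0.0) for word in tf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy keyword lists; "the" appears everywhere, so its weight is zero.
pages = {
    "p1": ["riddle", "puzzle", "forum", "the"],
    "p2": ["riddle", "puzzle", "the"],
    "p3": ["recipe", "cooking", "the"],
}
idf = idf_weights(pages)
vectors = {name: weighted_vector(words, idf) for name, words in pages.items()}
sim_12 = cosine(vectors["p1"], vectors["p2"])
sim_13 = cosine(vectors["p1"], vectors["p3"])
print(sim_12, sim_13)
```

Here p1 and p3 share only the ubiquitous word "the", so their similarity comes out as zero, while p1 and p2 share two rarer keywords.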
|
|
|
|
|
puzzlecracker
Senior Riddler
Men have become the tools of their tools
Posts: 319
|
|
Re: similar pages implementation.
« Reply #3 on: Jan 30th, 2005, 7:58pm » |
|
I want to extend towr's idea. One way it might be implemented is by comparing the in-links to and the out-links from each page: similar pages 'usually' (I should probably use more mathematical terminology) have the same sites pointing to them, and similarly they have comparable outgoing links. Any thoughts?
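One candidate for the "more mathematical terminology" is the Jaccard coefficient: the size of the intersection of two link sets divided by the size of their union. The sketch below combines a Jaccard score for in-links with one for out-links; the equal weighting (`in_weight=0.5`) and all names and data are assumptions made for illustration.

```python
def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_similarity(page_a, page_b, inlinks, outlinks, in_weight=0.5):
    """Blend in-link and out-link overlap; in_weight is an arbitrary assumption."""
    sim_in = jaccard(inlinks.get(page_a, ()), inlinks.get(page_b, ()))
    sim_out = jaccard(outlinks.get(page_a, ()), outlinks.get(page_b, ()))
    return in_weight * sim_in + (1 - in_weight) * sim_out

# Toy data: which pages point at A and B, and what A and B point at.
inlinks = {"A": {"h1", "h2", "h3"}, "B": {"h2", "h3", "h4"}}
outlinks = {"A": {"x", "y"}, "B": {"y", "z"}}
score = link_similarity("A", "B", inlinks, outlinks)
print(score)
```

In this toy case the in-link overlap is 2/4 and the out-link overlap is 1/3, so the blended score is their average.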
|
|
While we are postponing, life speeds by
|
|
|
eviltoylet
Guest
|
|
Re: similar pages implementation.
« Reply #4 on: Jan 31st, 2005, 12:46am » |
|
This is a pretty interesting question. I'd guess that Google spiders the web: upon arriving at some arbitrary web page X, it records all the links on that page and assumes that the linked pages could be related to each other. It then spiders those pages, extracts keywords or even metadata, and, if these are similar, keys the pages as similar. Perhaps a way to find out for sure is for us to make a few websites and link them, some with different keywords and some with the same keywords.
|
|
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Posts: 13730
|
|
Re: similar pages implementation.
« Reply #5 on: Jan 31st, 2005, 1:08am » |
|
Google also looks at the text that pages use to link to each other. That's how googlebombing works. So if the link text for two pages is the same, they are probably also related. So linking to two pages as "math" and "more math" probably makes Google think those pages are similar. (And of course it helps that I'm linking to them from the same page, and in proximity to each other on this page.)
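The link-text idea could be sketched by collecting, for each target page, the words other pages use when linking to it, and then measuring how much two such word profiles overlap. Everything here is hypothetical: the `(source, target, anchor_text)` tuples, the page names, and the choice of an overlap coefficient are assumptions for the sake of the example.

```python
from collections import Counter, defaultdict

def anchor_profiles(links):
    """Collect the words of all link texts pointing at each target page.

    links is an iterable of (source, target, anchor_text) tuples.
    """
    profiles = defaultdict(Counter)
    for _source, target, text in links:
        profiles[target].update(text.lower().split())
    return profiles

def overlap(p, q):
    """Shared word count divided by the size of the smaller profile."""
    shared = sum((p & q).values())  # Counter & keeps the minimum of each count
    smaller = min(sum(p.values()), sum(q.values()))
    return shared / smaller if smaller else 0.0

links = [
    ("forum", "pageA", "math"),
    ("forum", "pageB", "more math"),
    ("blog", "pageC", "cat pictures"),
]
profiles = anchor_profiles(links)
sim_ab = overlap(profiles["pageA"], profiles["pageB"])
sim_ac = overlap(profiles["pageA"], profiles["pageC"])
print(sim_ab, sim_ac)
```

With this data, pageA and pageB share the word "math" in their link texts, while pageA and pageC share nothing.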
|
|
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Terps.Go
Newbie
Posts: 13
|
|
Re: similar pages implementation.
« Reply #7 on: Feb 3rd, 2005, 5:22pm » |
|
Hehe, when Dr. Broder worked for Digital (later acquired by Compaq), he developed a method to compare two web pages using a randomized algorithm. The method is based on min-wise independent permutations. It is used in AltaVista, and I think in Google too. Try googling "min-wise independent".
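A rough sketch of the min-wise hashing idea (often called MinHash): split each page into word shingles, apply many random hash functions, keep the minimum value under each, and estimate the Jaccard resemblance of two pages as the fraction of positions where their minima agree. The code below uses simple linear hash functions over CRC32 values, which only approximate min-wise independence; the shingle size, the number of hashes, and all names are assumptions, not details of what AltaVista or Google used.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # a large Mersenne prime used as the hash modulus

def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def make_hashes(n, seed=0):
    """n random (a, b) pairs defining hash functions x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]

def signature(items, hashes):
    """For each hash function, keep the minimum hash value over all items."""
    ids = [zlib.crc32(item.encode("utf-8")) for item in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]

def estimate_similarity(sig_a, sig_b):
    """Fraction of matching minima, an estimate of the Jaccard resemblance."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

hashes = make_hashes(128)
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
doc_c = "completely unrelated sentence about cooking pasta at home tonight"
sig_a, sig_b, sig_c = (signature(shingles(d), hashes) for d in (doc_a, doc_b, doc_c))
near = estimate_similarity(sig_a, sig_b)
far = estimate_similarity(sig_a, sig_c)
print(near, far)
```

The appeal of the approach is that each page is reduced to a short fixed-length signature, so pages can be compared without keeping their full text around.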
|
|
|
|
|
|