Topic: Related Content
A
Full Member
Losing all hope is freedom!
Posts: 236
|
|
Related Content
« on: Sep 10th, 2013, 6:37am » |
Suppose I have billions of articles published over many years. How can I go about building a list of related articles for every article? The articles can be totally independent of each other, or they can be related, for example as parts of a timeline (history).
What Doesn't Kill Me Will Only Make Me Stronger
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Posts: 13730
|
|
Re: Related Content
« Reply #1 on: Sep 10th, 2013, 9:09am » |
The problem is not really clear. What sort of input do we have, and what sort of output is desired? Do we have an arbitrary number of 'A is related to B' pairs, and then have to create a transitive/associative closure of that relation?
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
A
Full Member
Losing all hope is freedom!
Posts: 236
|
|
Re: Related Content
« Reply #2 on: Sep 11th, 2013, 6:06am » |
The input is access to all the articles. Each article has:
- title
- content
- date of publishing
One method I can think of is to extract keywords from the articles and then find matches using TF-IDF. The output I am looking for is, for each article, the most relevant other articles (by date/context).
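To make the TF-IDF idea concrete, here is a minimal sketch using only the Python standard library. The function names (`tfidf_vectors`, `cosine`, `most_related`) and the whitespace tokenizer are assumptions chosen for illustration, not anything from an existing system, and the publishing date is ignored here.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_related(index, vectors, k=3):
    """Indices of up to k documents most similar to vectors[index]."""
    scores = [(cosine(vectors[index], v), j)
              for j, v in enumerate(vectors) if j != index]
    scores.sort(reverse=True)
    return [j for score, j in scores[:k] if score > 0]

docs = [
    "election results parliament vote",
    "parliament vote on budget",
    "football cup final goal",
    "cup final goal highlights",
]
vectors = tfidf_vectors(docs)
print(most_related(0, vectors, k=1))
```

Comparing every article against every other one like this is quadratic in the number of articles, so with billions of articles some kind of index (for example an inverted index over terms) would be needed to limit the candidates.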
What Doesn't Kill Me Will Only Make Me Stronger
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Posts: 13730
|
|
Re: Related Content
« Reply #3 on: Sep 11th, 2013, 9:04am » |
Okay, so we have to figure out the relatedness ourselves. Are these any specific sort of articles, e.g. scientific journal papers, newspaper articles, or definite articles? Or simply any sort of text of any length? We could try to determine geographic relatedness by analyzing place names. Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
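As a rough illustration of the Bayesian classifier idea, the sketch below is a tiny multinomial naive Bayes with Laplace smoothing, trained on a few labeled examples. The class name `NaiveBayes`, the category labels and the training sentences are all made up for the example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category in self.doc_counts:
            # Log prior: fraction of training examples in this category.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for w in words:
                # Laplace (add-one) smoothing so unseen words do not give log(0).
                count = self.word_counts[category][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best

nb = NaiveBayes()
nb.train("stock market shares fall", "finance")
nb.train("bank raises interest rates", "finance")
nb.train("team wins cup final", "sport")
nb.train("striker scores late goal", "sport")
print(nb.classify("market interest rates"))
```

Articles that land in the same category could then be treated as candidates for being related, with a finer similarity measure applied within each category.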
|
« Last Edit: Sep 11th, 2013, 9:06am by towr » |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
yudivortasquetz
Newbie
Posts: 2
|
|
Re: Related Content
« Reply #4 on: Oct 11th, 2013, 6:21pm » |
What does this mean for the related posts on this forum?
|
|
|
pandani
Newbie
Posts: 28
|
|
Re: Related Content
« Reply #5 on: Oct 28th, 2013, 6:02pm » |
What kind of CMS are you using? WordPress has plugins that show related articles.
|
|
|
jordan
Junior Member
Posts: 63
|
|
Re: Related Content
« Reply #6 on: Feb 2nd, 2014, 2:38am » |
Every article could have tags. Then for article X you show the articles that have the same tags. Tags could be written manually, or you could extract them from the content, for example by taking the most frequent words in the content (you should skip words like "and", "or", etc.).
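A minimal sketch of automatic tagging along these lines: take the most frequent non-trivial words of each article as its tags, then rank other articles by how many tags they share. The stop-word list, the function names and the sample articles are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# A deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "or", "the", "a", "an", "of", "to", "in", "is", "on", "for"}

def extract_tags(text, n=3):
    """Use the n most frequent non-stop-words of the text as its tags."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    return {word for word, _ in Counter(words).most_common(n)}

def related_by_tags(articles, n=3):
    """Map each article id to other ids, ordered by number of shared tags."""
    tags = {aid: extract_tags(text, n) for aid, text in articles.items()}
    # Inverted index: tag -> ids of the articles carrying that tag.
    by_tag = defaultdict(set)
    for aid, article_tags in tags.items():
        for tag in article_tags:
            by_tag[tag].add(aid)
    related = {}
    for aid, article_tags in tags.items():
        shared = Counter()
        for tag in article_tags:
            for other in by_tag[tag]:
                if other != aid:
                    shared[other] += 1
        related[aid] = [other for other, _ in shared.most_common()]
    return related

articles = {
    "a": "Chess openings and chess endgames: chess strategy",
    "b": "Chess strategy for the endgame",
    "c": "Baking bread and baking cakes",
}
related = related_by_tags(articles)
print(related)
```

The inverted index means each article is only compared with articles that share at least one tag, rather than with the whole collection.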
|
|
|
puzzlecracker
Senior Riddler
Men have become the tools of their tools
Posts: 319
|
|
Re: Related Content
« Reply #7 on: Feb 8th, 2014, 10:56am » |
Check out TouchGraph Navigator (touchgraph.com). It visualizes relational data based on concepts.
|
« Last Edit: Feb 8th, 2014, 10:56am by puzzlecracker » |
While we are postponing, life speeds by
|
|
|
gitanas
Junior Member
Posts: 55
|
|
Re: Related Content
« Reply #8 on: Jan 27th, 2016, 6:35am » |
What about using word counts? You can link articles if they have many words in common.
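One way to read this suggestion is to count how many word occurrences two articles share and link them when that count passes a threshold. The sketch below does that with a multiset intersection; the function names and the threshold of 3 are arbitrary assumptions.

```python
from collections import Counter

def word_counts(text):
    return Counter(text.lower().split())

def overlap(a, b):
    """Number of word occurrences the two texts have in common."""
    # Counter & Counter keeps the minimum count of each shared word.
    shared = word_counts(a) & word_counts(b)
    return sum(shared.values())

def linked(a, b, threshold=3):
    return overlap(a, b) >= threshold

print(overlap("the cat sat on the mat", "the cat lay on the rug"))
```

As the sample shows, very common words such as "the" and "on" count toward the overlap, which is one reason to filter stop words or weight terms instead of using raw counts.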
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Posts: 13730
|
|
Re: Related Content
« Reply #9 on: Jan 27th, 2016, 11:23am » |
Using term frequency-inverse document frequency (TF-IDF) is a standard approach. Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space, where related documents simply lie close together.
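Once documents are embedded, finding related ones becomes a nearest-neighbor search. The sketch below assumes the embedding vectors already exist (for instance from doc2vec) and only shows the lookup step. The 3-dimensional vectors and document ids are invented for illustration, and cosine similarity is one common choice of closeness measure, not the only one.

```python
import math

def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest(doc_id, embeddings, k=2):
    """Ids of the k documents whose vectors are closest to doc_id's vector."""
    query = embeddings[doc_id]
    ranked = sorted(
        (other for other in embeddings if other != doc_id),
        key=lambda other: cosine_similarity(query, embeddings[other]),
        reverse=True,
    )
    return ranked[:k]

# Made-up vectors standing in for the output of an embedding model.
embeddings = {
    "election-2012": [0.9, 0.1, 0.0],
    "election-2013": [0.8, 0.2, 0.1],
    "cup-final": [0.0, 0.1, 0.9],
}
print(nearest("election-2012", embeddings, k=1))
```

This brute-force scan looks at every document for each query; at the scale mentioned earlier in the thread, an approximate nearest-neighbor index would be the usual substitute.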
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
|