|
||
Title: Related Content Post by A on Sep 10th, 2013, 6:37am Suppose I have billions of articles published over the years. How can I go about building a list of related articles for every article? The articles may be entirely independent of each other, or related, for example through a shared timeline (history). |
||
Title: Re: Related Content Post by towr on Sep 10th, 2013, 9:09am The problem is not really clear. What sort of input do we have, what sort of output is desired? Do we have an arbitrary number of 'A is related to B' and then have to create a transitive/associative closure of that relation? |
||
Title: Re: Related Content Post by A on Sep 11th, 2013, 6:06am The input is access to all the articles. Each article has a title, content, and a date of publishing. One method I can think of is to extract the keywords from the articles and then find matches using tf-idf. The output I am looking for is, for each article, the most relevant articles (by date/context). |
||
Title: Re: Related Content Post by towr on Sep 11th, 2013, 9:04am Okay, so we have to figure out the relatedness ourselves. Any specific sort of articles? i.e. scientific journal papers, or newspaper articles, definite articles? Or simply any sort of text of any length? We could try to determine geographic relatedness by analyzing place names. Bayesian classifiers could be used to sort the articles into categories, given some examples to start with. |
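To make the Bayesian classifier suggestion concrete, here is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. The class name `NaiveBayes`, the `tokenize` helper, and the category labels in the usage note are all made up for illustration; the thread does not specify any of them, and a real system would need far more training examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Hypothetical helper: multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # category -> number of training docs
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.vocab = set()                       # every word seen during training

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        """Return the category with the highest log-probability, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                count = self.word_counts[category][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best
```

Usage would look something like `nb.train("the striker scored a goal", "sports")` for each labelled example, followed by `nb.classify(new_article_text)`. Articles that land in the same category are then candidates for being related.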
||
Title: Re: Related Content Post by yudivortasquetz on Oct 11th, 2013, 6:21pm What does this mean for the related posts on this forum? |
||
Title: Re: Related Content Post by pandani on Oct 28th, 2013, 6:02pm What kind of CMS are you using? WordPress has plugins that show related articles. |
||
Title: Re: Related Content Post by jordan on Feb 2nd, 2014, 2:38am Every article could have tags, so for article X you show similar articles that have the same tags. Tags could be written manually, or you could extract them from the content, for example by taking the most frequent words (skipping words like "and", "or", etc.). |
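The tag idea above can be sketched roughly as follows. This assumes tags are simply the `k` most frequent non-stop-words of each article, and it uses an inverted index (tag to article ids) so that each article is only compared against articles sharing at least one tag. The `STOP_WORDS` list, the function names, and the default `k=3` are illustrative assumptions, not something the thread prescribes.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "is", "for", "on", "with"}


def extract_tags(text, k=3):
    """Return the k most frequent non-stop-words in the text as a set of tags."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(k)}


def related_by_tags(articles, k=3):
    """Map each article id to the other ids, ranked by number of shared tags."""
    tags = {aid: extract_tags(text, k) for aid, text in articles.items()}

    # Inverted index: tag -> ids of the articles carrying that tag.
    index = defaultdict(set)
    for aid, article_tags in tags.items():
        for tag in article_tags:
            index[tag].add(aid)

    related = {}
    for aid, article_tags in tags.items():
        shared = Counter()
        for tag in article_tags:
            for other in index[tag]:
                if other != aid:
                    shared[other] += 1
        related[aid] = [other for other, _ in shared.most_common()]
    return related
```

The inverted index matters at scale: with a very large collection, comparing every pair of articles is not feasible, whereas looking up the articles per tag only touches candidates that already have something in common.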
||
Title: Re: Related Content Post by puzzlecracker on Feb 8th, 2014, 10:56am Check out TouchGraph Navigator - touchgraph.com. It visualizes relational data based on concepts. |
||
Title: Re: Related Content Post by gitanas on Jan 27th, 2016, 6:35am What about using the word count? You can link articles if they have many similar words. |
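One simple way to turn "many similar words" into a number is the Jaccard index: the size of the shared vocabulary divided by the size of the combined vocabulary. This is only a sketch of one possible reading of the suggestion; the function names are made up here.

```python
import re


def word_set(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(text_a, text_b):
    """Shared words divided by all distinct words across both texts (0.0 to 1.0)."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 0.0  # two empty texts: treat as unrelated rather than divide by zero
    return len(a & b) / len(a | b)
```

Note that very common words inflate this score for any pair of articles, so in practice they would need to be filtered out or down-weighted.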
||
Title: Re: Related Content Post by towr on Jan 27th, 2016, 11:23am Using term frequency-inverse document frequency (TF-IDF) is a standard approach. Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space where related documents simply lie close together. |
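A minimal sketch of the TF-IDF approach, using only the standard library: each document becomes a sparse vector of term weights, and relatedness is the cosine of the angle between two vectors. The weighting used here (relative term frequency times `log(N / df)`) is one common variant among several, and all function names are illustrative. doc2vec is not shown, since it needs a trained model from an external library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """docs: dict of id -> text. Returns dict of id -> {term: tf-idf weight}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    n_docs = len(docs)
    vectors = {}
    for doc_id, words in tokens.items():
        tf = Counter(words)
        vectors[doc_id] = {
            term: (count / len(words)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_related(docs, doc_id, top=3):
    """Return up to `top` other document ids, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[doc_id], vec), other)
        for other, vec in vectors.items()
        if other != doc_id
    ]
    return [other for _, other in sorted(scores, reverse=True)[:top]]
```

A term that occurs in every document gets weight `log(1) = 0`, so words like "the" drop out without needing a stop-word list. This brute-force version compares one document against all others; for a very large collection some form of indexing or approximate nearest-neighbour search would be needed instead.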