wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
(Message started by: A on Sep 10th, 2013, 6:37am)

Title: Related Content
Post by A on Sep 10th, 2013, 6:37am
Suppose I have billions of articles published over the years. How can I go about building a list of related articles for every article?

The articles can be totally independent of each other, or they can be related, for example as part of a timeline (history).

Title: Re: Related Content
Post by towr on Sep 10th, 2013, 9:09am
The problem is not really clear. What sort of input do we have, what sort of output is desired?
Do we have an arbitrary number of 'A is related to B' and then have to create a transitive/associative closure of that relation?

Title: Re: Related Content
Post by A on Sep 11th, 2013, 6:06am
The input is access to all the articles. Each article has:
- title
- content
- date of publishing

One method I can think of is to extract the keywords from the articles and then find matches using tf-idf.

The output I am looking for is, for each article, the most relevant articles (by date/context).

Title: Re: Related Content
Post by towr on Sep 11th, 2013, 9:04am
Okay, so we have to figure out the relatedness ourselves.
Any specific sort of articles? i.e. scientific journal papers, or newspaper articles, definite articles? Or simply any sort of text of any length?

We could try to determine geographic relatedness by analyzing place names.
Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
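
A minimal sketch of the Bayesian classifier idea, assuming a multinomial naive Bayes model with add-one smoothing and a handful of hand-labelled examples. The category names, example texts and function names are made up for illustration; articles that land in the same category would then be candidates for "related".

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """examples: list of (text, category) pairs. Returns the counts needed to classify."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of training articles
    for text, category in examples:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the category with the highest log posterior for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        # Log prior plus log likelihood of each known word (add-one smoothing).
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training examples.
examples = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the senate passed the bill", "politics"),
    ("the minister proposed a budget", "politics"),
]
model = train(examples)
print(classify(model, "the senate debated the budget"))
```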

Title: Re: Related Content
Post by yudivortasquetz on Oct 11th, 2013, 6:21pm
What does this mean for the related posts on this forum?

Title: Re: Related Content
Post by pandani on Oct 28th, 2013, 6:02pm
What kind of CMS are you using? WordPress has plugins to show related articles.

Title: Re: Related Content
Post by jordan on Feb 2nd, 2014, 2:38am
Every article could have tags. So for article X you show similar articles having the same tags.

Tags could be written manually, or you could extract them from the content somehow, for example by taking the most frequent words in the content (you should avoid words like "and", "or", etc.)
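
A rough sketch of the tag idea, assuming tags are simply the most frequent non-stop-words and that "related" means sharing at least one tag. The stop-word list, the sample articles and the function names are placeholders, not part of any particular CMS.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "is", "on", "for", "was"}

def extract_tags(text, n_tags=3):
    """Return the n_tags most frequent non-stop-words in text as a set."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(n_tags)}

def related_by_tags(articles, article_id):
    """Rank the other articles by the number of tags shared with article_id."""
    tags = {aid: extract_tags(text) for aid, text in articles.items()}
    scores = {
        other: len(tags[article_id] & other_tags)
        for other, other_tags in tags.items()
        if other != article_id
    }
    # Keep only articles with at least one shared tag, best match first.
    return sorted((o for o, s in scores.items() if s > 0),
                  key=lambda o: scores[o], reverse=True)

# Hypothetical articles keyed by id.
articles = {
    "x": "Election results: the election was close and the election recount is slow.",
    "y": "The election recount: a recount of the election vote started today.",
    "z": "Bread recipe: bread needs flour, water, yeast and salt.",
}
print(related_by_tags(articles, "x"))
```

Raw frequency favours words that are common everywhere, so the stop-word list carries a lot of weight in this version.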

Title: Re: Related Content
Post by puzzlecracker on Feb 8th, 2014, 10:56am
Check out TouchGraph Navigator - touchgraph.com. It visualizes relational data based on concepts.

Title: Re: Related Content
Post by gitanas on Jan 27th, 2016, 6:35am
What about using the word count?
You can link articles if they have many similar words.

Title: Re: Related Content
Post by towr on Jan 27th, 2016, 11:23am
Using term frequency-inverse document frequency (TF-IDF) is a standard approach.
Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space where related documents simply lie close together.
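
A small sketch of the TF-IDF route, assuming plain term frequency times log(N/df) as the weighting and cosine similarity as the closeness measure. The sample documents and function names are invented for illustration, and the all-pairs comparison here would not scale to billions of articles without some kind of indexing or approximate nearest-neighbour search.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Map each document id to a sparse {term: tf-idf weight} dict."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        vectors[doc_id] = {
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_related(docs, doc_id, k=2):
    """Return the ids of the k documents most similar to doc_id."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[doc_id], vec), other)
              for other, vec in vectors.items() if other != doc_id]
    return [other for _, other in sorted(scores, reverse=True)[:k]]

# Hypothetical documents keyed by id.
docs = {
    "a": "the senate passed the budget bill",
    "b": "the budget bill goes to the senate vote",
    "c": "the team won the football match",
}
print(most_related(docs, "a", k=1))
```

A word like "the" that occurs in every document gets weight log(1) = 0, so it drops out of the comparison without needing a stop-word list.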


