wu :: forums - Print Page
(http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)

riddles >> cs >> Related Content
        
(Message started by: A on Sep 10th, 2013, 6:37am)

Title: Related Content
Post by A on Sep 10th, 2013, 6:37am

Suppose I have billions of articles published over many years. How can I go about building a list of related articles for every article?

The articles may be completely independent of each other, or they may be related, for example as parts of a timeline (history).

Title: Re: Related Content
Post by towr on Sep 10th, 2013, 9:09am

The problem is not really clear. What sort of input do we have, and what sort of output is desired?
Do we have an arbitrary number of 'A is related to B' statements, and then have to create a transitive/associative closure of that relation?

Title: Re: Related Content
Post by A on Sep 11th, 2013, 6:06am

The input is access to all the articles. Each article has
- title
- content
- date of publishing

One method I can think of is to extract the keywords from the articles and then find matches using tf-idf.

The output I am looking for is, for each article, the most relevant articles (by date/context).
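
To make the tf-idf idea concrete, here is a rough sketch in plain Python. The function names (`tfidf_vectors`, `most_related`) and the four sample articles are made up for illustration, it looks only at the content and ignores the publishing date, and the all-pairs comparison would not scale to billions of articles without an index. Treat it as a toy under those assumptions, not a finished design.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_related(index, vectors, k=3):
    """Indices of the k articles most similar to article `index`."""
    scores = [
        (cosine(vectors[index], vec), j)
        for j, vec in enumerate(vectors)
        if j != index
    ]
    scores.sort(reverse=True)
    return [j for score, j in scores[:k] if score > 0]


# Hypothetical sample articles (content only).
articles = [
    "Election results announced in the capital city",
    "Capital city prepares for election results",
    "New recipe for chocolate cake",
    "Chocolate cake wins the baking contest",
]
vectors = tfidf_vectors(articles)
related_to_first = most_related(0, vectors, k=1)
related_to_third = most_related(2, vectors, k=1)
```

Here the two election articles pair up, and so do the two cake articles, because the words they share are rare across the collection and therefore carry more weight than a common word like "the".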

Title: Re: Related Content
Post by towr on Sep 11th, 2013, 9:04am

Okay, so we have to figure out the relatedness ourselves.
Any specific sort of articles? E.g. scientific journal papers, or newspaper articles, definite articles? Or simply any sort of text of any length?

We could try to determine geographic relatedness by analyzing place names.
Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
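
As a rough illustration of the Bayesian classifier suggestion, here is a small multinomial naive Bayes sketch with add-one smoothing, using only the standard library. The class name `NaiveBayes`, the categories, and the training sentences are all invented for the example; a real system would need far more training examples per category.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def train(self, text, category):
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical labelled examples to start with.
nb = NaiveBayes()
nb.train("the team won the football match", "sports")
nb.train("the striker scored a late goal", "sports")
nb.train("parliament passed the new budget", "politics")
nb.train("the minister announced an election", "politics")

guess_sports = nb.classify("the team scored a goal")
guess_politics = nb.classify("the minister passed the budget")
```

Articles that land in the same category could then be treated as candidates for being related, which narrows the search before any finer comparison.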

Title: Re: Related Content
Post by yudivortasquetz on Oct 11th, 2013, 6:21pm

What does this mean for the related posts on this forum?

Title: Re: Related Content
Post by pandani on Oct 28th, 2013, 6:02pm

What kind of CMS are you using? WordPress has plugins that show related articles.

Title: Re: Related Content
Post by jordan on Feb 2nd, 2014, 2:38am

Every article could have tags. Then for article X you show the other articles that have the same tags.

Tags could be written manually, or you could somehow extract them from the content, for example by taking the most frequent words in it (you should skip words like "and", "or", etc.).
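
A minimal sketch of both halves of this idea, assuming a tiny hand-picked stop-word list and invented article ids and tags: `extract_tags` takes the most frequent non-stop-words as tags, and `related_by_tags` uses an inverted index (tag to articles) so that only articles sharing at least one tag are ever looked at.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "in", "to", "is", "for", "on"}


def extract_tags(text, n_tags=3):
    """Use the n_tags most frequent non-stop-words as tags."""
    words = [
        w for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOP_WORDS
    ]
    return {word for word, _ in Counter(words).most_common(n_tags)}


def build_tag_index(tags_by_article):
    """Inverted index: tag -> set of article ids carrying that tag."""
    index = defaultdict(set)
    for article_id, tags in tags_by_article.items():
        for tag in tags:
            index[tag].add(article_id)
    return index


def related_by_tags(article_id, tags_by_article, index):
    """Other articles ordered by how many tags they share with article_id."""
    shared = Counter()
    for tag in tags_by_article[article_id]:
        for other in index[tag]:
            if other != article_id:
                shared[other] += 1
    return [other for other, _ in shared.most_common()]


# Tags extracted automatically from some sample text.
auto_tags = extract_tags(
    "The cat and the dog chased the cat, and the dog chased a ball"
)

# Hypothetical manually written tags.
tags_by_article = {
    "x": {"python", "tutorial"},
    "y": {"python", "tutorial", "video"},
    "z": {"python", "news"},
    "w": {"cooking"},
}
index = build_tag_index(tags_by_article)
related_to_x = related_by_tags("x", tags_by_article, index)
```

The inverted index is the part that matters at scale: article "w" shares no tag with "x", so it is never compared at all.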

Title: Re: Related Content
Post by puzzlecracker on Feb 8th, 2014, 10:56am

Check out TouchGraph Navigator (touchgraph.com). It visualizes relational data based on concepts.

Title: Re: Related Content
Post by gitanas on Jan 27th, 2016, 6:35am

What about using the word count?
You can link articles if they have many similar words.
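
One possible reading of this, sketched with the standard library: count the words in each article and score a pair by the share of word occurrences they have in common (a weighted Jaccard overlap). The function names and sample sentences are made up for the example.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Weighted Jaccard: shared word occurrences / total word occurrences."""
    shared = sum((a & b).values())  # per-word minimum of the two counts
    total = sum((a | b).values())   # per-word maximum of the two counts
    return shared / total if total else 0.0


# Hypothetical sample texts.
first = word_counts("the cat sat on the mat")
second = word_counts("the cat lay on the rug")
third = word_counts("stock prices fell sharply")

score_similar = overlap(first, second)
score_unrelated = overlap(first, third)
```

Note that half of the overlap between the first two texts comes from "the", so with raw counts very common words dominate the score unless they are filtered out or down-weighted.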

Title: Re: Related Content
Post by towr on Jan 27th, 2016, 11:23am

Using term frequency-inverse document frequency (TF-IDF) is a standard approach.
Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space where related documents simply lie close together.
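
To show what "lie close together" means in practice, here is a small sketch of the lookup step only. It assumes the embeddings have already been produced by some model such as doc2vec; the three-dimensional vectors and article names below are invented by hand, not real model output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(name, embeddings, k=2):
    """Names of the k documents whose vectors are closest to `name`."""
    query = embeddings[name]
    ranked = sorted(
        (other for other in embeddings if other != name),
        key=lambda other: cosine(query, embeddings[other]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional embeddings; real ones have many more dimensions.
embeddings = {
    "election_report": [0.9, 0.1, 0.0],
    "election_analysis": [0.8, 0.2, 0.1],
    "cake_recipe": [0.0, 0.1, 0.9],
    "baking_contest": [0.1, 0.2, 0.8],
}

closest_to_report = nearest("election_report", embeddings, k=1)
closest_to_recipe = nearest("cake_recipe", embeddings, k=1)
```

This scans every document for each query, which is fine for a toy but not for billions of articles; at that size an approximate nearest-neighbour index would normally replace the linear scan.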