 Related Content
To lose all hope is freedom!

 « on: Sep 10th, 2013, 6:37am »

Suppose I have billions of articles published over the years. How can I go about building a list of related articles for every article available?

These articles can be totally independent, or they can be related, for example through a shared timeline (history).
Some people are average, some are just mean.

 « Reply #1 on: Sep 10th, 2013, 9:09am »

The problem is not really clear. What sort of input do we have, what sort of output is desired?
Do we have an arbitrary number of 'A is related to B' and then have to create a transitive/associative closure of that relation?

 « Reply #2 on: Sep 11th, 2013, 6:06am »

For each article we have:
- title
- content
- date of publishing

One method I can think of is to extract the keywords from the articles and then find matches using tf-idf.

The output I am looking for is, for each article, the most relevant articles (by date/context).
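
To make the tf-idf idea concrete, here is a minimal sketch in plain Python. The function names (`tokenize`, `tfidf_vectors`, `related`) and the crude tokenizer are my own assumptions for illustration, not part of any library. It weights each word by term frequency times log inverse document frequency and ranks the other articles by cosine similarity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()  # number of documents each term occurs in
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def related(docs, index, k=3):
    """Indices of up to k documents most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), j)
              for j, v in enumerate(vecs) if j != index]
    scores.sort(reverse=True)
    return [j for s, j in scores[:k] if s > 0]

docs = [
    "the election results were announced today",
    "election results surprise voters",
    "new recipe for chocolate cake",
    "chocolate cake baking tips",
]
print(related(docs, 0, k=1))
```

This compares every pair of articles, which is fine for a handful of documents but not for billions. At that scale you would presumably need an inverted index or an approximate nearest-neighbour search to cut down the candidates first.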

 « Reply #3 on: Sep 11th, 2013, 9:04am »

Okay, so we have to figure out the relatedness ourselves.
Any specific sort of articles? E.g. scientific journal papers, newspaper articles, definite articles? Or simply any sort of text of any length?

We could try to determine geographic relatedness by analyzing place names.
Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
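
As a rough illustration of the Bayesian classifier idea, here is a small multinomial naive Bayes sketch with add-one smoothing. The class name `NaiveBayes`, its methods, and the training sentences are all made up for the example. A real system would need many more labelled examples per category.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat, n_docs in self.doc_counts.items():
            counts = self.word_counts[cat]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best, best_score = cat, score
        return best

nb = NaiveBayes()
nb.train("the team won the football match", "sports")
nb.train("the striker scored a goal", "sports")
nb.train("parliament passed the new budget", "politics")
nb.train("the minister announced an election", "politics")
print(nb.classify("a late goal won the match"))
```

Once every article has a category, the search for related articles can be restricted to articles in the same category.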
 « Reply #4 on: Oct 11th, 2013, 6:21pm »

What does this mean for the related posts on this forum?
 « Reply #5 on: Oct 28th, 2013, 6:02pm »

What kind of CMS are you using? WordPress has plugins that show related articles.
 « Reply #6 on: Feb 2nd, 2014, 2:38am »

Every article could have tags. So for article X you show similar articles having the same tags.

Tags could be written manually, or you could extract them from the content, for example by taking the most frequent words in each article (while skipping stop words like "and", "or", etc.).
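
A minimal sketch of that tag approach, assuming a tiny hand-picked stop-word list and hypothetical helper names (`extract_tags`, `related_by_tags`): tags are the most frequent non-stop words, and an inverted index from tag to articles lets you count shared tags without comparing every pair of articles.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "is", "for", "on", "with"}

def extract_tags(text, n=5):
    """Use the n most frequent non-stop words of an article as its tags."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    return {w for w, _ in Counter(words).most_common(n)}

def related_by_tags(tags_by_article, article_id):
    """Other articles sharing at least one tag, most shared tags first."""
    index = defaultdict(set)  # tag -> ids of articles carrying that tag
    for aid, tags in tags_by_article.items():
        for tag in tags:
            index[tag].add(aid)
    shared = Counter()
    for tag in tags_by_article[article_id]:
        for other in index[tag]:
            if other != article_id:
                shared[other] += 1
    return [aid for aid, _ in shared.most_common()]

tags = {
    "x": {"python", "search"},
    "y": {"python", "search", "index"},
    "z": {"python", "cooking"},
    "w": {"travel"},
}
print(related_by_tags(tags, "x"))
```

In practice the inverted index would be built once and stored, not rebuilt on every lookup as it is here.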
Men have become the tools of their tools

 « Reply #7 on: Feb 8th, 2014, 10:56am »

Check out TouchGraph Navigator (touchgraph.com). It visualizes relational data based on concepts.
 « Reply #8 on: Jan 27th, 2016, 6:35am »

What about using the word count?
You can link articles if they have many similar words.

 « Reply #9 on: Jan 27th, 2016, 11:23am »

Using term frequency-inverse document frequency (TF-IDF) is a standard approach.
Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space where related documents simply lie close together.
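
To show what "lie close together" means, here is a small sketch that looks up nearest neighbours by cosine similarity. The 3-dimensional vectors are invented purely for illustration. In practice they would come from a trained doc2vec (or similar) model and have far more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(embeddings, doc_id, k=2):
    """Ids of the k documents whose vectors are closest to doc_id's vector."""
    query = embeddings[doc_id]
    scored = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != doc_id]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Made-up embeddings (assumption): "a" and "b" point in a similar direction.
embeddings = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.2, 0.1],
    "c": [0.0, 0.1, 0.9],
}
print(nearest(embeddings, "a", k=1))
```

This brute-force scan is only a sketch. With a very large collection you would likely want an approximate nearest-neighbour index instead of comparing against every vector.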
