wu :: forums
« wu :: forums - Related Content »

(Moderators: william wu, Eigenray, SMQ, Grimbal, ThudnBlunder, towr, Icarus)
   Author  Topic: Related Content  (Read 5042 times)
A
Full Member
***



Perder todas as esperanças é liberdade!

   


Gender: male
Posts: 236
Related Content  
« on: Sep 10th, 2013, 6:37am »
Suppose I have billions of articles published over many years. How can I go about building a list of related articles for every article available?

These articles can be totally independent of each other, or they can be related, for example as part of a timeline (history).

What Doesn't Kill Me Will Only Make Me Stronger
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Related Content  
« Reply #1 on: Sep 10th, 2013, 9:09am »
The problem is not really clear. What sort of input do we have, and what sort of output is desired?
Do we have an arbitrary number of 'A is related to B' statements, and do we then have to create a transitive/associative closure of that relation?

Wikipedia, Google, Mathworld, Integer sequence DB
A
Full Member
***



Perder todas as esperanças é liberdade!

   


Gender: male
Posts: 236
Re: Related Content  
« Reply #2 on: Sep 11th, 2013, 6:06am »
The input is access to all the articles. Each article has:
 - title
 - content
 - date of publishing

One method I can think of is to extract the keywords from each article and then find matches using tf-idf.

The output I am looking for is, for each article, the most relevant articles (by date/context).
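
A minimal sketch of the tf-idf idea mentioned above, using only the Python standard library. The article ids, sample texts, and function names are made up for illustration, and the weighting used here (relative term frequency times log(N / document frequency), compared by cosine similarity) is just one common variant, not necessarily the one the poster had in mind.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    # docs: {article_id: text}. Returns {article_id: {term: tf-idf weight}}.
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # count each term once per document
    n = len(docs)
    vectors = {}
    for doc_id, words in tokens.items():
        tf = Counter(words)
        total = len(words) or 1
        vectors[doc_id] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_related(doc_id, vectors, k=3):
    # Rank every other article by similarity to doc_id and keep the top k.
    scores = [(cosine(vectors[doc_id], v), other)
              for other, v in vectors.items() if other != doc_id]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]

# Hypothetical sample articles.
articles = {
    "a1": "The election results were announced in the capital",
    "a2": "Voters await election results after a close election",
    "a3": "New recipe for chocolate cake with dark chocolate",
}
vectors = tfidf_vectors(articles)
print(most_related("a1", vectors, k=1))  # ['a2']
```

Comparing every pair like this is quadratic in the number of articles, so for billions of articles it would only be a starting point for experiments on a sample.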

What Doesn't Kill Me Will Only Make Me Stronger
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Related Content  
« Reply #3 on: Sep 11th, 2013, 9:04am »
Okay, so we have to figure out the relatedness ourselves.
Any specific sort of articles? E.g. scientific journal papers, or newspaper articles, definite articles? Or simply any sort of text of any length?

We could try to determine geographic relatedness by analyzing place names.
Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
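
As a rough illustration of the Bayesian classifier suggestion, here is a small multinomial naive Bayes sketch with add-one (Laplace) smoothing. The class name, the two categories, and the training sentences are all hypothetical; a real system would need far more labelled examples and better tokenization.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training docs
        self.vocab = set()

    def train(self, category, text):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for cat in self.doc_counts:
            # Log prior: fraction of training docs in this category.
            score = math.log(self.doc_counts[cat] / total_docs)
            total_words = sum(self.word_counts[cat].values())
            for w in words:
                # Add-one smoothing so unseen words do not zero out the score.
                p = (self.word_counts[cat][w] + 1) / (total_words + len(self.vocab))
                score += math.log(p)
            if score > best_score:
                best, best_score = cat, score
        return best

# Hypothetical starting examples.
nb = NaiveBayes()
nb.train("sports", "the team won the match with a late goal")
nb.train("sports", "the striker scored a goal in the final match")
nb.train("politics", "the minister announced a new election law")
nb.train("politics", "parliament will vote on the election budget")

print(nb.classify("a late goal decided the match"))  # sports
```

Articles that land in the same category could then be treated as candidates for "related", with a finer similarity measure applied within each category.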
« Last Edit: Sep 11th, 2013, 9:06am by towr »

Wikipedia, Google, Mathworld, Integer sequence DB
yudivortasquetz
Newbie
*





   


Posts: 2
Re: Related Content  
« Reply #4 on: Oct 11th, 2013, 6:21pm »
What does this mean for the related posts on this forum?
pandani
Newbie
*





   


Gender: male
Posts: 28
Re: Related Content  
« Reply #5 on: Oct 28th, 2013, 6:02pm »
What kind of CMS are you using? WordPress has plugins that show related articles.
jordan
Junior Member
**





   


Gender: male
Posts: 63
Re: Related Content  
« Reply #6 on: Feb 2nd, 2014, 2:38am »
Every article could have tags. Then, for article X, you show similar articles that have the same tags.

Tags could be written manually, or you could somehow extract them from the content, for example by taking the most frequent words in the content (avoiding words like "and", "or", etc.).
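
The tag idea could be sketched roughly as follows: take the most frequent non-stop-words as tags, build an inverted index from tag to articles, and rank other articles by how many tags they share. The stop-word list, sample texts, and function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "on", "for", "is", "are", "with"}

def extract_tags(text, n=2):
    # Use the n most frequent non-stop-words as tags.
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return {w for w, _ in counts.most_common(n)}

def build_index(tags_by_article):
    # Inverted index: tag -> set of article ids carrying that tag.
    index = defaultdict(set)
    for article_id, tags in tags_by_article.items():
        for tag in tags:
            index[tag].add(article_id)
    return index

def related_by_tags(article_id, tags_by_article, index):
    # Count shared tags per other article; most shared tags first.
    shared = Counter()
    for tag in tags_by_article[article_id]:
        for other in index[tag]:
            if other != article_id:
                shared[other] += 1
    return [other for other, _ in shared.most_common()]

# Hypothetical sample articles.
texts = {
    "x": "Solar panels and solar batteries cut energy bills. Solar energy is cheap.",
    "y": "Wind energy and solar energy compete on price. Energy storage matters.",
    "z": "The cat chased the cat toy. The toy broke.",
}
tags_by_article = {article_id: extract_tags(text) for article_id, text in texts.items()}
index = build_index(tags_by_article)
print(related_by_tags("x", tags_by_article, index))  # ['y']
```

The inverted index means only articles that share at least one tag are ever looked at, which matters when there are many articles.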

My personal fashion blog for hippie and free women Boho and Flower
puzzlecracker
Senior Riddler
****



Men have become the tools of their tools

   


Gender: male
Posts: 319
Re: Related Content  
« Reply #7 on: Feb 8th, 2014, 10:56am »
Check out TouchGraph Navigator (touchgraph.com). It visualizes relational data based on concepts.
« Last Edit: Feb 8th, 2014, 10:56am by puzzlecracker »

While we are postponing, life speeds by
gitanas
Junior Member
**





   


Posts: 55
Re: Related Content  
« Reply #8 on: Jan 27th, 2016, 6:35am »
What about using the word count?
You can link articles if they share many words.
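
One simple way to make "many words in common" concrete is the Jaccard similarity of the two word sets (shared words divided by all distinct words). This is only one possible reading of the suggestion; the function names and sample sentences are made up.

```python
def word_set(text):
    # Distinct lowercase words with surrounding punctuation stripped.
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}

def jaccard(a, b):
    # |intersection| / |union| of the two word sets, in the range 0..1.
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

print(jaccard("the cat sat", "the cat ran"))  # 0.5
```

Two articles could then be linked when the score passes some threshold, which would have to be tuned on real data.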

Dummy Frog - my blog about interesting and funny things in our World
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Related Content  
« Reply #9 on: Jan 27th, 2016, 11:23am »
Using term frequency-inverse document frequency (TF-IDF) is a standard approach.
Or you could use doc2vec or similar algorithms to embed all documents in an N-dimensional space where related documents simply lie close together.
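
Once every document has a vector, finding related documents becomes a nearest-neighbour search. The sketch below assumes the embeddings have already been computed elsewhere; the 3-dimensional vectors and article ids are invented stand-ins for real doc2vec output, and cosine similarity is just one reasonable choice of closeness.

```python
import heapq
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest(doc_id, embeddings, k=2):
    # Brute-force scan: score every other document and keep the k closest.
    query = embeddings[doc_id]
    candidates = ((cosine(query, vec), other)
                  for other, vec in embeddings.items() if other != doc_id)
    return [other for _, other in heapq.nlargest(k, candidates)]

# Made-up vectors standing in for embeddings learned by a real model.
embeddings = {
    "election-2012": [0.9, 0.1, 0.0],
    "election-2013": [0.8, 0.2, 0.1],
    "cake-recipe": [0.0, 0.1, 0.9],
    "bread-recipe": [0.1, 0.0, 0.8],
}
print(nearest("election-2012", embeddings, k=1))  # ['election-2013']
```

With billions of articles, a brute-force scan per article would likely be too slow, so some kind of approximate nearest-neighbour index would probably be needed instead.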

Wikipedia, Google, Mathworld, Integer sequence DB