 Author Topic: Related Content  (Read 5048 times)
A
Full Member

To lose all hope is freedom!

Posts: 236
 Related Content   « on: Sep 10th, 2013, 6:37am »

Suppose I have billions of articles published over many years. How can I go about building a list of related articles for every article?

The articles can be completely independent of one another, or they can be related, for example as successive entries in a timeline (history).

What Doesn't Kill Me Will Only Make Me Stronger
towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Posts: 13730
 Re: Related Content   « Reply #1 on: Sep 10th, 2013, 9:09am »

The problem is not really clear. What sort of input do we have, what sort of output is desired?
Do we have an arbitrary number of 'A is related to B' and then have to create a transitive/associative closure of that relation?

Wikipedia, Google, Mathworld, Integer sequence DB
A
Full Member

Posts: 236
 Re: Related Content   « Reply #2 on: Sep 11th, 2013, 6:06am »

For each article, the input is:
- title
- content
- date of publishing

One method I can think of is to extract keywords from each article and then match articles using TF-IDF.

The output I am looking for is, for each article, the most relevant other articles (by date/context).
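
A minimal sketch of the keyword-extraction step mentioned above, assuming plain-text articles small enough to hold in memory. The function names `tokenize` and `top_keywords` are made up for illustration, and the weighting is one common TF-IDF variant (relative term frequency times `log(N / df)`), not necessarily the one the poster had in mind.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    result = []
    for toks in tokenized:
        tf = Counter(toks)
        # Relative term frequency times inverse document frequency.
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result
```

Words that occur in every document get an IDF of zero, so they never rank as keywords. With billions of articles the document frequencies would have to be computed in a distributed or streaming fashion, which this sketch does not attempt.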
towr
wu::riddles Moderator
Uberpuzzler

Posts: 13730
 Re: Related Content   « Reply #3 on: Sep 11th, 2013, 9:04am »

Okay, so we have to figure out the relatedness ourselves.
Any specific sort of articles? E.g. scientific journal papers, newspaper articles, definite articles? Or simply any sort of text of any length?

We could try to determine geographic relatedness by analyzing place names.
Bayesian classifiers could be used to sort the articles into categories, given some examples to start with.
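
As a rough illustration of the classifier idea, here is a small multinomial naive Bayes sketch with add-one smoothing. The class name `NaiveBayes`, its methods, and the whitespace tokenization are all assumptions made for this example; a real system would more likely use an existing library.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of examples
        self.vocab = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for cat in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[cat] / total_docs)
            total = sum(self.word_counts[cat].values())
            for w in words:
                count = self.word_counts[cat][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = cat, score
        return best
```

After training on a handful of labelled examples per category, `classify` returns the most probable category for a new article. Articles in the same category can then be treated as candidates for being related.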
 « Last Edit: Sep 11th, 2013, 9:06am by towr »
yudivortasquetz
Newbie

Posts: 2
 Re: Related Content   « Reply #4 on: Oct 11th, 2013, 6:21pm »

What does this mean for the related posts on this forum?
pandani
Newbie

Posts: 28
 Re: Related Content   « Reply #5 on: Oct 28th, 2013, 6:02pm »

What kind of CMS are you using? WordPress has plugins that show related articles.
jordan
Junior Member

Posts: 63
 Re: Related Content   « Reply #6 on: Feb 2nd, 2014, 2:38am »

Every article could have tags. For article X, you then show other articles that share its tags.

Tags could be written manually, or extracted from the content somehow, for example by taking the most frequent words in the article (avoiding words like "and", "or", etc.).
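
A small sketch of the tag idea, assuming tags are simply the most frequent non-stopwords. The stopword list is a tiny illustrative sample, and `extract_tags` and `related_by_tags` are hypothetical names.

```python
import re
from collections import Counter

# A deliberately short stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "or", "the", "of", "to", "in", "is", "for", "on", "with"}

def extract_tags(text, k=5):
    """Use the k most frequent non-stopwords of the text as its tags."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {w for w, _ in Counter(words).most_common(k)}

def related_by_tags(tags_by_article, article_id):
    """List other articles sharing at least one tag, most shared tags first."""
    target = tags_by_article[article_id]
    overlaps = [
        (len(target & tags), other)
        for other, tags in tags_by_article.items()
        if other != article_id
    ]
    overlaps.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for shared, other in overlaps if shared > 0]
```

For example, with `tags_by_article = {"a": {"python", "search"}, "b": {"python", "search", "index"}, "c": {"python"}}`, calling `related_by_tags(tags_by_article, "a")` ranks `"b"` ahead of `"c"` because it shares two tags rather than one.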

My personal fashion blog for hippie and free women Boho and Flower
puzzlecracker
Senior Riddler

Men have become the tools of their tools

Posts: 319
 Re: Related Content   « Reply #7 on: Feb 8th, 2014, 10:56am »

Check out TouchGraph Navigator - toughgraph.com. It visualizes relational data based on concepts.
 « Last Edit: Feb 8th, 2014, 10:56am by puzzlecracker »

While we are postponing, life speeds by
gitanas
Junior Member

Posts: 55
 Re: Related Content   « Reply #8 on: Jan 27th, 2016, 6:35am »

What about using the word count?
You can link articles if they have many similar words.
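
One way to make "many similar words" concrete is a weighted Jaccard score over word counts: shared word occurrences divided by total word occurrences. This is only one possible reading of the suggestion, and `word_counts` and `overlap_score` are names invented for the sketch.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def overlap_score(a, b):
    """Weighted Jaccard similarity of two texts, between 0.0 and 1.0."""
    ca, cb = word_counts(a), word_counts(b)
    shared = sum((ca & cb).values())  # per-word minimum of the two counts
    total = sum((ca | cb).values())   # per-word maximum of the two counts
    return shared / total if total else 0.0
```

Two articles could then be linked when the score exceeds some threshold. Very common words will inflate the score unless they are filtered out or down-weighted first.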

Dummy Frog - my blog about interesting and funny things in our World
towr
wu::riddles Moderator
Uberpuzzler

Posts: 13730
 Re: Related Content   « Reply #9 on: Jan 27th, 2016, 11:23am »

Using term frequency-inverse document frequency (TF-IDF) is a standard approach.
Or you could use doc2vec or a similar algorithm to embed all documents in an N-dimensional space where related documents lie close together.
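
To illustrate "related documents lie close together", here is a sketch that represents each document as a sparse TF-IDF vector and ranks the others by cosine similarity. The helper names are hypothetical, and the weighting (raw count times `log(N / df)`) is one common variant. The same nearest-neighbour lookup would apply to doc2vec embeddings, with dense vectors in place of the sparse dictionaries.

```python
import math
import re
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_related(docs, index, k=2):
    """Return the indices of the k documents most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    scored = [(cosine(vecs[index], v), j) for j, v in enumerate(vecs) if j != index]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:k]]
```

Comparing every pair of documents is quadratic in the number of articles, so at the scale in the original question this brute-force ranking would have to be replaced by an inverted index or an approximate nearest-neighbour search.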