wu :: forums » riddles » putnam exam (pure math)
Topic: Information derived from a data set (Read 2574 times)
davkrs (Newbie, Posts: 6)
Information derived from a data set « on: Dec 26th, 2013, 1:25pm »

In this blog post, http://www.evanmiller.org/small-data.html, the author claims that the amount of information that can be derived from a data set is proportional to the square root of the size of the data set (in the best case).

Is this true? Is there a proof behind this claim? Have you ever noticed this in your research, job, work, or life? What are your thoughts?

P.S. Though this question, and the blog post linked to it, are more applied math than pure math, I posted it here because I wasn't sure which category an applied math question should go in. Let me know if it should be moved to a different category.
« Last Edit: Dec 26th, 2013, 1:28pm by davkrs »
towr (wu::riddles Moderator, Uberpuzzler, Posts: 13730)
Re: Information derived from a data set « Reply #1 on: Dec 26th, 2013, 10:44pm »

He seems to be speaking from the viewpoint of regression analysis only.
From basic statistics we also know that, e.g., the standard error of an estimate such as the sample mean (i.e. how uncertain we are about the true value of a statistical variable) is inversely proportional to the square root of the size of the data set (for independent, approximately normally distributed data). So each extra data point buys less additional certainty than the one before: to halve the uncertainty you need four times as much data.
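A quick way to see the square-root behaviour is to compare the theoretical standard error σ/√n with a small simulation. This is only an illustrative sketch: the function names, the standard normal data, and the number of simulated trials are assumptions chosen for the example, not anything taken from the blog post.

```python
import math
import random
import statistics


def standard_error(sigma: float, n: int) -> float:
    """Theoretical standard error of the mean of n independent draws with std dev sigma."""
    return sigma / math.sqrt(n)


def empirical_standard_error(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Simulate `trials` data sets of size n from N(0, 1) and measure how much their means scatter."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (25, 100, 400):
        # Quadrupling n should roughly halve both the theoretical and the simulated error.
        print(n, round(standard_error(1.0, n), 4), round(empirical_standard_error(n), 4))
```

Each fourfold increase in n only cuts the uncertainty in half, which is the diminishing return described above.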
However, if you want to analyze rare events, you need to filter a huge amount of data just to find enough of them to be able to say anything about them. So you need to collect a huge amount of data, especially if you don't know beforehand which rare events you want to analyze. That brings its own troubles, because you can always find something on a fishing expedition, even if it turns out to be an old shoe. And, as he says, if you deal with a lot of parameters, you need a lot of data to estimate them all. Finally, some data is so noisy that you need a lot of it to average out the noise, and it's often not neatly normally distributed.
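To put a rough number on the rare-event point, one can ask how many records are needed before a binomial count of an event with per-record probability p reaches at least k occurrences with, say, 95% probability. This is a sketch assuming independent records and 0 < p < 1; the helper names `prob_at_least_k` and `records_needed` are hypothetical.

```python
import math


def prob_at_least_k(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 minus the k lower-tail terms (in log space)."""
    if k <= 0:
        return 1.0
    if n < k:
        return 0.0
    lower_tail = 0.0
    for i in range(k):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        lower_tail += math.exp(log_pmf)
    return max(0.0, 1.0 - lower_tail)


def records_needed(p: float, k: int, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least k events among n independent records) >= confidence."""
    if k <= 0:
        return 0
    # Invariant: the target is not met at lo but is met at hi.
    lo, hi = 0, k
    while prob_at_least_k(hi, p, k) < confidence:
        lo, hi = hi, hi * 2
    # Binary search works because P(X >= k) only grows as n grows.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prob_at_least_k(mid, p, k) >= confidence:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for p in (0.01, 0.001):
        for k in (1, 10):
            print(f"p={p}, k={k}: {records_needed(p, k)} records")
```

For p = 0.001 and k = 1 it returns 2995, i.e. roughly 3/p records just to be fairly sure of seeing the event once; asking for several occurrences pushes the requirement up further.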