
How to Learn Machine Learning Systematically


lowiser


How to Learn Machine Learning Systematically



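I have been studying for half a year and still feel I can't quite get a handle on it. Machine Learning requires knowledge from many areas, including Statistics, Probability, Calculus, Graph Theory... There is just too much. Often I reach a point that I feel I haven't fully understood and go off to consult other references, but that takes too much time, and I always feel I will never master it. It is very frustrating. Can anyone give me some guidance? Thanks.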


Posted: 2006-01-01, 13:34:49

lowiser


Re: How to Learn Machine Learning Systematically



Who can answer my question? Thanks...


I am a Computer Science Postgraduate. From National University of Singapore. Singapore-MIT Alliance. http://wing.comp.nus.edu.sg/~luwei/


Posted: 2006-01-06, 02:33:02

kanex


Re: How to Learn Machine Learning Systematically



Study.


Who by the riverside first saw the moon? In what year did the river moon first shine on man?


Posted: 2006-01-06, 18:48:53

Omni


Re: How to Learn Machine Learning Systematically



I really hesitated when I first saw your question last week, because it is a very broad one. First, an answer might take too much time to write; second, I'm very far from being an expert in this field; third, I still view myself as a beginner in machine learning. So I thought somebody else in this forum might be able to help you. But now you seem desperate for help, and I seem to be the only person who might point you in the right direction. Plus it's Friday night, time to relax and write something for fun, so I'll dare to help a young student.

Let me first disclose that my training background is biochemistry and cell biology. Having decided to go into computational biology after my Ph.D., I started to pick up computer science during my graduate school years (from late 1997) and statistics during my first job right after graduate school (from early 2001). So I have no "license" in either CS or statistics, and you should read my comments critically. You may need some patience to read this post, as I set the stage slowly before jumping into machine learning.

That being said, let me start with some general comments on the major difference between statistical thinking and mathematical thinking. I view statistics and probability as the two most critical pillars of machine learning, or even of artificial intelligence. I assume most participants in this forum are physicists, mathematicians, or mathematical physicists. They feel strongly attracted by the purity and beauty of mathematical thinking, which pretty much follows the spirit of Newton's Principia. Last weekend I was browsing the heavy book "On the Shoulders of Giants" edited by Stephen Hawking, a wonderful collection of the ORIGINAL works (all translated into English) of 5 giants in the history of physics and astronomy --- Copernicus, Galileo, Kepler, Newton, and Einstein. One has to be in awe when reading this heavy volume. I know for sure that I can never afford the time and effort to dig into it, but this collection is the quintessence of human rational civilization after Euclid's Elements. Newton was able to describe the mechanical world in the spirit of Euclid, by purely mathematical thinking --- starting from definitions and axioms to derive theorems that describe the laws of nature. That spirit has inspired generation after generation of scientists.

But once we go beyond physics, we enter the realms of chemistry and life science, where the sheer complexity overwhelms our ability to strip things down to a limited number of principles. That's why statistics was invented in the late 19th century. Although physicists have had "statistical physics" for a long time, I think that's more about probability theory than statistical inference in the modern spirit (feel free to correct me if I'm wrong here). Modern statistics really started with Galton and Karl Pearson, followed by the great Ronald Fisher, all of whom can also be viewed as biologists. That's why one of the most prestigious statistics journals to this day is "Biometrika"! "Biometrics" is the old name for biostatistics, so I think modern statistics was inspired more by quantitative biology (esp. genetics and evolution) than by physics.

How does statistical thinking (as inspired by Fisher) differ from mathematical thinking (as inspired by Euclid and Newton)? What is the role of mathematics in statistics? If you purge statistics of its mathematical content, what intellectual substance remains?

Statistics is a methodological discipline. It exists not for itself but rather to offer other fields of science a coherent set of ideas and tools for dealing with data. The need for such a discipline arises from the omnipresence of variability ("omni" shows up again, hehe). Individuals vary, and repeated measurements on the same individual vary too. Statistics provides means for dealing with data that take this omnipresent variability into account.

Statistics requires a different KIND of thinking, because data are not just numbers, they are numbers with a CONTEXT. Although mathematicians often rely on applied context both for motivation and as a source of problems for research (e.g., the always tight-knit relationship between physics and mathematics since Newton's days), the ultimate focus in mathematical thinking is on abstract patterns --- the context is part of the irrelevant detail that must be boiled off over the flame of abstraction in order to reveal the previously hidden crystal of pure structure.

The punch line is --- in mathematics, context obscures structure; in statistics (or data analysis), context provides meaning! This fundamental difference has profound implications for learning statistics. In order to master statistics (including machine learning), it's not enough to understand the mathematical theory; it is not even enough to also understand the additional, non-mathematical theory of statistics. These comments may mystify statistics to some extent, but there is no reason for us to be afraid.

I'm a big fan of John Tukey. Why? Because he is the father of the modern concept of exploratory data analysis (EDA). Statistical ideas for producing data to answer specific questions are the most influential contributions of statistics to human knowledge. Data analysis is the contemporary form of "descriptive statistics", and Tukey's EDA philosophy is to let the data speak for themselves: we temporarily put aside the issue of whether these data represent any larger universe.
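To give a feel for what "letting the data speak" can mean in practice, here is a minimal sketch of two of Tukey's simplest EDA tools, the five-number summary and the 1.5-spread rule for flagging unusual points. It uses only the Python standard library. The function names and the measurements are made up for illustration, and the hinge convention used here is only one of several in common use.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, lower hinge, median, upper hinge, max) of the data."""
    xs = sorted(values)
    n = len(xs)
    # Tukey's hinges: each half includes the median when n is odd.
    half = (n + 1) // 2
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def tukey_outliers(values, k=1.5):
    """Flag points lying more than k hinge-spreads beyond the hinges."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    lo, hi = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < lo or x > hi]

# Hypothetical measurements, with one suspicious value at the end.
data = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 9.8]
summary = five_number_summary(data)
outliers = tukey_outliers(data)
print(summary)
print(outliers)
```

Nothing here assumes a probability model; the summary simply describes the batch of numbers in hand, and whether a flagged value such as 9.8 is an error or a discovery is a question the context has to answer.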

Since I assume you are an undergraduate student in the computer science arena, you may wonder why the hell you need to care so much about statistics. The answer is simple: we are too far from understanding the fundamental principles of chemistry or biology, so we are always in a state of incomplete knowledge of the structures. We have to rely on the language of probability and the methods of statistics in order for a machine to simulate the learning process of a human brain. Machine learning is pretty much statistical learning, unless someday we grasp the major modus operandi of the human brain. Don't be misled by the example of Deep Blue beating Kasparov: legal chess moves form a finite space, and Deep Blue won mainly by brute-force search of that space (to a limited depth, guided by a hand-tuned evaluation function) rather than by learning. So computer chess is not really a machine learning problem. Weiqi (Go) might be quite a different story; dolphin might have some comments on this topic, and I'll defer to his expertise.

So I'm not surprised to see a smart undergraduate student struggle when first taking a machine learning course. Most machine learning textbooks are terrible. For example, the one sold in China as a licensed reprint is Tom Mitchell's "Machine Learning" (1997); you'll learn pretty much nothing from this textbook except an empty framework. The same criticism applies to an overrated bioinformatics textbook in my field --- Baldi & Brunak's "Bioinformatics --- The Machine Learning Approach". This book is still useful for giving people a framework of bioinformatics, but there is almost no substance inside the structure; it's like a beautiful "castle in the air". This again emphasizes the fact that it's not enough to study machine learning or statistics from a mathematical point of view.

Based on my personal experience and a brief scan of your CV from your homepage, here are some suggestions for your study ---

(1) You need to go beyond computer programming or even algorithmics to study machine learning. You need to find an engineering project to motivate you and to provide you with a context for cutting into the field. I had an easier case than yours, because bioinformatics and genomics were clearly my context when I started to study machine learning. You need a well-defined "application domain" or "research field" for your statistical or machine learning techniques to serve. Keep in mind that "machine learning" does NOT exist for itself; it's a TOOL for other disciplines!

(2) Be careful with the textbooks you read; don't waste time on those "common" ones with "machine learning" in their titles (this may be an overstatement, but you should get the spirit of my message). The best textbook I've seen so far is Hastie, Tibshirani, & Friedman (2001), "The Elements of Statistical Learning". These 3 Stanford professors did a decent job of teaching the basic principles of data mining.

(3) Even with the right textbook, machine learning can only be grasped through hands-on research; just reading and doing exercises is not enough. Also, machine learning is too broad a field for you to be greedy in studying it. I'm not a fan of artificial neural nets (ANN), so I recommend that you ignore them as a beginner unless your research project needs ANN.

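At this point a few sentences in my native Chinese say it better, so here they are in translation. In my experience, the most time-efficient way to learn statistics is to "open up the whole area from a single point" (以点带面): set a small goal at the start, dig deep from one entry point, and only then try to broaden out. The first statistics course I took, in 2001, was Regression Analysis, which many people consider "too simple". But I chose it because my job required it, and I was pleasantly surprised to find that I could "fight as I walked" (边走边打), gradually shoring up the probability theory I needed but had never formally studied, and then "open up all the meridians" to reach into other areas of multivariate statistics. (Luckily, during the two years I worked between finishing my undergraduate degree and going abroad, I had made up the linear algebra that Fudan never taught me. The biology department's curriculum there was truly "outrageous", far inferior to that of the biology department at USTC (科大). Otherwise I would have had even more gaps to fill before studying multivariate statistics.) By the time I formally took a multivariate statistics course, it was fairly easy.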

(4) For machine learning, I recommend two major "break-in" points. The first is my favorite --- hidden Markov models (HMM). The success of HMM in speech recognition inspired its use in bioinformatics from 1992. Amazingly, the seemingly oversimplified first-order HMM works very well for several biological problems, esp. gene finding and protein family analysis. So I always tell other people: as a beginner in machine learning, go for HMM and ignore ANN!
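To make "first-order HMM" a little more concrete, here is a minimal sketch of Viterbi decoding, the standard dynamic-programming way to recover the most probable hidden-state path. The two states ("coding" and "noncoding") and every probability below are invented purely for illustration; a real gene finder has far more states and parameters estimated from data.

```python
import math

# Toy first-order HMM over DNA letters; all numbers are made up.
states = ("coding", "noncoding")
start = {"coding": 0.5, "noncoding": 0.5}
trans = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
emit = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def viterbi(observations):
    """Return the most probable hidden-state path for the observations."""
    # score[s] = log-probability of the best path that ends in state s.
    score = {s: math.log(start[s]) + math.log(emit[s][observations[0]])
             for s in states}
    backpointers = []
    for symbol in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor of s, using the first-order assumption:
            # the next state depends only on the current one.
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(trans[best][s])
                        + math.log(emit[s][symbol]))
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

path = viterbi("GCGCGCATATAT")
print(path)
```

With these toy numbers, the GC-rich first half is labeled "coding" and the AT-rich second half "noncoding": the model tolerates exactly one state switch because a switch is expensive (probability 0.1) but the emissions on either side pay for it.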

The second break-in point could be very challenging --- graphical models, esp. Bayesian Belief Networks (or simply Bayesian Networks, BN). You need a solid foundation in graph theory and probability theory, because graphical modeling is at the intersection of these two mathematical branches. My understanding of graph theory is reasonably good, but I have no confidence in my probability theory background; that's why I didn't choose this break-in point. But recently I found that BN is quite interesting and has a lot of applications in modeling bionetworks. I highly recommend reading David Heckerman's tutorial:

http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06
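For a first taste of why graph theory and probability meet in a BN, here is a minimal sketch of exact inference by enumeration on a three-node network. The graph (Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass) and all the probabilities are a made-up toy for illustration and are not taken from Heckerman's tutorial; real networks need smarter algorithms than brute-force enumeration.

```python
from itertools import product

# Conditional probability tables for the toy network (all numbers made up).
p_rain = 0.2
p_sprinkler_given_rain = {True: 0.01, False: 0.4}
p_wet_given = {  # keyed by (sprinkler, rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def bernoulli(p_true, value):
    """Probability that a binary variable takes `value`."""
    return p_true if value else 1.0 - p_true

def joint(rain, sprinkler, wet):
    """The joint factorizes along the graph: P(R) * P(S|R) * P(W|S,R)."""
    return (bernoulli(p_rain, rain)
            * bernoulli(p_sprinkler_given_rain[rain], sprinkler)
            * bernoulli(p_wet_given[(sprinkler, rain)], wet))

def posterior_rain_given_wet():
    """P(Rain | WetGrass) by summing the joint over the hidden variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True)
                   for r, s in product((True, False), repeat=2))
    return numerator / evidence

p = posterior_rain_given_wet()
print(round(p, 4))  # about 0.3577 with these toy numbers
```

The graph tells you how the joint distribution factorizes; probability theory then tells you how to turn that joint into an answer about an unobserved node given the evidence.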

I believe that once you find a research project that provides a context for learning HMM or BN, you may find machine learning is no longer as daunting as you once felt. Finally, let me give you some food for thought with regard to the relationship between probability and statistics; these comments may also be of some interest to physicists.

Probability is an elegant and powerful field of mathematics that enriches the subject as a whole through its interactions with other fields of mathematics. The domain of determinism in natural phenomena is limited (as Einstein may like to disbelieve), so the mathematical description of random behavior must play a large role in describing the world (not limited to the quantum world:-). From the point of view of the deductive logic that has shaped so much of traditional statistical education, probability is more basic than statistics --- it provides the stochastic models that describe the variability in observed data. However, from the point of view of a student, I believe that statistics is more basic than probability. That's why I advocate focusing on EDA rather than classical parametric statistics at the beginning of your statistics study. The variability in multivariate data can be visualized by software for human eyes, and our eyes are the most powerful weapon for discovery. Data visualization seems to be more productive than number crunching.

In the ideal Platonic world of mathematics (or the Newtonian world of "natural philosophy"), we can start with a probabilistic chicken and use deductive logic to lay a statistical egg. But in the messier reality of empirical sciences (esp. chemistry and biology), we must start with the egg as observed data and construct a prior probabilistic chicken as an inference. That's part of why Bayesian statistics has gained popularity in recent years relative to classical statistics (in Fisher's spirit). In a beginner's introductory statistics course, the chicken's only value is to explain where eggs come from. It seems a little unfair for beginners to learn about egg-generators before they become familiar with eggs. From my experience, I can never imagine starting the study of chemistry with quantum mechanics!
So I believe in jumping into the water of statistics and learning to swim through huge amounts of data, using the philosophy of "fight as you walk" (边走边打) or "fill whatever gap you find" (缺啥补啥) to go back and cover your holes in probability theory.

Conceptually, probability might be THE hardest subject in elementary mathematics. The history of probabilistic ideas is fascinating but a little frightening. Human intuition often fails on tricky conditional probability problems. Many educators believe that "teaching a conceptual grasp of probability still appears to be a very difficult task, fraught with ambiguity and illusion".
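A classic example of intuition failing is the screening-test problem, sketched below with Bayes' rule. The prevalence, sensitivity, and false-positive rate are hypothetical numbers chosen for illustration, and `posterior` is just a helper name made up for this sketch.

```python
from fractions import Fraction

def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    true_pos = prior * sensitivity                  # has condition, tests positive
    false_pos = (1 - prior) * false_positive_rate   # no condition, tests positive
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positives.
p = posterior(Fraction(1, 100), Fraction(99, 100), Fraction(5, 100))
print(p, float(p))
```

Most people guess that a positive result from a "99% accurate" test means the condition is almost certain. With these numbers the answer is exactly 1/6, about 17%, because the rare true positives are swamped by false positives from the much larger healthy group.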

So the take-home message is --- don't waste time getting frustrated by the subtle "chicken" of probability theory; go for the "egg" of statistics first when you tackle the vast field of machine learning. Good luck!


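Sea and sky merge into one; facing the scene, my sorrowful heart grows weary. My heart drifts alone like a wooden boat, wistful that the distant view is hard to see.
Who rules the rise and fall of fate? The years flow by like water, spent in vain. The vast misty waves are as they ever were; where is the face of those days?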


Posted: 2006-01-06, 22:24:50

快刀浪子


Re: How to Learn Machine Learning Systematically



Brother Omni's knowledge is very comprehensive. Impressive!


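The cold wind is a knife, the earth its chopping block, all living things its meat.
Snow flies for ten thousand li, the sky its furnace, smelting all things into silver.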


Posted: 2006-01-07, 10:01:02

Omni


More suggestions from a UPenn student



[Note]: One of my friends is currently a Ph.D. student in the UPenn Bioinformatics Program, and he has a lot of background in machine learning. I forwarded your question to him, and here are some of his comments, with minor editing on my part to tidy up the style. I think he's better placed to help you, given his solid CS background and hands-on experience in ML.

====================================================
Based on my learning experience, as lowiser said, learning machine learning (ML) well does require a lot of prerequisites, e.g. Statistics, Probability, Calculus, Graph Theory, linear algebra, numerical optimization, etc. I am not surprised he felt frustrated trying to learn ML without these foundations. After all, he has spent just half a year; ML is an advanced course in computer science, and he needs to spend more time, particularly if he has few of the aforementioned foundations.

Here are my suggestions for him:

Step 1) Learn the prerequisites, particularly Statistics, Probability, Calculus, linear algebra and numerical optimization;

Step 2) Learn some classic ML methods, including SVM, artificial neural networks, HMM, decision trees, Naive Bayes, etc. I recommend the software Weka (http://www.cs.waikato.ac.nz/ml/weka/);

Step 3) Only after 2) can he learn to view and think about different models at a higher level: generalizing across them (many of them might actually fall into the same category, based on the same principle) and understanding the mathematical foundations behind the principles.

I recommend the book Omni suggested, The Elements of Statistical Learning, which is really good. I also recommend the book by Warren Ewens and Gregory Grant, "Statistical Methods in Bioinformatics: An Introduction" (in the series Statistics for Biology and Health), if he is also interested in bioinformatics.

ML arises from CS. From the perspective of applied mathematics and statistics, it is actually function estimation and approximation. As for the similarities and differences between ML and mathematics, I would say that ML is also a kind of applied math or, as Omni said, math with a certain context.

Lastly, math is much more objective than ML. Since ML is a kind of learning, it is subjective, because different people may have different interpretations (criteria) of anything. In ML you have to make a lot of subjective assumptions / criteria to model a problem. These are different from axioms, the assumptions in math.




Posted: 2006-01-07, 12:43:04

西门吹牛


Re: How to Learn Machine Learning Systematically



Brother Omni's identity seems quite mysterious.


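One sword dance stirs the four quarters; no hero under heaven can stand against it.
His movements are so elusive he seems to cast no shadow; the cold-faced gentleman defies snow and frost.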


Posted: 2006-01-08, 09:22:26

like


Re: How to Learn Machine Learning Systematically



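Since 星空 and 西门 are both curious, let me solve the mystery: when Omni first arrived, he introduced himself as a fellow alumnus of the site owner from both high school and university, and it looks like he is now at Merck.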


Posted: 2006-01-09, 03:33:27

lowiser


Re: How to Learn Machine Learning Systematically



Thank you all so much. Your replies are really helpful.




Posted: 2006-02-02, 03:50:37