On September 10th, Michael I. Jordan, a renowned statistician from Berkeley, did an Ask Me Anything on Reddit (https://www2.eecs.berkeley.edu/Faculty/Homepages/jordan.html). I have a few questions on ML theory, nonparametrics, and the future of ML. Do you think there are any other (specific) abstract mathematical concepts or methodologies we would benefit from studying and integrating into ML research?

In general, "statistics" refers in part to an analysis style---a statistician is happy to analyze the performance of any system, e.g., a logic-based system, if it takes in data that can be considered random and outputs decisions that can be considered uncertain. Convolutional neural networks are just a plain good idea. The word "deep" just means layering to me (and I hope that the language eventually evolves toward such drier words). And then Dave Rumelhart started exploring backpropagation---clearly leaving behind the neurally-plausible constraint---and suddenly the systems became much more powerful. I'd invest in some of the human-intensive labeling processes that one sees in projects like FrameNet and (gasp) projects like Cyc. That said, I've had way more failures than successes, and I hesitate to make concrete suggestions here, because they're more likely to be fool's gold than the real thing.

From the comment thread: Personally, I suspect the key is going to be learning world models that handle long time sequences, so you can train on fantasies of real data and use those fantasies for planning. And in most cases you can just replace your "neural nets" with any of the dozens of other function-approximation methodologies and you won't lose anything, except that now it's not ML but a simple statistical model, and people would probably look at you funny if you try to give it a fancy acronym and publish it. It seems short-sighted.

What is the next frontier for applied nonparametrics? I think that it's mainly "get real about real-world applications". E.g., (1) How can I build and serve models within a certain time budget so that I get answers with a desired level of accuracy, no matter how much data I have? ... (6) How do I deal with non-stationarity? I think that too few people have tried out Bayesian nonparametrics on real-world, large-scale problems (good counter-examples include Emily Fox at UW and David Dunson at Duke). I would view all of this as the proto-emergence of an engineering counterpart to the more purely theoretical investigations that have classically taken place within statistics and optimization.

Note that latent Dirichlet allocation is a parametric Bayesian model in which the number of topics K is assumed known. The nonparametric version of LDA is called the HDP (hierarchical Dirichlet process), and in some very practical sense it's just a small step from LDA to the HDP; in particular, just a few more lines of code are needed to implement the HDP. (And in 2003 when we introduced LDA, I can remember people in the UAI community who had been-there-and-done-that for years with trees saying: "but it's just a tree; how can that be worthy of more study?") I view these models as basic components that will continue to grow in value as people start to build more complex, pipeline-oriented architectures.
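As a concrete illustration of that "few more lines of code" remark, here is a minimal sketch using gensim (a library choice of mine for illustration, not something named in the discussion): moving from LDA to the HDP amounts to dropping the fixed topic count K.

```python
# Sketch only: gensim's parametric LdaModel vs. nonparametric HdpModel.
from gensim.corpora import Dictionary
from gensim.models import HdpModel, LdaModel

docs = [
    ["topic", "models", "find", "latent", "themes"],
    ["dirichlet", "processes", "let", "topic", "counts", "grow"],
    ["documents", "share", "topics", "across", "the", "corpus"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Parametric: the number of topics K is assumed known in advance.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2)

# Nonparametric: the HDP lets the data determine the number of topics.
hdp = HdpModel(corpus=corpus, id2word=dictionary)

print(lda.print_topics())
print(hdp.print_topics(num_topics=3, num_words=4))
```

Under the hood the HDP ties document-level Dirichlet processes together with a corpus-level one, which is what lets topics be shared across documents while their number remains unbounded.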
Do you mind explaining the history behind how you learned about variational inference as a graduate student? Over the past three years we've seen some notable advancements in efficient approximate posterior inference for topic models and Bayesian nonparametrics---e.g., Hoffman 2011, Chong Wang 2011, Tamara Broderick's and your 2013 NIPS work, and your recent work with Paisley, Blei and Wang on extending stochastic inference to the nested hierarchical Dirichlet process---as well as other work you and others have done in this realm (a minimal sketch of the core stochastic update appears below). Will this trend continue, or do you think there is hope for less data-hungry methods such as coresets, matrix sketching, random projections, and active learning? One characteristic of your "extended family" of researchers has always been a knack for implementing complex models using real-world, non-trivial data sets such as Wikipedia or the New York Times archive.

I don't think that the "ML community" has developed many new inferential principles---or many new optimization principles---but I do think that the community has been exceedingly creative at taking existing ideas across many fields, mixing and matching them to solve problems in emerging problem domains, and I think that the community has excelled at making creative use of new computing architectures. This has long been done in the neural network literature (but also far beyond). I am an apologist for computational probability in machine learning because I believe that probability theory implements two principles in deep and intriguing ways---namely, factorization and averaging. I also must take issue with your phrase "methods more squarely in the realm of machine learning": one should definitely not equate statistics or optimization with theory, and machine learning with applications. I'm also overall happy with the rebranding associated with the usage of the term "deep learning" instead of "neural networks". What did I miss? What did I get wrong?

From the comment thread: What I mostly took away from this is that many of the things he says AI can't do fall into the same bucket of "AI cannot do reasoning". But just as it is impossible to ever create a rocket that travels faster than light, I'm not convinced our current approach towards AI is getting closer to real reasoning. Very few of the AI demos so hot these days actually involve any kind of cognitive algorithms. Let's not fool ourselves by saying that deep learning, or machine learning, is some sort of super-smart, sentient AI; it's far from it and really doesn't have any true intelligence behind it. I had this romantic idea about AI before actually doing AI. Then I got into it, and once you get past the fluff like "intelligence", "artificial neurons", "perceptrons", "fuzzy logic" and "learning", it just comes down to fitting some approximation function to whatever objective function, based on the inputs and outputs you receive. Like that's literally it. RL is far from solved in general, but it's obvious that the tools that are going to solve it are going to grow out of deep learning tools. I think he's a bit too pessimistic/dismissive, but a very sobering presentation nonetheless. This made an impact on me.
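For readers who haven't met the stochastic variational inference cited in the question above (Hoffman 2011 and successors), here is a toy sketch of its core move: a Robbins-Monro average of noisy natural-parameter estimates. The conjugate Gaussian model is my own choice for illustration; this is not code from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(loc=2.0, scale=1.0, size=N)    # x_i ~ N(theta, 1)

# Conjugate prior theta ~ N(0, 1), in natural parameters
# (eta1, eta2) = (mu / sigma^2, -1 / (2 sigma^2)).
prior = np.array([0.0, -0.5])
exact = prior + np.array([data.sum(), -0.5 * N])  # full-data posterior

lam = prior.copy()              # variational natural parameters
tau, kappa = 1.0, 0.7           # step-size schedule rho_t = (t + tau)^(-kappa)
for t in range(1, 5001):
    x = data[rng.integers(N)]                    # one random data point
    lam_hat = prior + N * np.array([x, -0.5])    # unbiased full-data surrogate
    rho = (t + tau) ** (-kappa)
    lam = (1 - rho) * lam + rho * lam_hat        # natural-gradient step

# Posterior mean mu = eta1 / (-2 * eta2): both estimates land near 2.0.
print(lam[0] / (-2 * lam[1]), exact[0] / (-2 * exact[1]))
```

The same skeleton, with Dirichlet instead of Gaussian natural parameters and a per-document local step, is what lets LDA and the HDP scale to corpora like Wikipedia.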
Based on seeing the kinds of questions I've discussed above arising again and again over the years, I've concluded that statistics/ML needs a deeper engagement with people in CS systems and databases, not just with AI people, which has been the main kind of engagement going on in previous decades (and still remains the focus of "deep learning"). The "statistics community" has also been very applied; it's just that for historical reasons their collaborations have tended to focus on science, medicine and policy rather than engineering. In other engineering areas, the idea of using pipelines, flow diagrams and layered architectures to build complex systems is quite well entrenched, and our field should be working (inter alia) on principles for building such systems. We have hammers, screwdrivers, wrenches, etc., and big projects involve using each of them in appropriate (although often creative) ways. For example, I've worked recently with Alex Bouchard-Cote on evolutionary trees, where the entities propagating along the edges of the tree are strings of varying length (due to deletions and insertions), and one wants to infer the tree and the strings. Indeed, it's unsupervised learning that has always been viewed as the Holy Grail; it's presumably what the brain excels at and what's really going to be needed to build real "brain-inspired computers". That logic didn't work for me then, nor does it work for me now. See the numbered list at the end of my blurb on deep learning above.

I don't expect anyone to come to Berkeley having read any of these books in their entirety, but I do hope that they've done some sampling and spent some quality time with at least some parts of most of them.

From the comment thread: Our current AI renaissance is based on accidentally discovering that neural networks work in some circumstances, and it's not like we understand neural networks; we are just fumbling around trying all sorts of different network structures and seeing which ones get results. It took decades (centuries really) for all of this to develop. All the attempts at reasoning prior to the AI winter turned out to be dead ends. You are a large algorithmic neural network with memory modules, the same as AI today. And literally everything in their list was on Star Trek (admittedly the smart watches were chest badges and handhelds, so maybe they're novel, but Dick Tracy covers those, and you're clear again). Back here in reality, people get things wrong in both directions at both age brackets far more often than they get them right, and "possible" isn't the important question besides; "feasible" is. I mean, fusion was possible in the 70s (the 40s if you count weapons), but it's still not feasible yet.

Lastly, I'm certainly a fan of coresets, matrix sketching, and random projections.
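Here is a minimal numpy sketch (toy sizes of my own choosing) of why random projections work: a data-oblivious Gaussian projection to k dimensions approximately preserves pairwise distances (Johnson-Lindenstrauss), with accuracy controlled by k.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 10_000, 400        # n points in d dimensions, sketched to k

X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)   # Gaussian random projection
Y = X @ R                                  # the sketch

# Pairwise distances are approximately preserved after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(orig, proj, proj / orig)             # ratio close to 1
```

Turning k up buys accuracy at the cost of time and memory, which is exactly the time/accuracy knob discussed next.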
The big-data era has begun to break down some barriers between engineering thinking (e.g., computer systems thinking) and inferential thinking, and it has (inter alia) helped to enlarge the scope of "applied statistical inference". I'm not sure that I'd view coresets, sketching and projections as "less data-hungry methods", though; essentially they provide a scalability knob that allows systems to take in more data while still retaining control over time and accuracy. A "statistical method" doesn't have to have any probabilities in it per se.

Do you still think this is the best set of books, and would you add any new ones?

Michael I. Jordan (born February 25, 1956) is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Sciences and the Department of Statistics at the University of California, Berkeley, and a researcher in machine learning, statistics, and artificial intelligence. He was a professor at MIT from 1988 to 1998. He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009, was named a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics, and is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM, as well as of the American Association for the Advancement of Science. Following Prof. Jordan's talk "The Decision-Making Side of Machine Learning", Ion Stoica, Professor at UC Berkeley and Director of RISELab, will present "The Future of Computing is Distributed": the demands of modern workloads, such as machine learning, are growing much faster than the capabilities of a single-node computer, forcing us to distribute these workloads.

From the comment thread: Like all these thousands of papers that get published every year, where they just slightly change their training methodology or objective function, make a demo showing how this gives a 2% performance increase in some scenarios, come up with a catchy acronym for it, and then pass it off as original research. Wonder how someone like Hinton would respond to this.

Probabilistic graphical models (PGMs) are one way to express structural aspects of joint probability distributions, specifically in terms of conditional independence relationships and other factorizations; in particular, they play an increasingly important role in the design and analysis of machine learning algorithms. John Paisley, Chong Wang, Dave Blei and I have developed something called the nested HDP, in which documents aren't just vectors but multi-paths down trees of vectors. Basically, I think that CRMs are to nonparametrics what exponential families are to parametrics (and I might note that I'm currently working on a paper with Tamara Broderick and Ashia Wilson that tries to bring that idea to life); the adjective "completely" in "completely random measures" refers to a useful independence property, and I believe that CRMs continue to be worthy of much further attention.
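To make the factorization point concrete, here is a toy example of mine (not from the discussion): for a chain A -> B -> C over binary variables, the conditional-independence statement "C is independent of A given B" is exactly the factorization P(a,b,c) = P(a)P(b|a)P(c|b), and marginals follow by summing out variables.

```python
import itertools

# Chain-structured model A -> B -> C over binary variables:
# P(a, b, c) = P(a) * P(b | a) * P(c | b).
P_a = {0: 0.6, 1: 0.4}
P_b_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
P_c_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (c, b)

def joint(a, b, c):
    # The factorization encodes "C is independent of A given B".
    return P_a[a] * P_b_a[(b, a)] * P_c_b[(c, b)]

# Marginal P(c): sum out a and b (brute force is fine on a toy model).
P_c = {c: sum(joint(a, b, c) for a, b in itertools.product((0, 1), repeat=2))
       for c in (0, 1)}
print(P_c)   # {0: 0.65, 1: 0.35}; sums to 1
```

On a chain of length T the same factorization drops the parameter count from exponential to linear in T, which is the sense in which factorization is a computational principle and not just a modeling convenience.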
What are the most important high-level trends in machine learning research and industry these days? On a more philosophical level, what's the difference between "reasoning/understanding" and function approximation/mimicking? And what do you think makes AI incapable of reasoning, beyond computational limitations?

From the comment thread: The marketeers are out of control these days; it's engineers like him that gotta keep it real and separate the progress from the hype. I mean, you can frame all of physics as an optimization problem.

I don't make the distinction between statistics and machine learning that your question seems predicated on. When Leo Breiman developed random forests, was he being a statistician or a machine learner? When my colleagues and I developed latent Dirichlet allocation, were we being statisticians or machine learners? The major meta-trend is the merger of statistical thinking and computational thinking. I've been in it for the long run---three decades so far---and I believe that Bayesian nonparametrics has just as bright a future in statistics/ML as classical nonparametrics has had and continues to have. There is not ever going to be one general tool that is dominant; each tool has its domain in which it is appropriate. The HMM is an example, as is the CRF. Similarly, layered neural networks can and should be viewed as nonparametric function estimators, objects to be analyzed statistically.
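To make the HMM example concrete, here is a minimal forward-algorithm sketch with toy parameters of my own choosing. It shows factorization and averaging at work: the chain factorization turns a sum over K^T hidden-state paths into an O(T K^2) recursion.

```python
import numpy as np

pi = np.array([0.5, 0.5])                  # initial distribution P(z_1)
A = np.array([[0.9, 0.1],                  # A[i, j] = P(z_t = j | z_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],                  # B[i, o] = P(x_t = o | z_t = i)
              [0.3, 0.7]])

obs = [0, 0, 1, 0, 1, 1]                   # an observed symbol sequence

alpha = pi * B[:, obs[0]]                  # alpha_1(i) = P(x_1, z_1 = i)
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]          # sum out z_{t-1}, absorb x_t
print(alpha.sum())                         # likelihood P(x_1, ..., x_T)
```

The CRF admits the same style of dynamic program, with unnormalized potentials in place of the probabilities used here.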
What do you think students should be learning now to prepare for future advancements in approximate inference?

Despite having limitations (particularly the need for large amounts of labeled data), the general approach doesn't feel singularly "neural". One active direction is model interpretation via instancewise feature selection: learning a function to extract the subset of features that is most informative for each given example.

From the comment thread: Outside of finance and big tech, very few companies/industries can use machine learning properly these days. I've been collecting methods to accelerate training in PyTorch; the methods, roughly sorted from largest to smallest expected speed-up, start with: consider using a different learning rate schedule (a sketch of this one follows below).
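As an illustration of that first tip (a minimal sketch of mine; the source list is truncated above), PyTorch ships a built-in one-cycle schedule, often reported as the single largest training speed-up:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

steps = 100
# One-cycle policy: ramp the learning rate up, then anneal it back down.
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.3, total_steps=steps)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for _ in range(steps):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    sched.step()   # note: stepped once per batch, not once per epoch
print(loss.item())
```

The design point is that the schedule, not the optimizer, carries most of the tuning burden: the peak learning rate replaces a grid search over constant rates.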
This seems like as good a place as any to close (apologies, though, for not responding directly to your question). The flood of data confronting modern systems is not just an engineering challenge; it is a new kind of statistical problem, one that suggests yet-to-be-invented divide-and-conquer algorithms.
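To give the flavor of divide-and-conquer inference, here is a toy sketch of mine, not a method from the discussion: split the data across machines, estimate locally, and combine. For the sample mean the combination is exact; the open statistical questions begin with nonlinear estimators and with attaching error bars (cf. the bag of little bootstraps).

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=3.0, size=1_000_000)   # stand-in for "big" data

m = 100                                             # number of "machines"
local = [chunk.mean() for chunk in np.array_split(data, m)]
combined = float(np.mean(local))                    # one round of communication

print(combined, data.mean())   # identical for the mean; not so in general
```

The statistical questions---how much each machine's answer can be trusted, and how to combine uncertainty rather than just point estimates---are exactly where the engagement with systems people becomes a two-way street.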