<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Thu, 21 Aug 2008 17:25:51 BST</pubDate>


	<title>CiteULike: sdvillal ml-foundations</title>
	<description>CiteULike: sdvillal ml-foundations</description>


	<link>http://www.citeulike.org/user/sdvillal/tag/ml-foundations</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/3112622"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2737704"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2737626"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2712765"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2693500"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2693473"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2693466"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/510440"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2205725"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/2406064"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1397642"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1395265"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1395264"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/100137"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1201101"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1155419"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1147892"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1147886"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/106699"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/115106"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/353426"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/1126863"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/sdvillal/article/221347"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/sdvillal/article/3112622">
    <title>Putting Things in Order: On the Fundamental Role of Ranking in Classification and Probability Estimation</title>
    <link>http://www.citeulike.org/user/sdvillal/article/3112622</link>
    <description>&lt;i&gt;Knowledge Discovery in Databases: PKDD 2007 (2007), pp. 2-3.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;While a binary classifier aims to distinguish positives from negatives, a ranker orders instances from high to low expectation that the instance is positive. Most classification models in machine learning output some score of ‘positiveness’, and hence can be used as rankers. Conversely, any ranker can be turned into a classifier if we have some instance-independent means of splitting the ranking into positive and negative segments. This could be a fixed score threshold; a point obtained from fixing the slope on the ROC curve; the break-even point between true positive and true negative rates; to mention just a few possibilities. These connections between ranking and classification notwithstanding, there are considerable differences as well. Classification performance on n examples is measured by accuracy, an O(n) operation; ranking performance, on the other hand, is measured by the area under the ROC curve (AUC), an O(n log n) operation. The model with the highest AUC does not necessarily dominate all other models, and thus it is possible that another model would achieve a higher accuracy for certain operating conditions, even if its AUC is lower. However, within certain model classes good ranking performance and good classification performance are more closely related than suggested by the previous remarks. For instance, there is evidence that certain classification models, while designed to optimise accuracy, in effect optimise an AUC-based loss function [1]. It has also been known for some time that decision trees yield convex training set ROC curves by construction [2], and thus optimising training set accuracy is likely to lead to good training set AUC. In this talk I will investigate the relation between ranking and classification more closely. I will also consider the connection between ranking and probability estimation.
The quality of probability estimates can be measured by, e.g., mean squared error in the probability estimates (the Brier score). However, like accuracy, this is an O(n) operation that doesn’t fully take ranking performance into account. I will show how a novel decomposition of the Brier score into calibration loss and refinement loss [3] sheds light on both ranking and probability estimation performance. While previous decompositions are approximate [4], our decomposition is an exact one based on the ROC convex hull. (The connection between the ROC convex hull and calibration was independently noted by [5]). In the case of decision trees, the analysis explains the empirical evidence that probability estimation trees produce well-calibrated probabilities [6].</description>
    <dc:title>Putting Things in Order: On the Fundamental Role of Ranking in Classification and Probability Estimation</dc:title>

    <dc:creator>Peter Flach</dc:creator>
    <dc:identifier>doi:10.1007/978-3-540-74976-9_2</dc:identifier>
    <dc:source>Knowledge Discovery in Databases: PKDD 2007 (2007), pp. 2-3.</dc:source>
    <dc:date>2008-08-12T18:59:31-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publicationName>Knowledge Discovery in Databases: PKDD 2007</prism:publicationName>
    <prism:startingPage>2</prism:startingPage>
    <prism:endingPage>3</prism:endingPage>
    <prism:category>calibration</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ranking</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2737704">
    <title>Scale-sensitive dimensions, uniform convergence, and learnability</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2737704</link>
    <description>&lt;i&gt;J. ACM, Vol. 44, No. 4. (July 1997), pp. 615-631.&lt;/i&gt;</description>
    <dc:title>Scale-sensitive dimensions, uniform convergence, and learnability</dc:title>

    <dc:creator>Noga Alon</dc:creator>
    <dc:creator>Shai Ben-David</dc:creator>
    <dc:creator>Nicol&#242; Cesa-Bianchi</dc:creator>
    <dc:creator>David Haussler</dc:creator>
    <dc:identifier>doi:10.1145/263867.263927</dc:identifier>
    <dc:source>J. ACM, Vol. 44, No. 4. (July 1997), pp. 615-631.</dc:source>
    <dc:date>2008-04-30T12:29:38-00:00</dc:date>
    <prism:publicationYear>1997</prism:publicationYear>
    <prism:publicationName>J. ACM</prism:publicationName>
    <prism:issn>0004-5411</prism:issn>
    <prism:volume>44</prism:volume>
    <prism:number>4</prism:number>
    <prism:startingPage>615</prism:startingPage>
    <prism:endingPage>631</prism:endingPage>
    <prism:publisher>ACM</prism:publisher>
    <prism:category>generalization</prism:category>
    <prism:category>learnability</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>scale-sensitive</prism:category>
    <prism:category>uniform-convergence</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2737626">
    <title>Reliable Reasoning: Induction and Statistical Learning Theory (Jean Nicod Lectures)</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2737626</link>
    <description>&lt;i&gt;(01 May 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;In &#60;i&#62;Reliable Reasoning,&#60;/i&#62; Gilbert Harman and Sanjeev Kulkarni--a philosopher and an engineer--argue that philosophy and cognitive science can benefit from statistical learning theory (SLT), the theory that lies behind recent advances in machine learning. The philosophical problem of induction, for example, is in part about the reliability of inductive reasoning, where the reliability of a method is measured by its statistically expected percentage of errors--a central topic in SLT.&#60;br /&#62; &#60;br /&#62; After discussing philosophical attempts to evade the problem of induction, Harman and Kulkarni provide an admirably clear account of the basic framework of SLT and its implications for inductive reasoning. They explain the Vapnik-Chervonenkis (VC) dimension of a set of hypotheses and distinguish two kinds of inductive reasoning, describing fundamental results about the power and limits of those methods in terms of the VC-dimension of the hypotheses being considered. The VC-dimension is found to be superior to a related measure proposed by Karl Popper, and shown not to correspond exactly to ordinary notions of simplicity. The authors discuss various topics in machine learning, including nearest-neighbor methods, neural networks, and support vector machines. Finally, they describe transductive reasoning and suggest possible new models of human reasoning suggested by developments in SLT.</description>
    <dc:title>Reliable Reasoning: Induction and Statistical Learning Theory (Jean Nicod Lectures)</dc:title>

    <dc:creator>Gilbert Harman</dc:creator>
    <dc:creator>Sanjeev Kulkarni</dc:creator>
    <dc:source>(01 May 2007)</dc:source>
    <dc:date>2008-04-30T11:55:19-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:publisher>The MIT Press</prism:publisher>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2712765">
    <title>Tutorial on Practical Prediction Theory for Classification</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2712765</link>
    <description>&lt;i&gt;Journal of Machine Learning Research, Vol. 6 (March 2005), pp. 273-306.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We discuss basic prediction theory and its impact on classification success evaluation, implications for learning algorithm design, and uses in learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and quantitatively useful.</description>
    <dc:title>Tutorial on Practical Prediction Theory for Classification</dc:title>

    <dc:creator>John Langford</dc:creator>
    <dc:source>Journal of Machine Learning Research, Vol. 6 (March 2005), pp. 273-306.</dc:source>
    <dc:date>2008-04-24T11:35:10-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Journal of Machine Learning Research</prism:publicationName>
    <prism:volume>6</prism:volume>
    <prism:startingPage>273</prism:startingPage>
    <prism:endingPage>306</prism:endingPage>
    <prism:category>error-estimation</prism:category>
    <prism:category>learning-bounds</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2693500">
    <title>Abduction and Induction: Essays on their Relation and Integration (Applied Logic Series)</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2693500</link>
    <description>&lt;i&gt;(30 April 2000)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;From the very beginning of their investigation of human reasoning, philosophers have identified two other forms of reasoning, besides deduction, which we now call abduction and induction. Deduction is now fairly well understood, but abduction and induction have eluded a similar level of understanding. The papers collected here address the relationship between abduction and induction and their possible integration. The approach is sometimes philosophical, sometimes that of pure logic, and some papers adopt the more task-oriented approach of AI. &#60;br/&#62; The book will command the attention of philosophers, logicians, AI researchers and computer scientists in general.</description>
    <dc:title>Abduction and Induction: Essays on their Relation and Integration (Applied Logic Series)</dc:title>

    <dc:source>(30 April 2000)</dc:source>
    <dc:date>2008-04-20T19:18:41-00:00</dc:date>
    <prism:publicationYear>2000</prism:publicationYear>
    <prism:publisher>Springer</prism:publisher>
    <prism:category>abduction</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2693473">
    <title>Smart Inductive Generalizations are Abductions</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2693473</link>
    <description>This paper describes abduction as 'inference to the best explanation' and argues that &#34;smart&#34; inductive generalizations are a special case of abductions. Along the way it argues that some good explanations are not proofs and some proofs are not explanations, concluding that explanations are not deductive proofs in any particularly interesting sense. An attractive alternative is that explanations are assignments of causal responsibility. Smart inductive generalizations can then be seen to be...</description>
    <dc:title>Smart Inductive Generalizations are Abductions</dc:title>

    <dc:creator>J Josephson</dc:creator>
    <dc:date>2008-04-20T19:05:03-00:00</dc:date>
    <prism:category>abduction</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2693466">
    <title>Integrating abduction and induction in machine learning</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2693466</link>
    <description>&lt;i&gt;(1997)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;This paper discusses the integration of traditional abductive and inductive reasoning methods in the development of machine learning systems. In particular, the paper discusses our recent work in two areas: 1) The use of traditional abductive methods to propose revisions during theory refinement, where an existing knowledge base is modified to make it consistent with a set of empirical data; and 2) The use of inductive learning methods to automatically acquire from examples a diagnostic...</description>
    <dc:title>Integrating abduction and induction in machine learning</dc:title>

    <dc:creator>R Mooney</dc:creator>
    <dc:source>(1997)</dc:source>
    <dc:date>2008-04-20T19:02:30-00:00</dc:date>
    <prism:publicationYear>1997</prism:publicationYear>
    <prism:category>abduction</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/510440">
    <title>Convexity, Classification, and Risk Bounds</title>
    <link>http://www.citeulike.org/user/sdvillal/article/510440</link>
    <description>&lt;i&gt;Journal of the American Statistical Association, Vol. 101, No. 473. (March 2006), pp. 138-156.&lt;/i&gt;</description>
    <dc:title>Convexity, Classification, and Risk Bounds</dc:title>

    <dc:creator>Peter Bartlett</dc:creator>
    <dc:creator>Michael Jordan</dc:creator>
    <dc:creator>Jon McAuliffe</dc:creator>
    <dc:identifier>doi:10.1198/016214505000000907</dc:identifier>
    <dc:source>Journal of the American Statistical Association, Vol. 101, No. 473. (March 2006), pp. 138-156.</dc:source>
    <dc:date>2006-02-18T14:36:37-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:publicationName>Journal of the American Statistical Association</prism:publicationName>
    <prism:issn>0162-1459</prism:issn>
    <prism:volume>101</prism:volume>
    <prism:number>473</prism:number>
    <prism:startingPage>138</prism:startingPage>
    <prism:endingPage>156</prism:endingPage>
    <prism:publisher>American Statistical Association</prism:publisher>
    <prism:category>error-estimation</prism:category>
    <prism:category>loss-functions</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2205725">
    <title>On divergences, surrogate loss functions, and decentralized detection</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2205725</link>
    <description>&lt;i&gt;(25 Oct 2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We develop a general correspondence between a family of loss functions that act as surrogates to 0-1 loss, and the class of Ali-Silvey or $f$-divergence functionals. This correspondence provides the basis for choosing and evaluating various surrogate losses frequently used in statistical learning (e.g., hinge loss, exponential loss, logistic loss); conversely, it provides a decision-theoretic framework for the choice of divergences in signal processing and quantization theory. We exploit this correspondence to characterize the statistical behavior of nonparametric decentralized hypothesis testing algorithms that operate by minimizing convex surrogate loss functions. In particular, we specify the family of loss functions that are equivalent to 0-1 loss in the sense of producing the same quantization rules and discriminant functions.</description>
    <dc:title>On divergences, surrogate loss functions, and decentralized detection</dc:title>

    <dc:creator>Xuanlong Nguyen</dc:creator>
    <dc:creator>Martin Wainwright</dc:creator>
    <dc:creator>Michael Jordan</dc:creator>
    <dc:source>(25 Oct 2005)</dc:source>
    <dc:date>2008-01-08T00:17:15-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:category>error-estimation</prism:category>
    <prism:category>loss-functions</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/2406064">
    <title>Evidence Contrary to the Statistical View of Boosting</title>
    <link>http://www.citeulike.org/user/sdvillal/article/2406064</link>
    <description>&lt;i&gt;Journal of Machine Learning Research, Vol. 9 (February 2008), pp. 131-156.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many unanswered important questions. Furthermore, we provide examples that reveal crucial flaws in the many practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.</description>
    <dc:title>Evidence Contrary to the Statistical View of Boosting</dc:title>

    <dc:creator>David Mease</dc:creator>
    <dc:creator>Abraham Wyner</dc:creator>
    <dc:source>Journal of Machine Learning Research, Vol. 9 (February 2008), pp. 131-156.</dc:source>
    <dc:date>2008-02-21T11:29:01-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Journal of Machine Learning Research</prism:publicationName>
    <prism:volume>9</prism:volume>
    <prism:startingPage>131</prism:startingPage>
    <prism:endingPage>156</prism:endingPage>
    <prism:category>boosting</prism:category>
    <prism:category>ensemble-diversity</prism:category>
    <prism:category>ensembles</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1397642">
    <title>Very Simple Classification Rules Perform Well on Most Commonly Used Datasets</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1397642</link>
    <description>&lt;i&gt;Mach. Learn., Vol. 11, No. 1. (April 1993), pp. 63-90.&lt;/i&gt;</description>
    <dc:title>Very Simple Classification Rules Perform Well on Most Commonly Used Datasets</dc:title>

    <dc:creator>Robert Holte</dc:creator>
    <dc:identifier>doi:10.1023/A:1022631118932</dc:identifier>
    <dc:source>Mach. Learn., Vol. 11, No. 1. (April 1993), pp. 63-90.</dc:source>
    <dc:date>2007-06-18T22:20:00-00:00</dc:date>
    <prism:publicationYear>1993</prism:publicationYear>
    <prism:publicationName>Mach. Learn.</prism:publicationName>
    <prism:issn>0885-6125</prism:issn>
    <prism:volume>11</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>63</prism:startingPage>
    <prism:endingPage>90</prism:endingPage>
    <prism:publisher>Kluwer Academic Publishers</prism:publisher>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
    <prism:category>simplicity</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1395265">
    <title>The existence of a priori distinctions between learning algorithms</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1395265</link>
    <description>&lt;i&gt;Neural Comput., Vol. 8, No. 7. (October 1996), pp. 1391-1420.&lt;/i&gt;</description>
    <dc:title>The existence of a priori distinctions between learning algorithms</dc:title>

    <dc:creator>David Wolpert</dc:creator>
    <dc:source>Neural Comput., Vol. 8, No. 7. (October 1996), pp. 1391-1420.</dc:source>
    <dc:date>2007-06-17T16:34:46-00:00</dc:date>
    <prism:publicationYear>1996</prism:publicationYear>
    <prism:publicationName>Neural Comput.</prism:publicationName>
    <prism:issn>0899-7667</prism:issn>
    <prism:volume>8</prism:volume>
    <prism:number>7</prism:number>
    <prism:startingPage>1391</prism:startingPage>
    <prism:endingPage>1420</prism:endingPage>
    <prism:publisher>MIT Press</prism:publisher>
    <prism:category>error-estimation</prism:category>
    <prism:category>free-lunch</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1395264">
    <title>The lack of a priori distinctions between learning algorithms</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1395264</link>
    <description>&lt;i&gt;Neural Comput., Vol. 8, No. 7. (October 1996), pp. 1341-1390.&lt;/i&gt;</description>
    <dc:title>The lack of a priori distinctions between learning algorithms</dc:title>

    <dc:creator>David Wolpert</dc:creator>
    <dc:source>Neural Comput., Vol. 8, No. 7. (October 1996), pp. 1341-1390.</dc:source>
    <dc:date>2007-06-17T16:33:32-00:00</dc:date>
    <prism:publicationYear>1996</prism:publicationYear>
    <prism:publicationName>Neural Comput.</prism:publicationName>
    <prism:issn>0899-7667</prism:issn>
    <prism:volume>8</prism:volume>
    <prism:number>7</prism:number>
    <prism:startingPage>1341</prism:startingPage>
    <prism:endingPage>1390</prism:endingPage>
    <prism:publisher>MIT Press</prism:publisher>
    <prism:category>error-estimation</prism:category>
    <prism:category>free-lunch</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/100137">
    <title>General conditions for predictivity in learning theory</title>
    <link>http://www.citeulike.org/user/sdvillal/article/100137</link>
    <description>&lt;i&gt;Nature, Vol. 428, No. 6981. (25 March 2004), pp. 419-422.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Developing theoretical foundations for learning is a key step towards understanding intelligence. 'Learning from examples' is a paradigm in which systems (natural or artificial) learn a functional relationship from a training set of examples. Within this paradigm, a learning algorithm is a map from the space of training sets to the hypothesis space of possible functional solutions. A central question for the theory is to determine conditions under which a learning algorithm will generalize from its finite training set to novel examples. A milestone in learning theory was a characterization of conditions on the hypothesis space that ensure generalization for the natural class of empirical risk minimization (ERM) learning algorithms that are based on minimizing the error on the training set. Here we provide conditions for generalization in terms of a precise stability property of the learning process: when the training set is perturbed by deleting one example, the learned hypothesis does not change much. This stability property stipulates conditions on the learning map rather than on the hypothesis space, subsumes the classical theory for ERM algorithms, and is applicable to more general algorithms. The surprising connection between stability and predictivity has implications for the foundations of learning theory and for the design of novel algorithms, and provides insights into problems as diverse as language learning and inverse problems in physics and engineering.</description>
    <dc:title>General conditions for predictivity in learning theory</dc:title>

    <dc:creator>T Poggio</dc:creator>
    <dc:creator>R Rifkin</dc:creator>
    <dc:creator>S Mukherjee</dc:creator>
    <dc:creator>P Niyogi</dc:creator>
    <dc:identifier>doi:10.1038/nature02341</dc:identifier>
    <dc:source>Nature, Vol. 428, No. 6981. (25 March 2004), pp. 419-422.</dc:source>
    <dc:date>2005-02-21T17:28:44-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:issn>1476-4687</prism:issn>
    <prism:volume>428</prism:volume>
    <prism:number>6981</prism:number>
    <prism:startingPage>419</prism:startingPage>
    <prism:endingPage>422</prism:endingPage>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1201101">
    <title>Discussion of the paper "Arcing Classifiers" by Leo Breiman</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1201101</link>
    <description>&lt;i&gt;(1998)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;this paper, Breiman uses boosting-by-resampling, instead of boosting-by-reweighting and in this way combines the two methods.</description>
    <dc:title>Discussion of the paper "Arcing Classifiers" by Leo Breiman</dc:title>

    <dc:creator>Y Freund</dc:creator>
    <dc:creator>R Schapire</dc:creator>
    <dc:source>(1998)</dc:source>
    <dc:date>2007-03-31T21:22:36-00:00</dc:date>
    <prism:publicationYear>1998</prism:publicationYear>
    <prism:category>boosting</prism:category>
    <prism:category>ensembles</prism:category>
    <prism:category>error-estimation</prism:category>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1155419">
    <title>Stacked Generalization</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1155419</link>
    <description>&lt;i&gt;No. LA-UR-90-3460. (1990)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct ...</description>
    <dc:title>Stacked Generalization</dc:title>

    <dc:creator>DH Wolpert</dc:creator>
    <dc:source>No. LA-UR-90-3460. (1990)</dc:source>
    <dc:date>2007-03-12T13:49:47-00:00</dc:date>
    <prism:publicationYear>1990</prism:publicationYear>
    <prism:number>LA-UR-90-3460</prism:number>
    <prism:category>ensemble-diversity</prism:category>
    <prism:category>ensembles</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1147892">
    <title>Simple classifiers</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1147892</link>
    <description>&lt;i&gt;(2003)&lt;/i&gt;</description>
    <dc:title>Simple classifiers</dc:title>

    <dc:creator>A Cannon</dc:creator>
    <dc:creator>J Howse</dc:creator>
    <dc:creator>D Hush</dc:creator>
    <dc:creator>C Scovel</dc:creator>
    <dc:source>(2003)</dc:source>
    <dc:date>2007-03-08T19:45:17-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1147886">
    <title>Multiple instance learning using simple classifiers</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1147886</link>
    <description>&lt;i&gt;Proceedings of the 2004 International Conference on Machine Learning and Applications (2004), pp. 123-128.&lt;/i&gt;</description>
    <dc:title>Multiple instance learning using simple classifiers</dc:title>

    <dc:creator>A Cannon</dc:creator>
    <dc:creator>D Hush</dc:creator>
    <dc:source>Proceedings of the 2004 International Conference on Machine Learning and Applications (2004), pp. 123-128.</dc:source>
    <dc:date>2007-03-08T19:37:41-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>Proceedings of the 2004 International Conference on Machine Learning and Applications</prism:publicationName>
    <prism:startingPage>123</prism:startingPage>
    <prism:endingPage>128</prism:endingPage>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/106699">
    <title>Statistical Learning Theory</title>
    <link>http://www.citeulike.org/user/sdvillal/article/106699</link>
    <description>&lt;i&gt;(16 September 1998)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.</description>
    <dc:title>Statistical Learning Theory</dc:title>

    <dc:creator>Vladimir Vapnik</dc:creator>
    <dc:source>(16 September 1998)</dc:source>
    <dc:date>2005-02-28T20:55:16-00:00</dc:date>
    <prism:publicationYear>1998</prism:publicationYear>
    <prism:publisher>Wiley-Interscience</prism:publisher>
    <prism:category>data-mining-books</prism:category>
    <prism:category>data-mining-general</prism:category>
    <prism:category>kernel-machines</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/115106">
    <title>The Nature of Statistical Learning Theory (Information Science and Statistics)</title>
    <link>http://www.citeulike.org/user/sdvillal/article/115106</link>
    <description>&lt;i&gt;(19 November 1999)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The aim of this book is to discuss the fundamental ideas which lie behind the statistical theory of learning and generalization. It considers learning as a general problem of function estimation based on empirical data. Omitting proofs and technical details, the author concentrates on discussing the main results of learning theory and their connections to fundamental problems in statistics. These include: * the setting of learning problems based on the model of minimizing the risk functional from empirical data * a comprehensive analysis of the empirical risk minimization principle including necessary and sufficient conditions for its consistency * non-asymptotic bounds for the risk achieved using the empirical risk minimization principle * principles for controlling the generalization ability of learning machines using small sample sizes based on these bounds * the Support Vector methods that control the generalization ability when estimating a function using a small sample size. The second edition of the book contains three new chapters devoted to further development of the learning theory and SVM techniques. These include: * the theory of the direct method of learning based on solving multidimensional integral equations for density, conditional probability, and conditional density estimation * a new inductive principle of learning. Written in a readable and concise style, the book is intended for statisticians, mathematicians, physicists, and computer scientists. Vladimir N. Vapnik is Technology Leader at AT&#38;T Labs-Research and Professor of London University. He is one of the founders of statistical learning theory, and the author of seven books published in English, Russian, German, and Chinese.</description>
    <dc:title>The Nature of Statistical Learning Theory (Information Science and Statistics)</dc:title>

    <dc:creator>Vladimir Vapnik</dc:creator>
    <dc:source>(19 November 1999)</dc:source>
    <dc:date>2005-03-05T21:06:15-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:publisher>Springer</prism:publisher>
    <prism:category>data-mining-books</prism:category>
    <prism:category>data-mining-general</prism:category>
    <prism:category>kernel-machines</prism:category>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/353426">
    <title>An overview of statistical learning theory</title>
    <link>http://www.citeulike.org/user/sdvillal/article/353426</link>
    <description>&lt;i&gt;Neural Networks, IEEE Transactions on, Vol. 10, No. 5. (1999), pp. 988-999.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.</description>
    <dc:title>An overview of statistical learning theory</dc:title>

    <dc:creator>VN Vapnik</dc:creator>
    <dc:source>Neural Networks, IEEE Transactions on, Vol. 10, No. 5. (1999), pp. 988-999.</dc:source>
    <dc:date>2005-10-18T02:31:56-00:00</dc:date>
    <prism:publicationYear>1999</prism:publicationYear>
    <prism:publicationName>Neural Networks, IEEE Transactions on</prism:publicationName>
    <prism:volume>10</prism:volume>
    <prism:number>5</prism:number>
    <prism:startingPage>988</prism:startingPage>
    <prism:endingPage>999</prism:endingPage>
    <prism:category>ml-foundations</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/1126863">
    <title>E4 - Machine Learning</title>
    <link>http://www.citeulike.org/user/sdvillal/article/1126863</link>
    <description>Machine learning's focus on ill-defined problems and highly flexible methods makes it ideally suited for KDD applications. Among the ideas machine learning contributes to KDD are the importance of empirical validation, the impossibility of learning without a priori assumptions, and the utility of limited-search or limited-representation methods. Machine learning provides methods for incorporating knowledge into the learning process, changing and combining representations, combatting the curse...</description>
    <dc:title>E4 - Machine Learning</dc:title>

    <dc:creator>Pedro Domingos</dc:creator>
    <dc:date>2007-02-27T10:55:05-00:00</dc:date>
    <prism:category>ml-foundations</prism:category>
    <prism:category>ml-philosophy</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/sdvillal/article/221347">
    <title>Towards parameter-free data mining</title>
    <link>http://www.citeulike.org/user/sdvillal/article/221347</link>
    <description>&lt;i&gt;(2004), pp. 206-215.&lt;/i&gt;</description>
    <dc:title>Towards parameter-free data mining</dc:title>

    <dc:creator>Eamonn Keogh</dc:creator>
    <dc:creator>Stefano Lonardi</dc:creator>
    <dc:creator>Chotirat Ratanamahatana</dc:creator>
    <dc:identifier>doi:10.1145/1014052.1014077</dc:identifier>
    <dc:source>(2004), pp. 206-215.</dc:source>
    <dc:date>2005-06-07T14:12:25-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:startingPage>206</prism:startingPage>
    <prism:endingPage>215</prism:endingPage>
    <prism:publisher>ACM Press</prism:publisher>
    <prism:category>ml-foundations</prism:category>
</item>



</rdf:RDF>

