{"id":8257,"date":"2020-05-06T00:57:45","date_gmt":"2020-05-06T00:57:45","guid":{"rendered":"https:\/\/www.thecoderschool.com\/?p=8257"},"modified":"2022-10-13T20:49:56","modified_gmt":"2022-10-13T20:49:56","slug":"advanced-machine-learning-techniques-principal-component-analysis","status":"publish","type":"post","link":"https:\/\/www.thecoderschool.com\/blog\/advanced-machine-learning-techniques-principal-component-analysis\/","title":{"rendered":"Advanced Machine Learning Techniques: Principal Component Analysis"},"content":{"rendered":"<p><strong>By Camille D., Age 17<\/strong><\/p>\n<p><span style=\"font-weight: 400;\">This article will focus on a method data scientists and programmers use to make data easier to explore, visualize, and interpret data, called <\/span><b>principal component analysis (PCA)<\/b><span style=\"font-weight: 400;\">. The explanations in this article assume some background in linear algebra and statistics.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">PCA is based on <\/span><b>dimensionality reduction<\/b><span style=\"font-weight: 400;\">: \u201cthe process of reducing the number of random variables under consideration by obtaining a set of principal variables,\u201d in other words, transforming a large dataset into a smaller one without extracting too much key information. This process is considered expensive for machine learning algorithms; a little accuracy must be traded for simplicity. Minimizing this cost is part of the job for PCA. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first step of PCA is <\/span><b>standardization<\/b><span style=\"font-weight: 400;\">, the process that is the least mathematically involved. Standardization takes care of the variances within the initial variables, specifically with regards to their ranges. For example, the value of one variable may lie within the range of 0 to 10, and the value of another within the range of 0 to 1. The variable whose possible value lies between 0 and 10 will carry a greater weight over the second variable, leading to biased results. Mathematically, this can be addressed by subtracting the dataset\u2019s <\/span><a href=\"https:\/\/www.techopedia.com\/definition\/26136\/statistical-mean\"><span style=\"font-weight: 400;\">mean<\/span><\/a><span style=\"font-weight: 400;\"> from the value of the variable and dividing this result by the set\u2019s <\/span><a href=\"https:\/\/www.mathsisfun.com\/data\/standard-deviation.html\"><span style=\"font-weight: 400;\">standard deviation<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-8308 aligncenter\" src=\"https:\/\/www.thecoderschool.com\/wp-content\/uploads\/2019\/06\/image1-1-300x92.png\" alt=\"\" width=\"300\" height=\"92\" srcset=\"https:\/\/www.thecoderschool.com\/blog\/wp-content\/uploads\/2019\/06\/image1-1-300x92.png 300w, https:\/\/www.thecoderschool.com\/blog\/wp-content\/uploads\/2019\/06\/image1-1.png 428w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">After standardization is performed, the values of each variable will all be within the same range.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note that standardization is different from <\/span><b>normalization<\/b><span style=\"font-weight: 400;\"> in descriptive statistics. 
Note that standardization is different from normalization in descriptive statistics. Normalization rescales the values into a range from 0 to 1, while standardization rescales the dataset to have a mean of 0 and a standard deviation of 1. A standardized value is therefore not confined to the 0-to-1 range; it measures how many standard deviations a point lies from the mean.

The second step, covariance matrix computation, is where things unfortunately begin to get more complicated. We first must understand the definition of covariance: "a measure of how much two random variables vary together." For two variables X and Y observed over n samples:

    Cov(X, Y) = Σ (X_i − mean(X))(Y_i − mean(Y)) / (n − 1)

Covariance differs from correlation in that correlation describes how strongly two variables are related, while covariance indicates the extent to which two random variables change with one another. The values of covariance lie between −∞ and ∞, while the values of correlation lie between −1 and 1. Correlations can be obtained only when the data is standardized; in fact, the correlation is the covariance of the standardized data.

Covariance matrix computation aims to investigate how the variables in the input dataset are related to one another. This is important because it helps detect redundant information that may come from a high correlation between two elements. We compute a covariance matrix to determine these correlations. The covariance matrix is an n × n matrix, where n is the number of dimensions, whose entries are all possible covariances within the dataset.

A couple of notes:

- Cov(x, x) = Var(x), the variance (http://mathworld.wolfram.com/Variance.html) of the initial variable, so the diagonal holds each variable's variance.
- The Cov() operator is commutative: Cov(x, y) = Cov(y, x), so the upper and lower triangular portions of the matrix are equal.

The covariance matrix is simply an organized listing of the covariances between all possible pairs of variables. The sign of each covariance is what tells us about the relationship between the two elements: if the covariance is positive, the two variables are directly correlated; if it is negative, the relationship is an inverse correlation.
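As a sketch (again on an illustrative random dataset, not one from the article), the covariance matrix and both notes above can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # hypothetical: 100 samples, 3 variables
X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize first

# rowvar=False tells np.cov that columns (not rows) are the variables.
C = np.cov(X_std, rowvar=False)

print(C.shape)                                 # (3, 3): an n x n matrix
print(np.allclose(C, C.T))                     # True: Cov(x, y) == Cov(y, x)
print(np.allclose(np.diag(C),
                  np.var(X_std, axis=0, ddof=1)))  # True: Cov(x, x) == Var(x)
```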
The next step in PCA is actually identifying the principal components, by computing the eigenvectors and eigenvalues of the covariance matrix. The number of principal components produced equals the number of dimensions in the dataset. Principal components are "combinations" or "mixtures" of the initial variables, constructed so that they are uncorrelated with one another and so that as much of the variability of the initial variables as possible is stored in the first component, with each succeeding component accounting for as much of the remaining information as possible, as shown in the example plot below for an 8-dimensional dataset:

[Plot: the percentage of total variance carried by each of the eight principal components, decreasing from the first component to the last.]

This form helps significantly with dimensionality reduction because it lets us eliminate components carrying little to no information while still retaining the information that describes the key relationships within the data. Consider the dataset below:

[Scatter plot: a two-dimensional dataset with the first and second principal component directions drawn through it.]

The first principal component line points in the direction of the highest variability in the data. Since the variability is largest along the first component, the information captured by the first component is also the largest: it is the line onto which the projection of the points is most spread out. This line maximizes the average of the squared distances from the projected points to the origin (which, after standardization, is the mean of the data). The direction of the second principal component line must be orthogonal (https://en.wikipedia.org/wiki/Orthogonality) to the first in order for the principal components to be completely uncorrelated.
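Here is a sketch of how per-component variance percentages like those in the plot above can be computed. The low-rank synthetic dataset is an assumption made so that the first components visibly dominate; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 8-dimensional dataset with mostly 2-dimensional structure,
# so the first two components should carry most of the variance.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
X += 0.1 * rng.normal(size=(200, 8))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# eigh is the appropriate eigensolver because a covariance matrix is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))

# eigh returns eigenvalues in ascending order; flip to most-significant-first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Percentage of the total variance carried by each principal component.
explained = 100 * eigenvalues / eigenvalues.sum()
for i, pct in enumerate(explained, start=1):
    print(f"PC{i}: {pct:.1f}% of the variance")
```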
We continue until we have calculated n principal components, where n is the number of dimensions in the original dataset.

Going back to eigenvectors and eigenvalues, a couple of preliminary notes:

- Every eigenvector has its own corresponding eigenvalue.
- The number of eigenvectors and corresponding eigenvalues is equal to the number of dimensions/variables in the data.
- For a tutorial on how to calculate the eigenvalues and eigenvectors of a matrix, see https://www.scss.tcd.ie/~dahyotr/CS1BA1/SolutionEigen.pdf.

The eigenvectors of the covariance matrix give the directions of the principal component axes, and the eigenvalues are the coefficients attached to those eigenvectors, giving the amount of variance carried along each principal component. Ranking the eigenvalues from highest to lowest therefore puts the principal components in order of significance. To get the percentage of the variance carried by each PC, divide its eigenvalue by the sum of all the eigenvalues.

Next, we have to determine whether we want to keep the lesser components (the ones with low eigenvalues). We form a matrix called the feature vector from the eigenvectors of the components we do keep. This is where dimensionality is actually reduced: we keep fewer components than the initial count, which was equal to the dimension of the original dataset.
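Continuing the eigendecomposition sketch above (same illustrative data and variable names, with the eigenvectors already sorted from most to least significant), forming the feature vector is just stacking the kept eigenvectors as columns; keeping k = 2 of the 8 components is an arbitrary choice for the example:

```python
# Continuing the previous sketch: keep only the top k components.
k = 2                                   # an illustrative choice, not a rule
feature_vector = eigenvectors[:, :k]    # an 8 x 2 matrix, one column per kept PC

print(feature_vector.shape)             # (8, 2)
```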
Lastly, we use our feature vector to restructure our dataset, in a sense: we want to put the data in terms of the axes given by the principal components instead of the original axes. We can do this pretty easily by multiplying the transpose (https://chortle.ccsu.edu/VectorLessons/vmch13/vmch13_14.html) of the feature vector by the transpose of the standardized original dataset:

    FinalDataSet = FeatureVector^T × StandardizedOriginalDataSet^T

The result is the original data expressed along the principal component axes, with one row per retained component.
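To close the loop, here is the final step as a self-contained sketch (again on illustrative random data). The transposes match the formula above; the result is transposed back at the end so that rows are samples again.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))   # hypothetical data
X += 0.1 * rng.normal(size=(200, 8))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order][:, :2]             # keep the top 2 PCs

# FinalDataSet = FeatureVector^T x StandardizedOriginalDataSet^T,
# transposed back so each row is a sample described by 2 components.
final_data = (feature_vector.T @ X_std.T).T
print(final_data.shape)                                    # (200, 2)
```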