Reusing PCA for plotting in R

I’m currently working on a project where I’m using principal component analysis to reduce a large number of variables to two dimensions for visualisation. Something like this:

[Figure: PCA of Bin Packing Instance Features]

 

This figure shows a number of features (numbers) relating to a few thousand bin packing problem instances. Each point is one instance, coloured by the data set it originally came from.

The code to do this looks like the following, using prcomp to compute the principal components, and ggbiplot (an extension of ggplot) to plot the points in 2D.


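library(ggplot2)
library(ggbiplot)

# Fit the PCA on the variables named in formulaRHS, centred and scaled to unit variance
allbp.pca <- prcomp(as.formula(formulaRHS), data = allbp, center = TRUE, scale. = TRUE)

# Biplot of the first two components, grouped by the data set each instance came from
g <- ggbiplot(allbp.pca, obs.scale = 1, var.scale = 1,
              groups = allbp.species, ellipse = FALSE,
              circle = TRUE, var.axes = TRUE)

g <- g + geom_point(aes(colour = allbp.species))
g <- g + theme_bw()
ggsave(filename = paste(plotpath, "PCA-", i, ".pdf", sep = ""), plot = g, width = 16, height = 8, dpi = 300)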

I’ve been regenerating these figures a lot, for example running through the above data and colouring all points grey except those from a particular set. For this to work, I had to sort the data and recompute the PCA each time before passing it to ggbiplot. This is slow, and the points also occasionally come out rotated by an extra 90, 180 or 270 degrees for some reason, which makes the figures harder to compare.

Now I’m using the “predict” function to rotate all the points after sorting, reusing the same prcomp object each time so the PCA is only computed once. This also means I plot with regular ggplot rather than ggbiplot:


# Fit the PCA once, up front
allbp.allprogs.pca <- prcomp(as.formula(formulaRHS), data = allbp.allprogs, center = TRUE, scale. = TRUE)

... # sorting here, e.g.:

# Order by column i so the highlighted rows come last and are drawn on top
allbp.allprogs <- allbp.allprogs[order(allbp.allprogs[, i]), ]

localbp.species <- as.factor(allbp.allprogs[, i])

# Instead of rebuilding the PCA every time, use predict to rotate the points
localbp.pca <- predict(allbp.allprogs.pca, newdata = allbp.allprogs)

# Get the first two principal components
localbp.df <- data.frame(x = localbp.pca[, 1], y = localbp.pca[, 2])

g <- ggplot(localbp.df, aes(x, y, colour = localbp.species)) + geom_point()

g <- g + theme_bw()

g <- g + scale_colour_manual(name = "", values = c("1" = "black", "0" = "grey"))
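
For anyone who wants to try the idea without my data, here is a minimal self-contained sketch of the same approach. It uses the built-in iris data set as a stand-in for the bin packing features, and the `highlight` column is a hypothetical 0/1 flag I made up for illustration. It also checks that predict gives back the same scores a fresh PCA would have produced for those rows, which is why the figures stay comparable.

```r
library(ggplot2)

# Fit the PCA once on the numeric columns (iris stands in for the real data)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
iris.pca <- prcomp(iris[, features], center = TRUE, scale. = TRUE)

# Hypothetical 0/1 flag marking the set to highlight
iris.flagged <- iris
iris.flagged$highlight <- as.integer(iris.flagged$Species == "versicolor")

# Sort so the highlighted rows come last and are drawn on top
iris.sorted <- iris.flagged[order(iris.flagged$highlight), ]

# Rotate the sorted rows with the existing PCA; nothing is refitted
scores <- predict(iris.pca, newdata = iris.sorted)

# Sanity check: same scores as the original fit, just in the new row order
original <- iris.pca$x[rownames(iris.sorted), ]
stopifnot(isTRUE(all.equal(unname(scores), unname(original))))

# Plot the first two principal components
plot.df <- data.frame(x = scores[, 1], y = scores[, 2],
                      highlight = factor(iris.sorted$highlight))

g <- ggplot(plot.df, aes(x, y, colour = highlight)) +
  geom_point() +
  theme_bw() +
  scale_colour_manual(name = "", values = c("1" = "black", "0" = "grey"))
```

Changing which rows are flagged only changes the sort and the colours; the axes and the position of every point stay the same from one figure to the next.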

 
