Regression finder
Introduction
I've been working my way through Stanford's online Machine Learning course recently and I thought I should put some of what I've learnt to use.
The program
The program I made reads a file of tab-delimited data, assuming the first column holds the independent values (x) and the second holds the dependent values (y). It then offers the following options:
- Simple summary
- Plot data
- Find linear regression
- Find polynomial fit
- Exit
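Reading the file can be done with NumPy's loadtxt. The following is a minimal sketch, not the program's actual code. It assumes the file has no header row and contains only numeric values, and the function name load_data is hypothetical.

```python
import numpy as np

def load_data(path):
    """Read a tab-delimited file and return its first two columns as (x, y)."""
    # usecols keeps only the x and y columns; unpack=True returns them as
    # separate arrays; ndmin=2 keeps a one-row file from collapsing to scalars.
    # Assumption: no header row (use skiprows=1 if there is one).
    x, y = np.loadtxt(path, delimiter="\t", usecols=(0, 1), unpack=True, ndmin=2)
    return x, y
```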
Simple summary currently just gives the mean of x and y, but I'll probably add standard deviation and maybe some other simple statistics.
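A sketch of the summary, including the standard deviation mentioned above as a possible addition. The function name summarise is hypothetical. Note that np.std defaults to the population standard deviation.

```python
import numpy as np

def summarise(x, y):
    """Return simple statistics for the x and y columns."""
    return {
        "mean_x": np.mean(x),
        "mean_y": np.mean(y),
        # Population standard deviation (ddof=0); pass ddof=1 to np.std
        # for the sample standard deviation instead.
        "std_x": np.std(x),
        "std_y": np.std(y),
    }
```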
Plot data does just that, using matplotlib. I might change this to use an SVG graph drawer I've been working on, but I wanted to try out matplotlib.
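A minimal matplotlib sketch of the plotting step, assuming the data should be drawn as unconnected points. plot_data and its filename parameter are hypothetical; passing a filename saves the figure to disk instead of opening a window.

```python
import matplotlib.pyplot as plt

def plot_data(x, y, filename=None):
    """Scatter-plot y against x, either on screen or saved to a file."""
    fig, ax = plt.subplots()
    ax.plot(x, y, "o")  # "o" draws markers only, no connecting line
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    if filename is None:
        plt.show()  # opens an interactive window if a GUI backend is available
    else:
        fig.savefig(filename)
    plt.close(fig)  # free the figure's memory once it has been shown or saved
```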
The "Find linear regression" option initially worked by explicitly evaluating the normal equation:
p = (X.T * X).I * X.T * y  # X, y are numpy.matrix objects: * is the matrix product, .I the inverse
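That line relies on numpy.matrix, where * is matrix multiplication and .I is the inverse. NumPy's documentation now recommends plain arrays instead, with @ for matrix multiplication. Here is a sketch under that assumption, where X is a design matrix with a column of ones for the intercept. The helper name normal_equation is hypothetical.

```python
import numpy as np

def normal_equation(x, y):
    """Fit y = p[0] + p[1]*x by solving the normal equation directly."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (for the intercept) next to the x values.
    X = np.column_stack([np.ones_like(x), x])
    # Solve (X^T X) p = X^T y; np.linalg.solve avoids forming the inverse
    # explicitly, which is cheaper and numerically more stable.
    return np.linalg.solve(X.T @ X, X.T @ y)
```

For points on the line y = 2 + 3x, normal_equation([0, 1, 2, 3], [2, 5, 8, 11]) should return values very close to [2, 3].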
But then I found, unsurprisingly, that NumPy's linear algebra module, numpy.linalg, already has a function for this, lstsq, which also returns the residuals, the rank of X and the singular values of X:
from numpy.linalg import lstsq
(p, residuals, rank, s) = lstsq(X, y, rcond=None)  # rcond=None: machine-precision cutoff for small singular values
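A self-contained sketch of the same fit with lstsq, building the design matrix in the same way as for the normal equation. The function name linear_regression is hypothetical.

```python
import numpy as np

def linear_regression(x, y):
    """Fit y = p[0] + p[1]*x with numpy.linalg.lstsq."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    # rcond=None uses a machine-precision cutoff for small singular values
    # (and silences a FutureWarning that some NumPy 1.x releases emit when
    # rcond is omitted).
    p, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
    # residuals holds the sum of squared residuals ||y - X p||^2; it is an
    # empty array if X is rank-deficient or has no more rows than columns.
    return p, residuals, rank, s
```

For x = [0, 1, 2] and y = [0, 1, 1], the least-squares line is y = 1/6 + x/2, and residuals holds the sum of squared residuals, 1/6.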
Similarly, I was going to write my own function to calculate a polynomial fit of a given degree, but it makes much more sense to use the function NumPy already provides, polyfit:
import numpy
p = numpy.polyfit(x, y, degree)  # coefficients, highest power first
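polyfit returns the coefficients with the highest power first, which is the order np.polyval expects. Here is a sketch that pairs the two, for example to draw the fitted curve over the data. The name polynomial_fit is hypothetical.

```python
import numpy as np

def polynomial_fit(x, y, degree):
    """Least-squares polynomial fit; coefficients are highest power first."""
    p = np.polyfit(x, y, degree)
    # Evaluate the fitted polynomial at the original x values.
    fitted = np.polyval(p, x)
    return p, fitted
```

NumPy's documentation describes polyfit as part of its older polynomial API and suggests numpy.polynomial.Polynomial.fit for new code. Polynomial.fit works in a scaled domain, so its result needs .convert() before the coefficients match polyfit's, and they are then ordered lowest power first.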
Most of my code is for displaying the resulting vector in a nice way.
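One way to display a coefficient vector is to turn it into an equation string. The following is an illustrative sketch, not the program's actual display code. format_polynomial is a hypothetical helper that expects polyfit's highest-power-first ordering.

```python
def format_polynomial(p, precision=3):
    """Turn polyfit-style coefficients (highest power first) into 'y = ...'."""
    degree = len(p) - 1
    terms = []
    for i, c in enumerate(p):
        power = degree - i
        if c == 0:
            continue  # skip terms that vanish exactly
        magnitude = f"{abs(c):.{precision}g}"  # sign is written separately
        if power == 0:
            term = magnitude
        elif power == 1:
            term = f"{magnitude}x"
        else:
            term = f"{magnitude}x^{power}"
        terms.append(("-" if c < 0 else "+", term))
    if not terms:
        return "y = 0"
    # The leading term shows only a minus sign; later terms get " + " or " - ".
    first_sign, first_term = terms[0]
    text = ("-" if first_sign == "-" else "") + first_term
    for sign, term in terms[1:]:
        text += f" {sign} {term}"
    return "y = " + text
```

For example, format_polynomial([0.5, -2, 1]) returns "y = 0.5x^2 - 2x + 1".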
There's still lots to add, including working with multivariate data and adding regularisation, but I hope it will already be a useful program.
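Both of these additions can build on the normal equation. Here is a sketch of one common form, regularised (ridge) least squares on several features. It assumes a penalty λ on every coefficient except the intercept. The function name ridge_normal_equation and the parameter lam are hypothetical.

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Regularised least squares via the normal equation.

    X is an (m, n) array of features without the intercept column; a column
    of ones is added here. The intercept is not penalised.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, np.newaxis]  # a single feature becomes one column
    m, n = X.shape
    Xb = np.column_stack([np.ones(m), X])
    # Penalty matrix: identity with a zero in the top-left corner, so the
    # intercept term is left unregularised.
    L = np.eye(n + 1)
    L[0, 0] = 0.0
    # Solve (X^T X + lam * L) p = X^T y.
    return np.linalg.solve(Xb.T @ Xb + lam * L, Xb.T @ y)
```

With lam = 0 this reduces to the ordinary normal equation. Larger values shrink the feature coefficients towards zero, and the intercept then tends to the mean of y.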
Comments
Hi, I like it so far. I'm curious to see what it will look like when you've added to it. I'm working on a program to find interesting correlations in apparently random data (still a bit of a newb, though).
Update me with your progress, if you like
Cheers
Pedro