Welcome & Normalization
- Derek Ferguson
- Jun 15, 2018
- 2 min read
Hi there! So, first a bit about me, and then the first topic I'd like to hit. I'm a reasonably successful technology executive: I've written a few books ("Mobile .NET" and "Broadband Internet Access for Dummies" being the most successful) and have been gainfully employed in software development for the past 20 years. Having said that, I started coding as a kid in an era when it was viewed as more of a linguistic pursuit and less of a mathematical endeavor than it is today. A span of time as short as it was sweet.
One of the hottest topics around at the moment is Machine Learning. Like everyone else in my industry, I find it a huge topic of interest. But, as I work my way through it, I'm finding it a distinctly less friendly area for the non-mathematically-inclined amongst us than other areas of our profession. So, as I figure out the "mathy" bits of all of this, I wanted to make sure to share the simplified understandings I reach with the rest of the community.
So, topic #1 - normalization. I'm working my way through a course on TensorFlow, and the first thing we are told is that it is a good idea to normalize our data before using it, in order to avoid overflows and underflows. Fine and well but... no definition of normalization!
Looking up the definition elsewhere leads to many false definitions, but one that seems promising is the simple statement that normalizing data allows you to compare the relationship between data points that exist on vastly different scales - for example, the populations of various cities vs. the average heights of people.
The sample already provides a nice graph of the raw data - house prices vs. house sizes. The course author then runs a formula over both arrays (house prices and house sizes) that simply subtracts the mean from every value in the array and then divides every member of the result by the new standard deviation. This doesn't match my understanding of the textbook definition, because the code treats the two arrays entirely separately - neither array is used to scale the other in any way. So, I decide to take an experimental approach and look at graphs of the data, both before and after normalization.
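To make that concrete, here's a minimal sketch of the operation as I understand it, in plain NumPy. (The data and variable names here are made up by me - the course uses its own arrays.)

```python
import numpy as np

# Made-up stand-ins for the course's two arrays.
house_sizes = np.array([1100, 1400, 1425, 1550, 1600, 2000, 2300], dtype=float)
house_prices = np.array([199000, 245000, 319000, 240000,
                         312000, 279000, 310000], dtype=float)

def normalize(values):
    """Z-score normalization: subtract the mean, then divide by the standard deviation."""
    return (values - values.mean()) / values.std()

# Each array is normalized entirely on its own - no cross-array scaling.
norm_sizes = normalize(house_sizes)
norm_prices = normalize(house_prices)
```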
Before...

[Plot: the raw data - house prices vs. house sizes]

And after... (Ignore the missing axis names - that's not a part of the normalization - just me being lazy the second time around...)

[Plot: the same data after normalization]
So, it becomes clearer and starts to make more intuitive sense - both axes now run from roughly -2 to 2. Subtracting the mean centers each array on zero, and dividing by the standard deviation gives each a standard deviation of 1, so almost every point lands within two standard deviations of zero. Even though the subtraction and division were undertaken separately on each array, they ultimately have the same effect of squeezing both sets of data points into the -2 to 2 range and, therefore, the data is all now on the same scale. Very slick!
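As a quick sanity check (again, my own sketch with made-up numbers, not the course's code), the normalized array really does come out with a mean of ~0 and a standard deviation of 1, which is why everything lands within a couple of units of zero:

```python
import numpy as np

house_sizes = np.array([1100, 1400, 1425, 1550, 1600, 2000, 2300], dtype=float)
norm = (house_sizes - house_sizes.mean()) / house_sizes.std()

print(norm.mean(), norm.std())  # ~0.0 and 1.0
print(norm.min(), norm.max())   # ~-1.41 and ~1.81 - both within [-2, 2]
```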