Hacker News: sforzando's comments

Don't worry, this article is definitely comedy/parody.


I was impressed with the facial auto-completion results on Labeled Faces in the Wild (LFW), especially since the system was not trained on LFW at all! The results seemed almost too good to be true. Perhaps this is a testament to how little diversity there actually is in human faces?

Very well written overall, and I appreciated the author's thoughts on TensorFlow and Torch at the end of the article.

Adversarial training is a fascinating idea, and I love the sound of it. I'd like to start applying that concept in the future.


An example of a kernel that has a mathematical explanation for its values is the (discrete) Gaussian kernel, used to blur/smooth images: http://dev.theomader.com/gaussian-kernel-calculator/

The values of the Gaussian kernel matrix are obtained by discretely sampling the Gaussian function. You choose sigma (the Gaussian's standard deviation) and the kernel size (the spatial neighborhood of the kernel, i.e., how much of the surrounding area the kernel examines).
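A minimal sketch of that construction in plain Python: it point-samples the 2D Gaussian at integer offsets from the center, then normalizes the weights to sum to 1 so that blurring preserves overall brightness. The function name `gaussian_kernel` is hypothetical, and the linked calculator may compute its values slightly differently (for example, by integrating over each pixel instead of point-sampling).

```python
import math

def gaussian_kernel(size, sigma):
    """Build a size x size kernel by sampling exp(-(x^2 + y^2) / (2 * sigma^2))
    at integer offsets from the center, then normalizing the weights to sum to 1."""
    if size % 2 == 0:
        raise ValueError("kernel size should be odd so the kernel has a center pixel")
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

# A 5x5 kernel with sigma = 1: larger sigma spreads the weight further from the center.
k = gaussian_kernel(5, 1.0)
for row in k:
    print(" ".join(f"{v:.4f}" for v in row))
```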

Another example is the Sobel operator, used to extract edges from images: https://en.wikipedia.org/wiki/Sobel_operator

The kernel matrix is the result of composing a smoothing operation (the weights [1, 2, 1], a coarse approximation of a Gaussian) with a spatial-differencing operation ([-1, 0, 1]). Thus, the Sobel operator estimates edges from smoothed images.
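As a sketch, the 3x3 Sobel kernels can be built as outer products of the [1, 2, 1] smoothing vector and the [-1, 0, 1] differencing vector. The helper name `outer` is illustrative. Sources differ on sign conventions (and on convolution vs. correlation), so the same kernels may appear flipped elsewhere.

```python
def outer(col, row):
    """Outer product: result[i][j] = col[i] * row[j]."""
    return [[c * r for r in row] for c in col]

smooth = [1, 2, 1]   # binomial smoothing, a coarse approximation of a Gaussian
diff = [-1, 0, 1]    # central difference, an estimate of the spatial derivative

# Horizontal-gradient kernel: smooth vertically, differentiate horizontally.
sobel_x = outer(smooth, diff)
# Vertical-gradient kernel: differentiate vertically, smooth horizontally.
sobel_y = outer(diff, smooth)

for row in sobel_x:
    print(row)
# [-1, 0, 1]
# [-2, 0, 2]
# [-1, 0, 1]
```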

As for the sharpen kernel described in the post, an intuitive explanation is that you want to accentuate the differences in intensity between each pixel and its neighbors.
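The post's exact kernel isn't reproduced here, so this sketch assumes the common 3x3 sharpen kernel [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]. It equals the identity plus a Laplacian-style term (4 times the center minus its four neighbors), so flat regions pass through unchanged while pixels that differ from their neighbors get pushed further away. `apply_kernel` is a hypothetical helper that only processes interior pixels.

```python
# Assumed kernel: the common 3x3 sharpen kernel (identity + Laplacian-style term).
SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]

def apply_kernel(image, kernel):
    """Apply a 3x3 kernel to the interior pixels of a 2-D list; border pixels are copied.
    This computes a correlation; for a symmetric kernel like SHARPEN it equals a convolution."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return out

# Toy image: a dark region (10) next to a bright region (50), i.e. a vertical edge.
img = [[10, 10, 10, 50, 50, 50] for _ in range(5)]
sharpened = apply_kernel(img, SHARPEN)
for row in sharpened:
    print(row)
```

On this toy step edge, each interior row becomes [10, 10, -30, 90, 50, 50]: the dark side of the edge gets darker and the bright side brighter, while the flat areas keep their values because the kernel's weights sum to 1. A real implementation would clip results to the valid intensity range (e.g., 0 to 255).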


Could you explain how wavelet decompositions/transforms could be used to learn a predictive model? In other words: given a labeled dataset D = {(x_i, y_i)}, how would you learn a function F(x) = y, where x is the input data (pixels, credit scores, etc.) and y is the target label (object label, investment risk, etc.)?

I'm not very well-versed in wavelet methods. But in computer vision and image processing, I've seen people apply wavelet transforms to images, extract the wavelet coefficients, and use those coefficients as the image's feature representation. The coefficients are then typically fed to a traditional machine learning classifier, e.g., nearest neighbor or an SVM.

In other words, I've seen wavelet transforms used as feature extractors. I haven't seen wavelet transforms used to actually learn the predictive model F(x).
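To make that feature-extractor pipeline concrete, here is a sketch using a 1-D orthonormal Haar transform (the simplest wavelet) on short signals, with the fraction of energy at each scale as the features and a 1-nearest-neighbor classifier on top. The toy signals, labels, and function names are hypothetical; for images you would use a 2-D transform (e.g., from PyWavelets), but the structure is the same: the transform g(x) is fixed, and only the classifier depends on the labels.

```python
import math

def haar_1d(signal):
    """Multi-level orthonormal Haar wavelet transform (length must be a power of 2).
    Output layout: [approximation, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of 2")
    coeffs = [float(v) for v in signal]
    length = n
    root2 = math.sqrt(2.0)
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / root2 for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / root2 for i in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs

def wavelet_features(signal):
    """g(x): fraction of the signal's energy in the approximation and in each detail level."""
    coeffs = haar_1d(signal)
    energies = [coeffs[0] ** 2]
    start = 1
    while start < len(coeffs):
        energies.append(sum(c * c for c in coeffs[start:2 * start]))
        start *= 2
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies

def nearest_neighbor(train_feats, train_labels, query):
    """1-nearest-neighbor classification with squared Euclidean distance."""
    best = min(
        range(len(train_feats)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_feats[i], query)),
    )
    return train_labels[best]

# Hypothetical toy training set: slowly varying vs. rapidly oscillating signals.
train = [
    ([1, 1, 1, 1, 2, 2, 2, 2], "smooth"),
    ([0, 0, 1, 1, 1, 1, 0, 0], "smooth"),
    ([1, -1, 1, -1, 1, -1, 1, -1], "oscillating"),
    ([2, 0, 2, 0, 2, 0, 2, 0], "oscillating"),
]
feats = [wavelet_features(x) for x, _ in train]  # fixed transform: nothing is learned here
labels = [y for _, y in train]

print(nearest_neighbor(feats, labels, wavelet_features([1, 0, 1, 0, 1, 0, 1, 0])))
```

Because the Haar transform is orthonormal, feeding all raw coefficients to a Euclidean nearest-neighbor classifier would find the same neighbors as the raw signal; summarizing the coefficients per scale (and normalizing away overall amplitude) is what makes this representation different.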

Gradient boosting, on the other hand, is learning the predictive model F(x).

Said in another way: gradient boosting is learning F(x) = y.

A wavelet transform, by contrast, computes a representation g(x) = x̂ such that F(g(x)) = y is "easier" to learn.

I hope I'm explaining things clearly - sorry in advance if I made any mistakes, particularly in my understanding of wavelet transforms/decompositions.


These helpful, well-written slides help explain where the "gradient" comes into "Gradient Boosting":

http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/sl...

The gist of it is: when you add a new decision tree that fits the residual errors, this new tree is fitting the negative gradient of the loss function (i.e., the training error). For squared-error loss, L = 1/2 (y - F(x))^2, the residual y - F(x) is exactly -dL/dF(x). Thus, adding the new decision tree to your existing ensemble takes a gradient-descent step that seeks to minimize the loss function.
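A quick numerical check of that claim for squared-error loss (a sketch; the numbers are made up): a central finite difference of L(y, F) = 0.5 * (y - F)^2 with respect to F recovers the plain residual y - F. For other losses, such as absolute error or log loss, the negative gradient (sometimes called the pseudo-residual) is no longer the plain residual, which is why the gradient view generalizes beyond squared error.

```python
def squared_loss(y, f):
    """Per-example squared-error loss L(y, F) = 0.5 * (y - F)^2."""
    return 0.5 * (y - f) ** 2

def negative_gradient(y, f, eps=1e-6):
    """Estimate -dL/dF at F = f with a central finite difference."""
    return -(squared_loss(y, f + eps) - squared_loss(y, f - eps)) / (2 * eps)

y_true = [3.0, -1.0, 2.5, 0.0]
current_pred = [2.0, 0.5, 2.5, -1.0]  # hypothetical predictions of the current ensemble

residuals = [y - f for y, f in zip(y_true, current_pred)]
neg_grads = [negative_gradient(y, f) for y, f in zip(y_true, current_pred)]
print(residuals)  # [1.0, -1.5, 0.0, 1.0]
print(neg_grads)  # matches the residuals up to floating-point error
```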

Boosting comes in because the model combines several weak learners (individual trees) into a strong learner (the ensemble of trees). Each individual tree partitions the input space into regions and predicts a constant value in each, giving a piecewise-constant approximation of the target function. This approximation incurs some error, so the next tree is fit to that remaining error over the entire input space, again by partitioning the input space into piecewise-constant regions, and so on.
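A from-scratch sketch of that loop, under simplifying assumptions: 1-D inputs, squared-error loss, and depth-1 trees (stumps) as the weak learners, each splitting the input line into two constant pieces. The learning rate (shrinkage) of 0.3 and the toy data are arbitrary choices, and `fit_stump`/`gradient_boost` are hypothetical names, not any library's API.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D inputs: pick the split threshold that minimizes
    squared error and predict the mean target on each side (a piecewise-constant function)."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # all inputs identical: fall back to a constant prediction
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Squared-loss gradient boosting: F_m(x) = F_{m-1}(x) + learning_rate * h_m(x),
    where each stump h_m is fit to the current residuals (the negative gradient)."""
    base = sum(ys) / len(ys)  # F_0: the constant that minimizes squared error
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

# Toy 1-D regression problem (made-up data): y = x^2 / 10.
xs = [float(i) for i in range(10)]
ys = [x * x / 10 for x in xs]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

With squared loss, each least-squares stump is a projection of the current residuals, so adding a shrunken copy of it (any learning rate between 0 and 2) can never increase the training error; more trees give training error that is lower or equal.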

So it's not boosting in the traditional AdaBoost sense, where the final model is a weighted linear combination of "dumb" classifiers. Instead, I'd liken it more to a cascade method: each tree T_{n} seeks to fix the errors left over by the trees before it (T_{1}, ..., T_{n-1}): https://en.wikipedia.org/wiki/Cascading_classifiers

There's actually a cool facial landmark detector that uses this same cascading idea to train an extremely fast (and quite accurate) system. In essence, it uses a cascade of regression-tree ensembles, each trained in a gradient-boosting framework, to locate landmarks. The dlib library has a great implementation, along with a pretrained model. I've used it in my research, and while it's not perfect, I have been satisfied with its results: http://blog.dlib.net/2014/08/real-time-face-pose-estimation....

http://www.cv-foundation.org/openaccess/content_cvpr_2014/pa...


Those slides were very helpful, thank you.


I really enjoyed your step-by-step explanation of the pipeline: http://vipulsharma20.blogspot.in/2016/01/document-scanner-us...

Great job! Computer vision is a really exciting field, and there's a lot you can do with it.

