My research group has an opening for one undergraduate to work on a
Machine Learning project, starting this fall. The specific area is
Non-linear dimension reduction/Manifold learning (NLD). The goals of
this project are:
(1) *efficient* implementation of NLD algorithms in Python. The
current implementations run on thousands of data points (Matlab) or
1 million (Python). Can you rewrite them to run on 100M? On 1B?
(2) study real world data sets and discover their features, using
the algorithms you implement
- spectra of galaxies from large sky surveys
- the benchmark image data sets CIFAR-10 and CIFAR-100
www.cs.toronto.edu/~kriz/cifar.html
- recordings of brain activity
The software will ultimately (possibly as soon as the end of the
fall quarter) become a component of scikit-learn.
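
To give a flavor of goal (1), here is a minimal sketch of one NLD
method (a Laplacian-eigenmaps-style spectral embedding) written so that
every large object stays sparse. It is not the group's code: the names
knn_graph and spectral_embedding, the choice of an unweighted k-nearest
neighbor graph, and the shift value are all assumptions made for
illustration, and it assumes the neighbor graph is connected and the
data has no duplicate points.

```python
import numpy as np
from scipy.sparse import csr_matrix, csgraph
from scipy.sparse.linalg import eigsh
from scipy.spatial import cKDTree


def knn_graph(X, n_neighbors=10):
    """Symmetric, unweighted k-nearest-neighbor graph as a sparse matrix.

    Hypothetical helper: memory grows like n * n_neighbors, not n**2.
    """
    n = X.shape[0]
    # Query k+1 neighbors because each point's nearest neighbor is itself
    # (assumes no duplicate points).
    _, idx = cKDTree(X).query(X, k=n_neighbors + 1)
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx[:, 1:].ravel()
    A = csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    # Keep an edge if either endpoint lists the other as a neighbor.
    return A.maximum(A.T)


def spectral_embedding(X, n_neighbors=10, n_components=2):
    """Embed X using the bottom eigenvectors of the normalized Laplacian."""
    A = knn_graph(X, n_neighbors)
    L = csgraph.laplacian(A, normed=True).tocsc()
    # Shift-invert around a small negative value (an arbitrary choice here)
    # so the solver targets the smallest eigenvalues of L.
    vals, vecs = eigsh(L, k=n_components + 1, sigma=-1e-3, which="LM")
    order = np.argsort(vals)
    # Drop the first eigenvector, which is trivial for a connected graph.
    return vecs[:, order[1:]]


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
embedding = spectral_embedding(X, n_neighbors=10, n_components=2)
```

Even in this form, the neighbor search and the eigensolver are the
parts that would need real work to reach 100M points.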
Requirements. To participate, you MUST:
- be an expert in Cython, NumPy and other Python scientific
computing libraries (when you apply, send me the name of a GitHub
repository with code by you, or equivalent proof of expertise)
Highly desirable (you will gain more from the experience):
- basic notions of probability, statistics and mathematics
- a course in algorithms and data structures
- a curious mind
Rewards for you:
- experience with modern machine learning
- experience with the statistical study of large real data sets
- co-authorship of the package
- 2-4 credit hours
[- depending on your diligence: co-authorship of research papers
resulting from this project]
What if you are interested but are not a Python expert? I cannot work
with you until the Python project is underway. But once I find a
person for this first-priority project, I may have 1-2 more openings
in the same area. So drop me a line.
______________________________________________________________________
,_ o Marina Meila Dept of Statistics Padelford B - 321
/ //\ Associate Professor U of Washington Box 354322
__\>>_|__ mmp@stat.washington.edu Seattle WA 98195-4322
\\, www.stat.washington.edu/mmp phone: 206-543-8484