This course requires a substantive project. Especially in the case of graduate students, I strongly prefer research oriented projects. By this I mean projects which support some research effort, whether it is your own, the vision lab's, or someone else's.
I prefer that undergraduate projects also be research oriented, but in this case there will be more scope for other options. Undergraduate projects do not need to be as substantive as graduate ones.
Below I have begun a list of project suggestions. I will update this list frequently during the first week of February. Some are marked as more suitable for undergraduate students and some are more suitable for graduate students or undergraduate students who wish to commit significant time to the project.
It is not necessary to choose a project from the list. They are only suggestions.
All project plans need to be finalized with me.
Group projects are fine in general, but will be considered in the context of specific projects and in the context of the plan for distributing the work.
Because the projects are meant to be research oriented, it will be assumed that others have your blessing to use what you have developed once you are done with it. If, for any reason, this is not the case, then you need to let me know in advance. This will exclude you from doing many (if not all) the suggested research projects, as these are mostly parts of larger projects. Thus, if you have any reservations about releasing your work, you should work with me to find a suitable project. Other than restricting the choice of projects somewhat, there is no consequence for choosing (in advance) not to release your work, and you do not need to give a reason for your choice.
Scope: I am willing to consider a wide range of projects related to vision, including some which involve graphics, bio-informatics, and image databases.
Publication potential: I have noted a possible publication target for some projects. This should not be taken to imply that those projects are the only ones that could lead to a paper. Most, if not all, have that potential. However, publication likely requires that someone (possibly you) continue the work after the course is over (this is not required!).
Feb 17: By this time you have discussed your project with me, at least by E-mail. You have a good idea of what you want to do. If you have chosen a project that could be held up due to circumstances beyond your control (e.g. for some great projects we are still sorting out the data), then you will have contemplated a second choice.
Feb 24: By this time you have chosen your partners and a project. In anticipation of the formal project proposal due the next week, you will send me a draft proposal.
Mar 3: Presentation of project proposal in class (need not exceed 5 minutes). Formal project proposal is also due. Depending on what you are working on and how familiar I am with it, this can be short. You will, however, need to give some idea about what you plan to do for each of the reviews, as well as some specification of what each participant in group projects will be doing.
March 29: First project review due.
What you need to provide is arguably project dependent. However, I suggest approaching it in the following way. Your project has an implementation component and a written component. The written component could be viewed as a draft of a paper, and the project reviews could be thought of as iterations on that draft. Your proposal was the first draft. On each iteration, you move from writing what you are going to do to writing what you have done. For many projects, images are appropriate, and are an excellent way to help tell the story.
April 13: Second project review.
I have made turnin keys cs477_pr1 and cs477_pr2 for project review materials. Formats: PDF is fine, or HTML, or even text. Images are good, either in the PDF document, pointed to by the HTML, or just in the directory but referred to in your text.
It may help to think about what the purposes of the reviews are. First, they should convince me that you are on track. Second, they should help you clarify your thinking. Third, for some of you, some of what you write can form early drafts of parts of papers, on which you can get feedback when it is most effective (early on). So try to make it a useful exercise instead of yet another hoop to jump through.
See first project review for details. The second project review is like the first one except that there should be even more substance.
May 13, May 11 (if necessary), week of the 4th (by special request): Project demos/presentations:
I have identified two possible slots for project demos/presentations: May 11 in the afternoon, and May 13, after the exam (i.e. around 4:15 onwards). The slot after the exam worked fine last term for the graphics class--we followed it by a session in the pub on the instructor. However, we can spread the presentations over the two time slots, or use May 11 only, if that is what everyone prefers. We will negotiate the final distribution over the time slots closer to the time. If someone wants to present their project during the week of the 4th, this is possible by special request.
May 13: Final code and write-ups due (hard deadline).
Tracking people in the dark (could support several sub-projects)
(Ongoing vision lab work. Kevin Wampler is in charge)
For virtual reality applications we need camera systems which determine the configuration of the people in the space (e.g. which way you are looking, what direction you are pointing) from camera data. For artistic VR, we need to do this in the dark. We are exploring two ways to do this. First, use 4 infrared cameras and 4 infrared sources. Second, work with the light from the VR screen (you know what it is--this may help). Both approaches could be combined. In addition to working out the geometry, models of humans have to be fit to the data. Finally, the temporal change of the configuration should be tracked, and gesture recognition should be attempted. There are a number of ways that this project could be simplified and/or broken up.
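For the geometry side, here is a minimal sketch of recovering a 3D point seen by two calibrated cameras, using standard linear (DLT) triangulation; the projection matrices are assumed to come from a separate calibration step, and all names are illustrative only:

import numpy as np

def triangulate_point(P1, P2, x1, x2):
    # P1, P2: 3x4 camera projection matrices (assumed known from calibration).
    # x1, x2: (u, v) pixel coordinates of the same point in each view.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

With four cameras the same idea extends by stacking two rows per camera, which also gives some robustness when a marker is occluded in one of the views.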
Participants in this project should be prepared to join regular meetings on this topic.
Hong Hua (faculty in optical sciences) also does cool VR stuff. She focuses on collaborative environments and combining virtual reality with physical objects. She also would like to use camera data to get information about what is going on in the (real) VR space. She will present at the vision group meeting February 3. Some additional specifications are available as a word file.
Learning color semantics (WordNet Color Project) (single person undergrad project)
(Currently under consideration by Abin)
An undergraduate student (TszYan Chow) is working with faculty member Sandiway Fong on understanding color names as they relate to the WordNet ontology. She is linking WordNet words with images found with the Google API to try to understand the semantics of color words. The idea is to help ground color word meanings, and to automatically determine the linking between standard color space data (such as color histograms) and the color words in these images. Your role would be to join this project, help with the image processing, and in general deal with the vision side of the equation as needed.
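To give a feel for the vision side, here is a minimal sketch of linking a color word to color space data, assuming the images retrieved for each word are simply stored in one directory per word (the directory layout and function names are hypothetical):

import numpy as np
from pathlib import Path
from PIL import Image

def color_histogram(image_path, bins=8):
    # Coarsely quantized RGB histogram (bins^3 cells), normalized to sum to 1.
    rgb = np.asarray(Image.open(image_path).convert("RGB"))
    idx = (rgb // (256 // bins)).reshape(-1, 3)
    cells = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(cells, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def word_color_profile(word_dir):
    # Average histogram over all images retrieved for one color word.
    hists = [color_histogram(p) for p in Path(word_dir).glob("*.jpg")]
    return np.mean(hists, axis=0)

# e.g. profile = word_color_profile("google_images/crimson")

The interesting part is then relating such profiles to the WordNet structure (e.g. do the hyponyms of "red" have profiles concentrated in the same region of color space?).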
Manipulating soft (DXA) X-ray data and linking it with MRI data
(This project requires confirmation on data availability which should come very soon)
This project is the exploratory analysis for a big and very important project. Depending on what aspects are undertaken, several people could work on this project. Even with several people working on it, completion of the entire project is not likely to be possible. I will describe the general thrust of the project.
The main idea is to use DXA data to measure muscle mass loss in elderly women. The strategy is to warp/align DXA data to MRI data (the standard), and then build a classifier or a statistical model so that muscle mass can be estimated using inexpensive/portable DXA imaging instead of expensive MRI imaging.
The first concrete task likely will be creative hacking to reverse engineer the data format produced by the DXA machine (and/or dealing with ancient code and docs). Having done that, image processing could be applied for a variety of tasks such as identifying the skeleton. Then we will want to align multiple DXA images (same person different times) as best as possible so differences can be tracked over time. Possibly having done that, the data needs to be warped/aligned/mapped to the MRI data. Then a classifier on the DXA data needs to be trained using the MRI data.
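As a rough sketch of the alignment and classification steps (the names and data layout are hypothetical, and real registration would be more sophisticated than an integer translation search):

import numpy as np
from scipy.ndimage import shift
from sklearn.linear_model import Ridge

def align_by_translation(fixed, moving, max_shift=20):
    # Brute-force search for the integer translation that best aligns two DXA
    # scans of the same person; a crude stand-in for real warping/registration.
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.mean((fixed - shift(moving, (dy, dx), order=0)) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Hypothetical: rows of X are per-subject DXA features (e.g. regional
# attenuation statistics over the aligned images); y is the muscle mass
# measured from the MRI standard.
# model = Ridge(alpha=1.0).fit(X_train, y_train)
# predicted_mass = model.predict(X_new)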
Understanding shadow statistics (a color project)
This project is ambitious for one term. Again I will lay out the whole project from which a significant start could be planned. If sufficient progress was made by late March, a paper could be submitted to the color imaging conference which has a deadline of April 6.
Have a look at this paper and notice that it has lots of nice ideas but is also one hack after another. The goal of this project is to start to deal with the hacks.
The next part is to further develop methods to estimate the probability that an image boundary is an illumination boundary using the data collected as training data. (A literature search here is a good idea).
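A minimal sketch of what such an estimator might look like, assuming the collected data provides per-boundary features and 0/1 labels (the file names and the choice of features are purely illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical layout: each row of X describes one image boundary segment
# (e.g. chromaticity shift across the edge, intensity ratio, edge sharpness);
# y records whether the training data marks it as an illumination (shadow)
# boundary rather than a material boundary.
X = np.load("boundary_features.npy")     # (n_boundaries, n_features), hypothetical file
y = np.load("boundary_is_shadow.npy")    # (n_boundaries,) 0/1 labels, hypothetical file

clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a new image, the classifier gives P(illumination boundary | features),
# which can feed into the illumination field fitting described next.
# p_shadow = clf.predict_proba(X_new)[:, 1]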
The next part is to fit the illumination field. Instead of least squares, a sampling approach seems to be appropriate. Several references would have to be read, and possibly implemented to compare against.
The final part is to propose new models for the task. This paper might be relevant, as are several others.
Vision software for the Aerial Robotics Project (slightly geared towards undergrad, but I can be flexible)
The UA aerial robotics project needs your help! (I will skip information about the project itself since it was presented in class.) I will also get a more detailed picture of the status of the various components soon. A teaser of several project possibilities:
Nikhil Shirahatti has worked on this already, and can give some pointers and share some code.
Identifying the difference between an open and a closed window. This may take some creativity. Does infrared light help? Polarization? Is there enough reflected light? Can you arrange reflected light? Can a range sensor be made to work (the window is quite far away)?
The group has made good progress identifying building structure from multiple images. Likely more work is needed (I will check).
Non-linear manifolds for image spaces (likely one grad, keen u-grad, or two u-grads, but I could be flexible)
There is much high dimensional data in computer vision. Since the data is not random, it typically lives on a lower dimensional manifold. Often it is assumed that this manifold is linear. However, this assumption is typically wrong, and leads to manifolds which are much too big (in terms of the number of dimensions). In this project you will apply methods for finding non-linear manifolds to interesting data.
The first task is to get the Matlab code on the web for IsoMap and LLE (start with IsoMap). (You may need a login/passwd to get these ("me"/"read4fun").) Implement a C version of both for the vision group software library. Now apply them to some fun problems, starting with laying out images for browsers based on data supplied by Kobus, and possibly other measures limited only by your imagination. We have browsing software which uses the layout matrix. Other data of interest is FACS data from the FACS project (assuming that someone does it).
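Before writing the C versions, it may help to prototype with the equivalent routines in scikit-learn to see what the layouts look like; a minimal sketch (the random features are just a placeholder for the data supplied by Kobus):

import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

features = np.random.rand(500, 64)   # placeholder: one row of image features per image

# Embed into 2D for the image-layout browser; IsoMap first, then LLE for comparison.
iso_layout = Isomap(n_neighbors=10, n_components=2).fit_transform(features)
lle_layout = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(features)

# iso_layout is an (n_images x 2) layout matrix of the kind the browsing software uses.
np.savetxt("layout_matrix.txt", iso_layout)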
Automatically determining FACS data
A number of groups have published algorithms (and code) for taking human face
data and computing the FACS representation of it (with the goal of recognizing
expression). The goal of this project is to get a FACS system up and running at
UA. Begin by reading some papers and downloading software. See what works. See
what needs to be improved. See what needs to be sped up, modularized, etc.
Unless an existing system is perfect (only in your dreams), we will need to
understand it and its limitations well enough to build on it. It may be
necessary to implement some promising algorithms for which there is no code (or
unusable code).
Data: I think that there is plenty of test data on the web. However, you can, and
perhaps should, take some pictures yourself using our cameras (currently on
order, but should arrive soon).
Shape representation
(likely challenging, possibly more than one person could
work on this independently, taking different approaches)
Understanding 2D shape is one of the big open problems in computer vision. This
project is very exploratory. However, there is some possibility of publication
next fall.
Get some shape data of common objects (lots of examples of lots of different ones). I intend to order the Hemera set of masked images. They may come with labels like "zebra"; you may have to reverse engineer an indexed database to get the labels; you may even have to label some yourself. Alternatively, you may be able to find/suggest other sources and/or manually extract shapes and/or label them. One source worth considering is this one.
Having got multiple examples of a bunch of interesting shapes (e.g., 50 zebras, 30 tigers, 40 dogs, 25 houses), experiment with a variety of shape descriptors and build a classifier for them based on shape. Perhaps something like shape context is worth trying. (You may need a login/passwd to get this paper ("me"/"read4fun").)
There are many other possibilities. In the long run the goal is to understand
the problem well enough to build a new descriptor/classifier combination which
works better than existing ones.
A literature search for both shape descriptors, and anyone recently taking the
approach suggested here, is a must.
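As a concrete (and deliberately naive) starting point, here is a sketch of a very crude descriptor plus classifier pipeline; it is not shape context, just something to get data flowing, and the mask/label variables are assumed to come from whichever shape source you end up using:

import numpy as np
from scipy.ndimage import binary_erosion
from sklearn.neighbors import KNeighborsClassifier

def radial_signature(mask, n_samples=64):
    # Crude shape descriptor: distances from the centroid to boundary points,
    # sampled around the contour and normalized for scale.
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    boundary = mask & ~binary_erosion(mask)
    by, bx = np.nonzero(boundary)
    angles = np.arctan2(by - cy, bx - cx)
    dists = np.hypot(by - cy, bx - cx)
    order = np.argsort(angles)
    idx = np.linspace(0, len(order) - 1, n_samples).astype(int)
    sig = dists[order][idx]
    return sig / (sig.max() + 1e-9)

# Hypothetical: masks and labels come from the Hemera (or other) shape data.
# X = np.array([radial_signature(m) for m in masks])
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
# predicted = clf.predict([radial_signature(new_mask)])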
Texture mapping outdoor photos
The goal of this project is to improve the realism in visualizing a
landscape. We have the 3D terrain model given by DEM (Digital Elevation
Model) data, but we are lacking a detailed texture map to paint it with.
The idea is to take digital photos of mountains (not taken in any specific
way, just as people hike or ride around), and figure out how to texture
map them on the terrain model. Assume that the exact position of the
camera for each photo is known. We do not know the direction the photo
was taken in, however.
A first round idea is to set a height for the camera (~5 feet), and spin
the view 360 degrees, looking at the resulting skyline from the terrain
model. Matching this skyline to the skyline extracted from the
image would give us the direction the photo was taken in, and some first round
approximation of texture coordinates could be extracted.
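A minimal sketch of the skyline-matching step, assuming both skylines have been reduced to one elevation angle per degree of heading (the skyline extraction and resampling are separate problems):

import numpy as np

def best_heading(model_skyline, photo_skyline):
    # model_skyline: elevation angle of the skyline for each of 360 headings,
    #                rendered from the terrain model at the known camera position.
    # photo_skyline: skyline elevation angles across the photo's field of view,
    #                resampled to one value per degree.
    fov = len(photo_skyline)
    errors = []
    for heading in range(360):
        # Slide the photo skyline around the circular model skyline.
        window = np.take(model_skyline, np.arange(heading, heading + fov), mode="wrap")
        errors.append(np.mean((window - photo_skyline) ** 2))
    return int(np.argmin(errors))   # heading (in degrees) with the best match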
Scott and Alan Morris have code for constructing and visualizing 3D
terrain models, as well as for determining the latitude/longitude of
digital photos (this is determined by looking at the time a photo was
taken and relating it to a GPS tracklog of a hiker/biker's trip). So,
much of the required infrastructure for this project is available.
Blind search for interesting features
The purpose of this project is to treat image patches, or simple functions of them,
as features, and look for useful correlations (or high mutual information)
between the responses to those features and words associated with the images. So
presumably we would automatically discover that a blue patch is a good feature
for sky and an orange stripy patch is a good feature for tigers. Things become a
little more complex because there are many slightly different blue patches. Once
we choose one, the patch that gives most additional information about sky is
likely to be quite different. (Think orthogonal basis, and note relation
to clustering mentioned below).
Since we are looking for correlations based on images, and the feature detector
responds to a part of an image, we would want to experiment with combining the
responses. For example, likely the first thing to try would be simply taking the
maximum, but there are other possibilities.
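A minimal sketch of scoring one candidate feature against one word, using the maximum over patches and mutual information (the arrays and the binarization threshold are illustrative):

import numpy as np
from sklearn.metrics import mutual_info_score

def feature_word_score(responses, has_word, threshold=0.5):
    # responses: (n_images, n_patches) responses of one candidate patch feature
    #            over every patch of every image.
    # has_word:  (n_images,) 0/1 vector; 1 if the word (e.g. "sky") is attached
    #            to the image.
    per_image = responses.max(axis=1)              # combine patches by taking the maximum
    binarized = (per_image > threshold).astype(int)
    return mutual_info_score(has_word, binarized)  # higher means a more useful feature

In the search, a feature would be kept if its score is high, and subsequent candidates would be scored by how much additional information they give beyond the features already chosen.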
Doing a blind search is, of course, computationally intense. There is no hope of
trying them all. The first task would be to implement fast detection (one method
is done or nearly done, a second method that needs to be tried is not overly
difficult, and it would be useful in general to know how they compare in terms
of speed). Also, it is likely best to find a clustering of patches first.
Once things are in place, we would then keep a cluster of CPU's warm finding
some interesting features.
Of course, doing better than blind search is of interest, and likely with a bit
of thought, we could do something more clever. At which point we would have to
change the project title.
Illumination models for graphics
This is essentially a graphics project with some vision added. It should only be
attempted by someone quite experienced in graphics, so that it does not simply
become a graphics project. The point of the project would be to exploit
your skill at graphics to do something interesting with illumination models
based on measured data (which exists). In particular, I would like to see
fluorescent surfaces handled in a specific way (come see me for details).
Working the radiosity equations in this paradigm would be very interesting.
There has been very limited work on fluorescent surfaces in graphics, and
effectively including them in a renderer would be a great project.
If sufficient progress was made by late March, a paper could be submitted to the
color imaging conference which has a deadline of April 6.
Measuring color constancy at a pixel
Implement this paper and look at the variation of the solution over the pixels of an image. Average the results over the image as a way to do color constancy, and integrate this method into this existing color constancy algorithm tester.
Of interest is how this and other algorithms behave as the size of the image patch given to them changes.
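A minimal sketch of the averaging and the patch-size experiment; the per-pixel method itself is whatever the paper specifies, so it appears here only as an argument to be filled in:

import numpy as np

def average_illuminant(per_pixel_estimates):
    # per_pixel_estimates: (H, W, 3) RGB illuminant estimate at each pixel.
    # Average them and normalize, so only the illuminant chromaticity matters.
    est = per_pixel_estimates.reshape(-1, 3).mean(axis=0)
    return est / np.linalg.norm(est)

def estimates_by_patch_size(image, per_pixel_method, sizes=(16, 32, 64, 128)):
    # Run the method on centered crops of increasing size to see how the
    # answer changes with the size of the patch given to the algorithm.
    h, w, _ = image.shape
    results = {}
    for s in sizes:
        y0, x0 = (h - s) // 2, (w - s) // 2
        patch = image[y0:y0 + s, x0:x0 + s]
        results[s] = average_illuminant(per_pixel_method(patch))
    return results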
Classifying normal versus non-normal neurons
(Under consideration by Kael)
This directory contains two subdirectories with images from neurons and their
respective skeletonizations. (It is not clear whether there is any mileage in
improving the skeletonizations.) The neurons form two groups (either a "c" in the filename or
a "d"). The difference is related to the amount of curvature at the endings.
(We will find out more about this). The object is to write a classification
program. On the assumption that the curvature of the endings is an issue, a
start would be to identify the endings, color them red (to verify), compute the
average curvature, and see if this can be used to classify the two groups.
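A minimal sketch of the first step, finding the endings from the skeleton images (the curvature measurement and the classification threshold are left as hypothetical pieces):

import numpy as np
from scipy.ndimage import convolve

def skeleton_endpoints(skel):
    # Endings are skeleton pixels with exactly one skeleton neighbour
    # in their 8-neighbourhood.
    skel = (skel > 0).astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbour_count = convolve(skel, kernel, mode="constant")
    return (skel == 1) & (neighbour_count == 1)

# Hypothetical next steps: trace a short arc back from each ending, estimate
# its curvature (e.g. angle change per unit arc length), average over endings,
# and threshold to separate the "c" group from the "d" group.
# mean_curv = np.mean([ending_curvature(skel, p) for p in zip(*np.nonzero(skeleton_endpoints(skel)))])
# label = "c" if mean_curv > threshold else "d"

Coloring the detected endings red on the original images is then a quick visual check that the right pixels are being measured.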
Using neural networks for word prediction (challenging!)
This project requires previous experience with neural networks or a keen desire to put extra time into learning about them---the instructor is neither a neural network expert nor an advocate for them, and has no idea if this can be made to work. A sufficiently detailed and intelligent report of failure would be acceptable.
Consider data of the form:
Document 1: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_1; discrete binary vector (a sequence of yes's and no's)
Document 2: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_2; discrete binary vector (a sequence of yes's and no's)
...
Document M: continuous feature vector 1, continuous feature vector 2, ..., continuous feature vector n_M; discrete binary vector (a sequence of yes's and no's)
The number of feature vectors is variable, but they could be forced to be the same in the initial project phases to simplify things. The project is to construct a neural network to predict the binary vector data from (a) the continuous feature vectors, and (b) a single continuous feature vector.
To be more concrete, and explain why this relates to vision, we can consider the continuous feature vectors as representing image regions, and the binary vectors representing words. So this neural network would be able to predict the word tiger from an image with an orange stripey bit.
Note that this is not an easy project. A foray into the literature is a must. It is quite likely that a neural network like this has never been constructed. Normally, a neural network is given a single feature vector and a simple output. It is possible to put the above data into that form by simply conjecturing that each feature vector should be associated with each word (i.e. hedging your bets, or distributing the uncertainty), and hoping that the neural network learns the right thing. This could be a worthwhile warm-up exercise whose results can form a naive baseline, but it does not really get to the heart of the problem. If you were then to weight this data based on the neural network output, and iterate this process, then you would have the algorithm in this paper, except that instead of an SVM you have a neural network. Since this has already been done, the focus of this project is to think of the kinds of representations that will allow a neural network to learn the essence of the above data in a novel way. In particular, the network likely should represent the fact that the words are predicted through some group of regions. If that group of regions is taken to be the entire set (the whole image), then the network would overtrain. As is always the case, it is critical to work towards good performance on data not used for training.
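A minimal sketch of that naive baseline (not the novel network this project is after), assuming the duplicated region/word pairs have already been assembled into arrays:

import numpy as np
from sklearn.neural_network import MLPClassifier

# Naive baseline: every region (feature vector) of a document is labelled with
# that document's full binary word vector, and a standard network is trained
# on (single feature vector -> binary word vector) pairs.
# X_regions: (total_regions, n_features); Y_words: (total_regions, n_words).
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
# net.fit(X_regions, Y_words)                     # multi-label fit on the duplicated data

# At test time, combine per-region predictions for a document, e.g. by taking
# the maximum predicted probability of each word over the document's regions.
# doc_word_probs = net.predict_proba(doc_regions).max(axis=0)

The point of the project is to go beyond this: design a representation in which the network itself learns which group of regions a word should be predicted from.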
To do this project, you will have to convince me that you have a solid understanding of why this is a difficult project, and why it is not a standard neural network project.