Room: B230 Engineering Quadrangle
Major advances in information technologies must arise from close collaboration among application, hardware/architecture, algorithm, CAD, and system design. It is critical to consider, for instance, how to deploy an ever-increasing number of transistors in concert with acceptable power consumption, and how to make hardware user-friendly yet effective for applications. Over the past 30 years, my research group has studied the design of signal processing systems, and our findings have played direct and indirect roles in the construction of a number of useful systems. In this dynamic discipline, however, emerging applications continually impose new requirements on power consumption, speed, and performance, posing fresh intellectual challenges for system design. Two vital applications that have emerged in recent years are multimedia and genomics; together they form an important basis for a new paradigm of information processing that relies heavily on intelligent search and processing.
Multimedia technologies will profoundly change the way we access information, and they pose new challenges for machine learning research. We investigate issues relevant to intelligent multimedia communication applications and develop machine learning tools for adaptive, content-based technologies in MPEG-4 applications such as compression, indexing, and retrieval of visual information. This work has natural and promising extensions to internet search engines, document analysis, and biometric authentication.
The field of bioinformatics represents a natural convergence of life sciences and information technologies. Modern large-scale DNA devices such as microarrays have rapidly and steadily generated enormous amounts of genomic data. We study advanced machine learning tools that could reveal salient information embedded in genomic data and facilitate classification and prediction of tumors and their responses to drug therapies.
From the application perspective, machine learning is both effective and instrumental for distilling useful information from the wealth of available data. We study tailor-made machine learning tools for specific applications, including feature extraction, clustering, and classification. Machine learning tools can also be adopted to facilitate the integration of data collected from experiments at different levels of biological systems, or the fusion of diverse multimedia data such as text, speech, image, video, and graphics.
In addition to application studies, we investigate theoretical aspects of machine learning, grounded in the basic principle of learning by example. These theoretical foundations include statistical learning, optimization, and algebraic theory. Over the past half century, machine learning techniques have evolved from simple linear classifiers to neural networks and, more recently, to kernel-based approaches. The promise of the kernel approach hinges on its implicit mapping of data into a new, typically higher-dimensional, representation space. With a suitable kernel, it theoretically assures the linear separability of training data in the reproducing kernel Hilbert space. Moreover, it provides a unified treatment of heterogeneous genomic data, including vectors, sequences, and graphs.
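The separability guarantee can be illustrated with a small sketch (a hypothetical example, not drawn from our own systems): XOR-style labels cannot be split by any line in the 2-D input space, yet the explicit feature map of a simple quadratic kernel renders them linearly separable.

```python
import numpy as np

# XOR-style labels: not linearly separable in the 2-D input space.
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

def phi(x):
    # explicit feature map of the quadratic kernel k(x, z) = (x . z)^2:
    # phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

Phi = np.vstack([phi(x) for x in X])

# In the kernel-induced space, the hyperplane w = (0, 1, 0) separates
# the two classes perfectly, even though no line does so in input space.
w = np.array([0.0, 1.0, 0.0])
pred = np.sign(Phi @ w)

# The kernel computes the same inner products without ever forming phi:
K = (X @ X.T) ** 2
```

In practice the feature map is never materialized; algorithms work directly from the kernel matrix `K`, which is what makes the same machinery applicable to sequences and graphs.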
We investigate both categories of kernel-based learning techniques:
1. Unsupervised learning for cluster discovery and graph partitioning. Kernel approaches extend the conventional K-means algorithm (designed for clustering vectors in Euclidean space) to any objects that can be characterized by pairwise relationships, such as sequences or graphs. The results apply to kernel-based (e.g., kernel K-means) and graph-based clustering algorithms. The latter have potential applications to genomics (e.g., interaction networks, metabolic networks, or signaling pathways) and multimedia (e.g., search engines and World Wide Web/social networks).
2. Supervised cluster discovery and supervised classification/prediction. Kernel approaches provide a unified framework encompassing Fisher and support vector machine (SVM) classifiers. This ultimately leads to a unifying hybrid classifier that includes Fisher's discriminant analysis (FDA) and SVM as special cases, offering the flexibility needed to improve prediction performance.
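The first (unsupervised) category above can be sketched as kernel K-means: the algorithm needs only the kernel matrix, so the "objects" may be sequences or graph nodes rather than Euclidean vectors. This is a minimal illustration; `seeds` (one initial representative index per cluster) is a simplification we introduce here, not part of any standard API.

```python
import numpy as np

def kernel_kmeans(K, seeds, n_iter=50):
    """Kernel K-means sketch: cluster objects given only a kernel
    (pairwise similarity) matrix K, with no explicit embedding.
    `seeds` lists one initial representative index per cluster."""
    n, k = K.shape[0], len(seeds)
    d = np.diag(K)
    # initial assignment: nearest seed in feature space,
    # ||phi(x_i) - phi(x_s)||^2 = K_ii - 2 K_is + K_ss
    labels = np.argmin(d[:, None] - 2 * K[:, seeds] + d[seeds], axis=1)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = labels == c
            m = idx.sum()
            if m == 0:
                continue
            # squared feature-space distance to the centroid of cluster c:
            # K_ii - (2/m) sum_j K_ij + (1/m^2) sum_{j,l} K_jl
            dist[:, c] = (d - 2 * K[:, idx].sum(axis=1) / m
                            + K[np.ix_(idx, idx)].sum() / m**2)
        new = dist.argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Toy check: two well-separated point clouds under a linear kernel.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = kernel_kmeans(X @ X.T, seeds=[0, 39])
```

Replacing the linear kernel `X @ X.T` with a string or graph kernel leaves the clustering loop untouched, which is the point of the kernel extension.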
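For the second (supervised) category, a minimal regularized Fisher discriminant conveys the flavor of such classifiers. This is a generic textbook sketch, not our unified hybrid classifier; `rho` is an illustrative ridge parameter we introduce here.

```python
import numpy as np

def fda_fit(X, y, rho=1e-3):
    """Regularised Fisher discriminant for two classes:
    w = (S_w + rho*I)^{-1} (mu1 - mu0), with the decision threshold
    placed midway between the projected class means."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix S_w
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw + rho * np.eye(X.shape[1]), mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0
    return w, b

# Toy check on two separable point clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
y = np.repeat([0, 1], 30)
w, b = fda_fit(X, y)
pred = (X @ w + b > 0).astype(int)
accuracy = (pred == y).mean()   # fraction of training points classified correctly
```

An SVM chooses `w` by maximizing the margin instead of whitening by the within-class scatter; a hybrid formulation can interpolate between the two criteria, which is the kind of flexibility referred to above.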