
March 1st - Artur Ferreira

Artur Ferreira (IT/ISEL)

Unsupervised feature discretization and selection for sparse data

Abstract:

Many applications involve high-dimensional datasets with sparse data, in which many features are zero with high probability. For instance, text classification and information retrieval problems involve large collections of documents. Each text is usually represented by a bag-of-words or similar representation, with a large number of features (terms). Many of these features may be irrelevant (or even detrimental) for the learning tasks. This excess of features also inflates the memory needed to represent and process these collections, which shows the need for adequate methods for feature representation, reduction, and selection, both to improve classification accuracy and to reduce the memory required to store these datasets.

This talk focuses on techniques for unsupervised Feature Discretization (FD) and Feature Selection (FS). The proposed FD technique uses the Lloyd-Max algorithm along with a new criterion for FS based on the discretized features. The FS methods rely on dispersion measures to compute feature relevance. The recent topic of compressed learning (CL), i.e., learning in a domain of reduced dimensionality obtained by random projections (RP), is explored within the framework of feature reduction. We show some experimental results on standard datasets.
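As a rough illustration of the two ingredients named above, the sketch below discretizes each feature with a scalar Lloyd-Max quantizer and then ranks features by the dispersion of their discretized values. It is not the speaker's method: the function names `lloyd_max` and `discretize_and_rank`, the uniform initialization of the levels, and the use of variance as the dispersion measure are all assumptions made for this example.

```python
import numpy as np


def lloyd_max(x, n_levels=4, n_iter=50, tol=1e-9):
    """Scalar Lloyd-Max quantizer for one feature (hypothetical helper).

    Alternates between nearest-level assignment and centroid updates.
    Returns the integer code of each sample and the reconstruction levels.
    """
    x = np.asarray(x, dtype=float)
    if x.min() == x.max():
        # Constant feature: a single code is enough.
        return np.zeros(x.shape, dtype=int), np.full(n_levels, x.min())
    # Assumption: start from levels spread uniformly over the feature range.
    levels = np.linspace(x.min(), x.max(), n_levels)
    for _ in range(n_iter):
        # Decision thresholds are midpoints between adjacent levels.
        thresholds = (levels[:-1] + levels[1:]) / 2.0
        codes = np.digitize(x, thresholds)
        new_levels = levels.copy()
        for k in range(n_levels):
            members = x[codes == k]
            if members.size:  # empty cells keep their previous level
                new_levels[k] = members.mean()
        converged = np.max(np.abs(new_levels - levels)) < tol
        levels = new_levels
        if converged:
            break
    thresholds = (levels[:-1] + levels[1:]) / 2.0
    return np.digitize(x, thresholds), levels


def discretize_and_rank(X, n_levels=4):
    """Discretize every column, then rank columns by variance (assumed measure).

    Returns feature indices sorted from most to least dispersed, and the scores.
    """
    X = np.asarray(X, dtype=float)
    Xq = np.empty_like(X)
    for j in range(X.shape[1]):
        codes, levels = lloyd_max(X[:, j], n_levels)
        Xq[:, j] = levels[codes]  # replace each value by its level
    scores = Xq.var(axis=0)
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

Keeping the first few indices of `order` would give a simple unsupervised selection of features; how many to keep, and which dispersion measure to use, are left open here.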

--

Bio: Artur Ferreira is an adjunct professor at ISEL (Instituto Superior de Engenharia de Lisboa) and a PhD student in Electrical and Computer Engineering at IST-IT (Instituto Superior Técnico – Instituto de Telecomunicações), under the supervision of Prof. Mário Figueiredo. He holds an MSc in Electrical and Computer Engineering from IST. His main research interests are data compression, pattern recognition, and machine learning.

 

 
March 9th - Noah Smith
March 23rd - Nuno Brás
March 30th - Shadab Khan
April 13th - David Batista
April 29th - Ruben Martinez-Cantin
May 14th - Xavier Anguera Miro
May 25th - Francisco Melo
June 8th - Matthijs Spaan
June 22nd - João Graça
July 2nd - Ricardo Vigário
November 2nd - Andras Hartmann
November 16th - Rui Guerreiro
November 30th - Gopala Anumanchipalli
December 14th - Mário Figueiredo
January 18th - Ivan Selesnick
February 2nd - Mariana Almeida
February 14th - Sara Silva
March 1st - Artur Ferreira
March 15th - Jorge Marques
March 29th - André Lourenço
April 4th - Kalyanmoy Deb
May 3rd - André Martins
May 17th - José Santos
May 31st - João Graça

Instituto Superior Técnico


Priberam.pt