
April 13th - David Batista

David Batista (XLDB, FCUL)

Extracting geographic entities with Conditional Random Fields

Abstract:

Geographic Information Retrieval systems rely on identifying place names in documents to determine the regions to which those documents are relevant. Extracting location names from text is a common Natural Language Processing task. A simple approach is to use manually coded rules supported by dictionaries of place names, known as gazetteers. Although these methods achieve good results, the rules are usually too restrictive and too specific to one type of text.
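As a rough illustration of the rule-plus-gazetteer approach, here is a minimal Python sketch. The tiny gazetteer, the function name find_places and the single rule (a capitalized first token whose phrase appears in the gazetteer) are assumptions made for this example and are not taken from the talk.

```python
import re

# Hypothetical toy gazetteer; a real system would load a far larger list.
GAZETTEER = {"lisboa", "porto", "vila nova de gaia"}
MAX_NAME_TOKENS = max(len(name.split()) for name in GAZETTEER)


def find_places(text):
    """Return (start_token, end_token, phrase) for each gazetteer match."""
    tokens = re.findall(r"\w+", text)
    matches = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_NAME_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            # Rule (an assumption): the first token is capitalized and
            # the lowercased phrase is listed in the gazetteer.
            if tokens[i][0].isupper() and candidate.lower() in GAZETTEER:
                matches.append((i, i + n, candidate))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return matches


if __name__ == "__main__":
    print(find_places("O comboio liga Lisboa a Vila Nova de Gaia e ao porto."))
```

The capitalization rule keeps the common noun "porto" (harbour) from matching, but it would also miss the city name in all-lowercase text, which is the kind of restrictiveness the abstract refers to.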

Another approach is to use machine learning, based on extracting features from texts in which the geographic entities are annotated. Features can be the surrounding words or properties of the word itself, such as its capitalization or its frequency in the corpus. A probabilistic model is then built from these features to decide whether or not a given word is a geographic entity.
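The kinds of features mentioned above can be sketched in Python as follows. The function name token_features, the feature keys, the window size and the "<PAD>" marker are hypothetical choices for this example; the feature set actually used in the talk is not specified here.

```python
from collections import Counter


def token_features(tokens, i, corpus_counts, window=1):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    feats = {
        # Properties of the word itself.
        "word.lower": word.lower(),
        "word.is_capitalized": word[:1].isupper(),
        # Frequency of the word in the corpus (0 if unseen).
        "word.corpus_freq": corpus_counts[word.lower()],
    }
    # Surrounding words within the chosen window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}].lower"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


if __name__ == "__main__":
    sentence = ["Vivo", "em", "Lisboa"]
    counts = Counter(t.lower() for t in sentence)
    print(token_features(sentence, 2, counts))
```

One such dictionary per token, paired with a label marking whether the token belongs to a geographic entity, is the sort of input a sequence model such as a Conditional Random Field could be trained on.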

This talk presents work on training and using Conditional Random Fields to extract geographic references from a crawl of the Portuguese web, together with resources available for research, such as a geographic ontology of Portugal.

--

Bio: David has an MSc in Informatics Engineering from the Faculty of Sciences, University of Lisbon, and is part of the XLDB group at LaSIGE. He is currently working on the GREASE (Geographic Reasoning for Search Engines) project, which researches methods for accessing information in large collections of documents with geographically rich text and metadata, with emphasis on the web.

 
March 9th - Noah Smith
March 23rd - Nuno Brás
March 30th - Shadab Khan
April 13th - David Batista
April 29th - Ruben Martinez-Cantin
May 14th - Xavier Anguera Miro
May 25th - Francisco Melo
June 8th - Matthijs Spaan
June 22nd - João Graça
July 2nd - Ricardo Vigário
November 2nd - Andras Hartmann
November 16th - Rui Guerreiro
November 30th - Gopala Anumanchipalli
December 14th - Mário Figueiredo
January 18th - Ivan Selesnick
February 2nd - Mariana Almeida
February 14th - Sara Silva
March 1st - Artur Ferreira
March 15th - Jorge Marques
March 29th - André Lourenço
April 4th - Kalyanmoy Deb
May 3rd - André Martins
May 17th - José Santos
May 31st - João Graça

Instituto Superior Técnico


Priberam.pt