Semantic / NLP Hackathon
When
- Date: June 8 & 9 (after the conference that takes place on June 6 & 7)
- Time: starting at 10am
- Location: Neofonie GmbH in Berlin: https://www.neofonie.de/standorte
What to bring
- Your Berlin Buzzwords badge, which is required for admission to the hackathon
- Enthusiasm
- Creativity and expertise
- Your laptop (or other hardware you need) preloaded with your favourite programming tools.
What
The general topic of the workshop is European R&D projects that involve data-intensive and semantic technologies (e.g. IKS, DICODE) and related open source projects such as Stanbol, OpenNLP, UIMA and Mahout.
More concretely, here is a task proposal that has gained some traction among members of the Stanbol and OpenNLP developer communities:
Task title: Open-data corpora and open-source tools to train statistical models for NLP and knowledge extraction from text
Incubating Apache projects such as Stanbol and OpenNLP need to train statistical models on annotated corpora, for instance for Named Entity Recognition (NER). The currently available models were mostly built on copyrighted corpora, typically from the Linguistic Data Consortium (LDC). This prevents these projects from improving, modifying, extending and redistributing the annotated corpora, and therefore from building and distributing user-adapted statistical models.
We propose to use the Berlin Buzzwords conference as an opportunity to organize a hackathon that kick-starts an effort to build our own annotated training corpus from freely redistributable sources such as Wikipedia, Wikinews, DBpedia and Gutenberg, while collaborating with other interested projects such as OpenNLP and UIMA.
Several developers from OpenNLP, UIMA and Stanbol have already expressed interest in attending such a workshop. A practical goal could be to develop UI tools to manually refine, correct and complete a tokenized, semi-annotated NER corpus automatically extracted from Wikipedia / DBpedia using pignlproc. Such work could build on existing projects such as WordFreak or the UIMA CasEditor, or we could start a new web-based UI.
We could also extend the topic of the hackathon to improving our tools to build, package and distribute a Solr-based index of entities and topics from various sources such as DBpedia and GeoNames, with ranking scores based on popularity metrics. For instance, one could derive graph centrality metrics from the link structure of Wikipedia articles and use Apache Mahout's Lanczos SVD of the adjacency matrix to compute eigenvector centrality scores for each DBpedia entity and SKOS topic.
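To make the target of such a correction tool concrete, here is a minimal sketch that renders one tokenized sentence with entity spans (for example, spans derived from Wikipedia links) into the OpenNLP name finder training format. That format puts one whitespace-tokenized sentence on each line and marks entities inline as `<START:type> ... <END>`. The span representation (start token, exclusive end token, type) and the function name `to_opennlp_line` are hypothetical choices made for this illustration. They are not pignlproc's actual output format.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, entity_type)
Span = Tuple[int, int, str]


def to_opennlp_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence in OpenNLP name finder training format."""
    ordered = sorted(spans)
    # Reject empty or out-of-range spans.
    for start, end, _ in ordered:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")
    # Inline START/END markers cannot express overlapping or nested entities.
    for (_, prev_end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("overlapping entity spans cannot be encoded")
    starts = {start: etype for start, _, etype in ordered}
    ends = {end for _, end, _ in ordered}
    out: List[str] = []
    for i, token in enumerate(tokens):
        if i in ends:  # close the entity that ended just before this token
            out.append("<END>")
        if i in starts:  # open an entity beginning at this token
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:  # close an entity that runs to the end of the sentence
        out.append("<END>")
    return " ".join(out)


tokens = "Pierre Vinken , 61 years old , will join the board .".split()
print(to_opennlp_line(tokens, [(0, 2, "person")]))
# <START:person> Pierre Vinken <END> , 61 years old , will join the board .
```

A UI tool would let annotators edit the spans. This function would then only serialize the corrected spans for training.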
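The sketch below illustrates the centrality idea on a toy, in-memory graph. Assume the link graph is symmetrized (undirected), connected and not bipartite. Under those assumptions, the leading singular vector of the adjacency matrix coincides (up to sign) with its leading eigenvector, which is what eigenvector centrality uses. Instead of Mahout's distributed Lanczos solver, this sketch swaps in plain power iteration on A + I. The identity shift leaves the eigenvectors unchanged and also prevents oscillation on bipartite graphs. The entity names and the helper names `symmetrize` and `eigenvector_centrality` are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def symmetrize(links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn directed article links into undirected adjacency lists (self-links dropped)."""
    adj: Dict[str, Set[str]] = {}
    for a, b in links:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return {node: sorted(neighbors) for node, neighbors in adj.items()}


def eigenvector_centrality(adj: Dict[str, List[str]], iters: int = 200,
                           tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration on (A + I); the result is L2-normalized."""
    nodes = sorted(adj)
    x = {n: 1.0 for n in nodes}  # positive start vector
    for _ in range(iters):
        # y = (A + I) x : each node keeps its own score plus its neighbors' scores.
        y = {n: x[n] + sum(x[m] for m in adj[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in y.values()))
        y = {n: v / norm for n, v in y.items()}
        if max(abs(y[n] - x[n]) for n in nodes) < tol:
            return y
        x = y
    return x


links = [("Berlin", "Germany"), ("Spree", "Berlin"), ("Berlin", "Brandenburg_Gate")]
scores = eigenvector_centrality(symmetrize(links))
print(max(scores, key=scores.get))  # Berlin
```

On this star-shaped toy graph, the hub converges to 1/√2 ≈ 0.707 and each leaf to 1/√6 ≈ 0.408. At Wikipedia scale, a sparse distributed solver such as Mahout's would replace the dictionary-based loop.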
Registration
If you would like to participate, please add yourself below. Please also mention which aspect of the topic you would like to work on, and link to any relevant open source projects you contribute to.
Name | Contact | Bio and interests |
---|---|---|
Olivier Grisel | ogrisel@apache.org | Author of pignlproc and contributor to Stanbol |
Rupert Westenthaler | rupert.westenthaler@gmail.com | contributor to Stanbol |
Hannes Korte | hannes.korte@iais.fraunhofer.de | generally interested in open source NLP applications |
Daniel Streiff | daniel.streiff@htwchur.ch | interested in LOD, semantic graphs and NER |
Szabolcs Grünwald | szaby.gruenwald@gmail.com | interested in semantic annotation and search UI solutions |
Eduardo Torres | eduardo.torres-schumann@vico-research.com | interested in NLP for social media data and semantic topic modelling |
Julien Nioche | julien@digitalpebble.com | Nutch, Tika, GORA committer; contributor to UIMA and GATE; main author of Behemoth |
Doris Maassen | doris@neofonie.de | Research Manager, currently working for Dicode |
Martin Gerlach | martin.gerlach@neofonie.de | interested in LOD, data aggregation, NER and NE disambiguation based on semantic graphs, duplicate detection and merging |
Anna Głazek | anna.glazek@nk.pl | NLP for Polish language |
Joseph Turian | joseph@metaoptimize.com | Machine Learning / NLP |
Riko Tertsch | dreamonspammmers@gmail.com | Machine Learning, NLP, Semantic Search, BigData |
Name | Email | Your short bio and interests go here |
Log in or register to edit this wiki page and add yourself here. If your account is not activated within 24 hours, please contact isabel@apache.org.