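Among the SIGHAN 2006 papers is the bakeoff report submitted by Bob Carpenter of Alias-i, "Character Language Models for Chinese Word Segmentation and Named Entity Recognition". It led me to the toolkit they developed, LingPipe, an open-source Java toolkit for natural language processing. It is free to download, open source, and supports Chinese. Its documentation goes beyond describing the code structure: it also explains the ideas behind the algorithms and points to related resources such as test datasets and relevant papers. All in all, a nice toolkit.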
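The paper title's idea can be made concrete with a toy sketch of character-language-model segmentation, under my own simplifying assumptions. A character bigram model with add-k smoothing (k = 0.1) is trained on text whose words are already separated by spaces. When segmenting, a space is inserted between two characters whenever the model scores "prev, space, next" higher than "prev, next". The names `CharBigramLM` and `segment` and the `BOS`/`EOS` markers are hypothetical. This is neither LingPipe's API nor the paper's exact model; real systems typically use longer character n-grams and a proper search. With a bigram model, each gap decision only affects its own terms in the total score, so deciding gap by gap already yields the best-scoring segmentation. A higher-order model would need a Viterbi-style search instead.

```python
import math
from collections import Counter

BOS, EOS = "\x02", "\x03"  # hypothetical start/end-of-line markers


class CharBigramLM:
    """Toy character bigram language model with add-k smoothing."""

    def __init__(self, lines, k=0.1):
        self.k = k
        self.pairs = Counter()  # counts of (previous char, next char)
        self.ctx = Counter()    # counts of each char used as a context
        vocab = {BOS, EOS, " "}
        for line in lines:
            s = BOS + line + EOS
            vocab.update(s)
            for a, b in zip(s, s[1:]):
                self.pairs[a, b] += 1
                self.ctx[a] += 1
        self.v = len(vocab)

    def logp(self, prev, ch):
        """log P(ch | prev); smoothing keeps some mass for unseen pairs."""
        return math.log((self.pairs[prev, ch] + self.k) /
                        (self.ctx[prev] + self.k * self.v))


def segment(lm, text):
    """Split unsegmented text wherever inserting a space raises the bigram score."""
    if not text:
        return []
    out = [text[0]]
    for prev, ch in zip(text, text[1:]):
        keep_joined = lm.logp(prev, ch)
        insert_space = lm.logp(prev, " ") + lm.logp(" ", ch)
        if insert_space > keep_joined:
            out.append(" ")
        out.append(ch)
    return "".join(out).split(" ")


if __name__ == "__main__":
    # Training lines are pre-segmented with spaces, as in a segmentation corpus.
    lm = CharBigramLM(["我 爱 北京", "北京 欢迎 你", "我 爱 你"])
    print(segment(lm, "我爱北京欢迎你"))  # → ['我', '爱', '北京', '欢迎', '你']
```

The input string never appears in the training data as a whole. The model still splits it correctly, because every decision is local to a pair of adjacent characters.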

Website: http://alias-i.com/lingpipe/

Modules include (among others):

  • Topic Classification
  • Named Entity Recognition
  • Part-of-Speech Tagging
  • Sentence Detection
  • Query Spell Checking
  • Interesting Phrase Detection
  • Clustering
  • Character Language Modeling
  • MEDLINE Download, Parsing and Indexing
  • Database Text Mining
  • Chinese Word Segmentation
  • Sentiment Analysis
  • Language Identification
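Several modules in the list above relate to Character Language Modeling and Language Identification. One common way to do language identification with character language models is to train one model per language and pick the language whose model gives the text the highest probability. Below is a minimal sketch of that idea, under my own assumptions: bigram models, add-k smoothing with k = 0.1, and a vocabulary size shared across all models so their scores are comparable. The names `CharBigramLM`, `train`, and `classify` are hypothetical and are not LingPipe's API.

```python
import math
from collections import Counter

START = "^"  # hypothetical start-of-text marker


class CharBigramLM:
    """Toy character bigram model with add-k smoothing over a shared vocabulary size."""

    def __init__(self, texts, vocab_size, k=0.1):
        self.k, self.v = k, vocab_size
        self.pairs, self.ctx = Counter(), Counter()
        for t in texts:
            s = START + t
            for a, b in zip(s, s[1:]):
                self.pairs[a, b] += 1
                self.ctx[a] += 1

    def log_prob(self, text):
        """Sum of smoothed log P(next char | previous char) over the text."""
        s = START + text
        return sum(math.log((self.pairs[a, b] + self.k) /
                            (self.ctx[a] + self.k * self.v))
                   for a, b in zip(s, s[1:]))


def train(labeled_texts, k=0.1):
    """Fit one model per label, all using the same vocabulary size."""
    vocab = {START} | {c for texts in labeled_texts.values() for t in texts for c in t}
    return {label: CharBigramLM(texts, len(vocab), k)
            for label, texts in labeled_texts.items()}


def classify(models, text):
    """Return the label whose model assigns the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


if __name__ == "__main__":
    models = train({
        "en": ["the cat sat on the mat", "this is the house"],
        "zh": ["我爱北京", "北京欢迎你"],
    })
    print(classify(models, "the house"))  # prints: en
    print(classify(models, "我爱北京"))    # prints: zh
```

Topic or sentiment classification can follow the same pattern, with one model per category instead of one per language. A realistic system would use much more training text and longer n-grams.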

Feature Overview

LingPipe’s information extraction and data mining tools:

  • track mentions of entities (e.g. people or proteins);
  • link entity mentions to database entries;
  • uncover relations between entities and actions;
  • classify text passages by language, character encoding, genre, topic, or sentiment;
  • correct spelling with respect to a text collection;
  • cluster documents by implicit topic and discover significant trends over time; and
  • provide part-of-speech tagging and phrase chunking.
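The bullet about correcting spelling with respect to a text collection can be illustrated with a deliberately simple stand-in, which is not LingPipe's spell checker. The approach is to count word frequencies in the collection and generate every string one edit away from the query (delete, transpose, replace, insert). The corrector then returns the most frequent candidate that actually occurs in the collection. The names `build_counts`, `edits1`, and `correct` are hypothetical, and only lowercase ASCII words are handled.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def build_counts(collection):
    """Word frequencies over the text collection (lowercase ASCII words only)."""
    return Counter(w for doc in collection for w in re.findall(r"[a-z]+", doc.lower()))


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Keep known words; otherwise return the most frequent known word one edit away."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    # Frequency ties go to the lexicographically largest word, only for determinism.
    return max(candidates, key=lambda w: (counts[w], w)) if candidates else word


if __name__ == "__main__":
    counts = build_counts(["the LingPipe toolkit",
                           "LingPipe supports Chinese",
                           "Chinese word segmentation"])
    print(correct("chinse", counts))  # prints: chinese
    print(correct("tolkit", counts))  # prints: toolkit
```

Because candidates must appear in the collection, the same misspelling can be corrected differently against different document sets. That is the "with respect to a text collection" part.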

© 2011 Information Retrieval Blog