<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval Blog &#187; Toolkits</title>
	<atom:link href="http://blog.zye.me/tag/toolkits/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.zye.me</link>
	<description>REAL TIME DATA PROCESSING, DISTRIBUTED COMPUTING, PATTERN DISCOVERY</description>
	<lastBuildDate>Wed, 08 Feb 2012 17:33:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Toolkits</title>
		<link>http://blog.zye.me/2011/06/55851.html</link>
		<comments>http://blog.zye.me/2011/06/55851.html#comments</comments>
		<pubDate>Thu, 02 Jun 2011 15:22:19 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[Toolkits]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/2011/06/55851.html/</guid>
		<description><![CDATA[FlexCRFs: Flexible Conditional Random Fields CRFTagger: CRF English POS Tagger CRFChunker: CRF English Phrase Chunker JTextPro: A Java-based Text Processing Toolkit JWebPro: A Java-based Web Processing Toolkit JVnSegmenter: A Java-based Vietnamese Word Segmentation Tool]]></description>
			<content:encoded><![CDATA[<p>
<ul style="font-family: 'Times New Roman'; font-size: medium;">
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://flexcrfs.sourceforge.net/">FlexCRFs</a>: Flexible Conditional Random Fields</span></p>
</li>
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://crftagger.sourceforge.net/">CRFTagger</a>: CRF English POS Tagger</span></p>
</li>
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://crfchunker.sourceforge.net/">CRFChunker</a>: CRF English Phrase Chunker</span></p>
</li>
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://jtextpro.sourceforge.net/">JTextPro</a>: A Java-based Text Processing Toolkit</span></p>
</li>
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://jwebpro.sourceforge.net/">JWebPro</a>: A Java-based Web Processing Toolkit</span></p>
</li>
<li>
<p style="margin-top: 0px; margin-bottom: 8px;"><span style="font-family: Arial;"><a href="http://jvnsegmenter.sourceforge.net/">JVnSegmenter</a>: A Java-based Vietnamese Word Segmentation Tool</span></p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/06/55851.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Content-based image retrieval (CBIR) toolkits and packages</title>
		<link>http://blog.zye.me/2009/05/52060.html</link>
		<comments>http://blog.zye.me/2009/05/52060.html#comments</comments>
		<pubDate>Sat, 23 May 2009 01:13:07 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[content based image retrieval]]></category>
		<category><![CDATA[image information retrieval]]></category>
		<category><![CDATA[Toolkits]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/?p=52060</guid>
		<description><![CDATA[This post presents several tools for CBIR. Unfortunately, all of them lack documentation. I chose LIRE because I am familiar with Lucene and Java. Do you know of any other good options? If so, please leave a comment. Thanks. The LIRE (Lucene Image REtrieval) library (Java-based) It is a CBIR system based on Lucene (Java-based) <a href='http://blog.zye.me/2009/05/52060.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>This post presents several tools for CBIR. Unfortunately, all of them lack documentation. I chose LIRE because I am familiar with Lucene and Java.</p>
<p>Do you know of any other good options? If so, please leave a comment. Thanks.</p>
<h1>The <a href="http://www.semanticmetadata.net/lire/" target="_blank">LIRE</a> (Lucene Image REtrieval) library (Java-based)</h1>
<p>LIRE is a CBIR system based on Lucene <strong>(Java-based)</strong> and <a href="http://sourceforge.net/projects/caliph-emir/">Caliph and Emir</a>. Caliph and Emir are <strong>Java</strong>- and MPEG-7-based tools for annotating and retrieving digital photos and images; they support semantic annotation as well as content-based, metadata-based, and semantic image retrieval. Lucene is a full-text retrieval package. Given these two cores, it is easy to imagine what LIRE can do.</p>
<p>It is also easy to incorporate context-based image retrieval into LIRE, since Lucene handles that job naturally. I do not think Lucene is well suited to academic research, but at least you can use Caliph and Emir to extract the low-level features of images, which can also be used in other IR systems.</p>
<p>LIRE can be used out of the box.</p>
<h1>The GNU Image-Finding Tool (<a href="http://www.gnu.org/software/gift/" target="_blank">GIFT</a>, C-based)</h1>
<p>The GIFT (the GNU Image-Finding Tool) is a Content Based Image Retrieval System (CBIRS: <a href="http://en.wikipedia.org/wiki/CBIR" target="_blank">http://en.wikipedia.org/wiki/CBIR</a>). It enables you to do Query By Example (QBE: <a href="http://en.wikipedia.org/wiki/QBE" target="_blank">http://en.wikipedia.org/wiki/QBE</a>) on images, giving you the opportunity to improve query results by relevance feedback. For processing your queries the program relies entirely on the content of the images, freeing you from the need to annotate all images before querying the collection.</p>
<p>The GIFT comes with a tool which lets you index whole directory trees containing images in one go. You can then use the GIFT server and its <a href="http://www.gnu.org/software/gift/#clients">clients</a> to browse your own image collections.</p>
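<p>To make query by example with relevance feedback concrete, here is a minimal Python sketch. It is not GIFT code and says nothing about GIFT&#8217;s actual features or feedback algorithm: the three-bin colour histograms, the file names, the Euclidean distance, and the <code>weight</code> parameter are all assumptions chosen for illustration.</p>

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(query, collection):
    """Return image names ordered from most to least similar to the query."""
    return sorted(collection, key=lambda name: distance(query, collection[name]))


def refine_query(query, relevant, weight=0.5):
    """Move the query toward the mean of the vectors the user marked relevant."""
    mean = [sum(column) / len(relevant) for column in zip(*relevant)]
    return [(1 - weight) * q + weight * m for q, m in zip(query, mean)]


# Hypothetical 3-bin colour histograms (made-up values, not real image data).
collection = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg": [0.5, 0.2, 0.3],
    "ocean.jpg": [0.0, 0.1, 0.9],
}

query = [0.7, 0.2, 0.1]
first_pass = query_by_example(query, collection)

# The user marks "beach.jpg" as relevant, so the query shifts toward it.
refined = refine_query(query, [collection["beach.jpg"]])
second_pass = query_by_example(refined, collection)

print(first_pass)
print(second_pass)
```

<p>In this toy collection the feedback step moves <code>beach.jpg</code> to the top of the second ranking. No manual annotation is involved at any point, which is the property the GIFT description emphasises.</p>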
<h1><a href="http://www-i6.informatik.rwth-aachen.de/~deselaers/fire/" target="_self">FIRE</a> (C++)</h1>
<p>FIRE is an image retrieval system developed as part of the diploma thesis of Thomas Deselaers. Later, large parts were rewritten to make it more easily maintainable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2009/05/52060.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Toolkits for IR/NLP/ML</title>
		<link>http://blog.zye.me/2008/03/14666.html</link>
		<comments>http://blog.zye.me/2008/03/14666.html#comments</comments>
		<pubDate>Mon, 03 Mar 2008 05:18:08 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[ML]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[Toolkits]]></category>
		<category><![CDATA[信息检索]]></category>

		<guid isPermaLink="false">http://www.5yiso.cn/2008/03/14666.html</guid>
		<description><![CDATA[Most of the tools below are open source, released under licenses such as the GPL and Apache; please read each tool&#8217;s license statement carefully before use. I. Information Retrieval 1. Lemur/Indri The Lemur Toolkit for Language Modeling and Information Retrieval http://www.lemurproject.org/ Indri: Lemur&#8217;s latest search engine 2. Lucene/Nutch Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. Lucene is a top-level Apache open-source project, released under the Apache 2.0 license, with ports to Perl, C/C++, .NET, and other languages. http://lucene.apache.org/ http://www.nutch.org/ 3. Wget GNU Wget is a free software package for <a href='http://blog.zye.me/2008/03/14666.html'>[...]</a>]]></description>
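			<content:encoded><![CDATA[<p>Most of the tools below are open source, released under licenses such as the GPL and Apache; please read each tool&#8217;s license statement carefully before use. </p>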
<p>I. Information Retrieval <br />1. Lemur/Indri <br />The Lemur Toolkit for Language Modeling and Information Retrieval <br /><a href="http://www.lemurproject.org/" target="_blank" rel="nofollow">http://www.lemurproject.org/</a> <br />Indri: <br />Lemur&#8217;s latest search engine </p>
<p>2. Lucene/Nutch <br />Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. <br />Lucene is a top-level Apache open-source project, released under the Apache 2.0 license, with ports to Perl, C/C++, .NET, and other languages. <br /><a href="http://lucene.apache.org/" target="_blank" rel="nofollow">http://lucene.apache.org/</a> <br /><a href="http://www.nutch.org/" target="_blank" rel="nofollow">http://www.nutch.org/</a> </p>
<p>3. Wget <br />GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command-line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc. <br /><a href="http://www.gnu.org/software/wget/wget.html" target="_blank" rel="nofollow">http://www.gnu.org/software/wget/wget.html</a> </p>
<p>II. Natural Language Processing <br />1. EGYPT: A Statistical Machine Translation Toolkit <br /><a href="http://www.clsp.jhu.edu/ws99/projects/mt/" target="_blank" rel="nofollow">http://www.clsp.jhu.edu/ws99/projects/mt/</a> <br />Includes four tools, among them GIZA. </p>
<p>2. GIZA++ (Statistical Machine Translation) <br /><a href="http://www.fjoch.com/GIZA++.html" target="_blank" rel="nofollow">http://www.fjoch.com/GIZA++.html</a> <br />GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT) which was developed by the Statistical Machine Translation team during the summer workshop in 1999 at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ includes a lot of additional features. The extensions of GIZA++ were designed and written by Franz Josef Och. <br />Franz Josef Och has worked at Aachen University in Germany, at ISI (the Information Sciences Institute of the University of Southern California), and at Google. GIZA++ now has a Windows port and supports IBM Models 1-5 well. </p>
<p>3. PHARAOH (Statistical Machine Translation) <br /><a href="http://www.isi.edu/licensed-sw/pharaoh/" target="_blank" rel="nofollow">http://www.isi.edu/licensed-sw/pharaoh/</a> <br />a beam search decoder for phrase-based statistical machine translation <br />models </p>
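<p>4. OpenNLP: <br /><a href="http://opennlp.sourceforge.net/" target="_blank" rel="nofollow">http://opennlp.sourceforge.net/</a> <br />Includes more than 20 tools, among them Maxent. </p>
<p>By the way, these SMT tools tend to be given Egypt-related names, such as GIZA, PHARAOH, and Cairo. Och developed GIZA++ while at ISI, and PHARAOH was also developed by someone from ISI, Philipp Koehn, so the connections between them are rather tangled. </p>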
<p>5. MINIPAR by Dekang Lin (Univ. of Alberta, Canada) <br />MINIPAR is a broad-coverage parser for the English language. An evaluation with the SUSANNE corpus shows that MINIPAR achieves about 88% precision and 80% recall with respect to dependency relationships. MINIPAR is very efficient: on a Pentium II 300 with 128MB memory, it parses about 300 words per second. <br />The binary can be downloaded free of charge after filling in a form. <br /><a href="http://www.cs.ualberta.ca/~lindek/minipar.htm" target="_blank" rel="nofollow">http://www.cs.ualberta.ca/~lindek/minipar.htm</a> </p>
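<p>Precision and recall are often combined into a single F1 score, their harmonic mean. The MINIPAR description quotes no F-measure, so the figure computed below is only derived from the two rounded numbers above and is not a reported result.</p>

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# The approximate figures quoted for MINIPAR on the SUSANNE corpus.
minipar_f1 = f1_score(0.88, 0.80)

print(round(minipar_f1, 3))  # 0.838
```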
<p>6. WordNet <br /><a href="http://wordnet.princeton.edu/" target="_blank" rel="nofollow">http://wordnet.princeton.edu/</a> <br />WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. <br />WordNet was developed by the Cognitive Science Laboratory at Princeton University under the direction of Professor George A. Miller (Principal Investigator). <br />The latest version of WordNet is 2.1 (for Windows &amp; Unix-like OS), available as binaries, source, and documentation. <br />The online version of WordNet is at <a href="http://wordnet.princeton.edu/perl/webwn" target="_blank" rel="nofollow">http://wordnet.princeton.edu/perl/webwn</a> </p>
<p>7. HowNet <br /><a href="http://www.keenage.com/" target="_blank" rel="nofollow">http://www.keenage.com/</a> <br />HowNet is an on-line common-sense knowledge base unveiling inter-conceptual relations and inter-attribute relations of concepts as connoting in lexicons of the Chinese and their English equivalents. <br />Developed by Zhendong Dong &amp; Qiang Dong of CAS, it is a resource similar to WordNet. </p>
<p>8. Statistical Language Modeling Toolkit <br /><a href="http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html" target="_blank" rel="nofollow">http://svr-www.eng.cam.ac.uk/~prc14/toolkit.html</a> <br />The CMU-Cambridge Statistical Language Modeling toolkit is a suite of <br />UNIX software tools to facilitate the construction and testing of <br />statistical language models. </p>
<p>9. SRI Language Modeling Toolkit <br /><a href="http://www.speech.sri.com/projects/srilm/" target="_blank" rel="nofollow">www.speech.sri.com/projects/srilm/</a> <br />SRILM is a toolkit for building and applying statistical language <br />models (LMs), primarily for use in speech recognition, statistical <br />tagging and segmentation. It has been under development in the SRI <br />Speech Technology and Research Laboratory since 1995. </p>
<p>10. ReWrite Decoder <br /><a href="http://www.isi.edu/licensed-sw/rewrite-decoder/" target="_blank" rel="nofollow">http://www.isi.edu/licensed-sw/rewrite-decoder/</a> <br />The ISI ReWrite Decoder Release 1.0.0a by Daniel Marcu and Ulrich Germann. It is a program that translates from one natural language into another using statistical machine translation. </p>
<p>11. GATE (General Architecture for Text Engineering) <br /><a href="http://gate.ac.uk/" target="_blank" rel="nofollow">http://gate.ac.uk/</a> <br />A Java Library for Text Engineering </p>
<p>III. Machine Learning <br />1. YASMET: Yet Another Small MaxEnt Toolkit (Statistical Machine Learning) <br /><a href="http://www.fjoch.com/YASMET.html" target="_blank" rel="nofollow">http://www.fjoch.com/YASMET.html</a> <br />Written by Franz Josef Och. In addition, the OpenNLP project includes a Java MaxEnt tool that estimates parameters with GIS; it was ported to C++ by Zhang Le of Northeastern University in China, who is currently studying in the UK. </p>
<p>2. LibSVM <br />Developed by Chih-Jen Lin of National Taiwan University (NTU), with versions in C++, Java, Perl, C#, and other languages. <br /><a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/" target="_blank" rel="nofollow">http://www.csie.ntu.edu.tw/~cjlin/libsvm/</a> <br />LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. </p>
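<p>3. SVM Light <br />Developed by Thorsten Joachims, now at Cornell, while he was at the University of Dortmund; it is the best-known SVM package after LibSVM. It is open source, written in C, and can be used for ranking problems. <br /><a href="http://svmlight.joachims.org/" target="_blank" rel="nofollow">http://svmlight.joachims.org/</a> </p>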
<p>4. CLUTO <br /><a href="http://www-users.cs.umn.edu/~karypis/cluto/" target="_blank" rel="nofollow">http://www-users.cs.umn.edu/~karypis/cluto/</a> <br />A software package for clustering low- and high-dimensional datasets. <br />The package is distributed only as executables and libraries; the source code is not available for download. </p>
<p>5. CRF++ <br /><a href="http://chasen.org/~taku/software/CRF++/" target="_blank" rel="nofollow">http://chasen.org/~taku/software/CRF++/</a> <br />Yet Another CRF toolkit for segmenting/labelling sequential data <br />CRFs (Conditional Random Fields) grew out of HMMs and MEMMs and are widely used in IE, IR, and NLP. </p>
<p>6. SVM Struct <br /><a href="http://www.cs.cornell.edu/People/tj/svm_light/svm_struct.html" target="_blank" rel="nofollow">http://www.cs.cornell.edu/People/tj/svm_light/svm_struct.html</a> <br />Like SVM Light, it was developed by Thorsten Joachims of Cornell. <br />SVMstruct is a Support Vector Machine (SVM) algorithm for predicting multivariate outputs. It performs supervised learning by approximating a mapping <br />h: X &rarr; Y <br />using labeled training examples (x1,y1), &#8230;, (xn,yn). Unlike regular SVMs, however, which consider only univariate predictions like in classification and regression, SVMstruct can predict complex objects y like trees, sequences, or sets. Examples of problems with complex outputs are natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. <br />SVMstruct can be thought of as an API for implementing different kinds of complex prediction algorithms. Currently, we have implemented the following learning tasks: <br />SVMmulticlass: Multi-class classification. Learns to predict one of k mutually exclusive classes. This is probably the simplest possible instance of SVMstruct and serves as a tutorial example of how to use the programming interface. <br />SVMcfg: Learns a weighted context free grammar from examples. Training examples (e.g. for natural language parsing) specify the sentence along with the correct parse tree. The goal is to predict the parse tree of new sentences. <br />SVMalign: Learning to align sequences. Given examples of how sequence pairs align, the goal is to learn the substitution matrix as well as the insertion and deletion costs of operations so that one can predict alignments of new sequences. <br />SVMhmm: Learns a Markov model from examples. Training examples (e.g. for part-of-speech tagging) specify the sequence of words along with the correct assignment of tags (i.e. states). The goal is to predict the tag sequences for new sentences. </p>
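<p>The multi-class case is the easiest one to picture: the learned mapping h scores every candidate class for an input and returns the best one. The Python sketch below shows only that prediction step. It does not use the SVMstruct programming interface and does no training; the class names, the hand-picked weights, and the feature vectors are assumptions made up for illustration.</p>

```python
def score(weights, features):
    """Dot product of one class's weight vector with the input features."""
    return sum(w * x for w, x in zip(weights, features))


def predict(class_weights, features):
    """Return the class whose weight vector gives the input the highest score."""
    return max(class_weights, key=lambda label: score(class_weights[label], features))


# Hand-picked, hypothetical weights for three mutually exclusive classes.
# A real system would learn these from labeled training examples.
class_weights = {
    "noun": [1.0, -0.5, 0.0],
    "verb": [-0.5, 1.0, 0.0],
    "other": [0.0, 0.0, 1.0],
}

label = predict(class_weights, [0.2, 0.9, 0.1])
print(label)  # verb
```

<p>For structured outputs such as parse trees or tag sequences the idea is the same, but the set of candidates is far too large to enumerate like this, so the search for the best-scoring output needs a task-specific algorithm.</p>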
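<p>IV. Misc: <br />1. Notepad++: an open-source editor that supports the keywords of dozens of languages, including C#, Perl, and CSS; its features rival those of recent versions of UltraEdit and Visual Studio .NET. <br /><a href="http://notepad-plus.sourceforge.net/" target="_blank" rel="nofollow">http://notepad-plus.sourceforge.net/</a> </p>
<p>2. WinMerge: compares text content, for example to find the differences between two versions of a program. <br />winmerge.sourceforge.net/ </p>
<p>3. OpenPerlIDE: an open-source Perl editor with built-in compilation and line-by-line debugging. <br />open-perl-ide.sourceforge.net/ <br />P.S. The best editor I have come across is still VS .NET: a +/- sign in front of every function supports expand/collapse, it supports block copy/cut/paste, Ctrl+C/Ctrl+X/Ctrl+V operate on a whole line at a time, Ctrl+K+C/Ctrl+K+U comment/uncomment multiple lines, and there is more&#8230; Visual Studio .NET is really cool :D </p>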
<p>4. Berkeley DB <br /><a href="http://www.sleepycat.com/" target="_blank" rel="nofollow">http://www.sleepycat.com/</a> <br />Berkeley DB is not a relational database; it is described as an embedded database: in client/server terms, its client and server share a single address space. Because it grew out of the file system, it is more like a dictionary-style database of key-value pairs. Its database files are persisted to disk, so it is not limited by memory size. BDB has a variant, Berkeley DB XML, which is an XML database (presumably it stores data as XML files). BDB has been embedded in products by companies including Microsoft, Google, HP, Ford, and Motorola. <br />Berkeley DB (libdb) is a programmatic toolkit that provides embedded database support for both traditional and client/server applications. It includes b+tree, queue, extended linear hashing, fixed, and variable-length record access methods, transactions, locking, logging, shared memory caching, database recovery, and replication for highly available systems. DB supports C, C++, Java, PHP, and Perl APIs. <br />It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary style data structures &#8211; anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed in to a string of bytes, so what you do with it is completely up to you. The only operations available are &#8220;store this value under this key&#8221;, &#8220;check if this key exists&#8221; and &#8220;retrieve the value for this key&#8221; so conceptually it&#8217;s pretty simple &#8211; the complicated stuff all happens under the hood. <br />Case studies: <br />Ask Jeeves uses Berkeley DB to provide an easy-to-use tool for searching the Internet. <br />Microsoft uses Berkeley DB for the Groove collaboration software. <br />AOL uses Berkeley DB for search tool meta-data and other services. <br />Hitachi uses Berkeley DB in its directory services server product. <br />Ford uses Berkeley DB to authenticate partners who access Ford&#8217;s Web applications. <br />Hewlett Packard uses Berkeley DB in several products, including storage, security and wireless software. <br />Google uses Berkeley DB High Availability for Google Accounts. <br />Motorola uses Berkeley DB to track mobile units in its wireless radio network products. </p>
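<p>The three basic operations (store, check, retrieve) are easy to try out. The Python sketch below uses <code>dbm.dumb</code>, a simple key-value store from Python&#8217;s standard library, as a stand-in. It is not Berkeley DB and shares none of its performance, transaction, or replication features; the file name and the stored key and value are assumptions chosen for illustration.</p>

```python
import dbm.dumb
import os
import tempfile

# dbm.dumb is NOT Berkeley DB; it only stands in for the same three operations.
# The database files are created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toolkits")

with dbm.dumb.open(path, "c") as db:
    # "store this value under this key"
    db[b"lucene"] = b"http://lucene.apache.org/"
    # "check if this key exists"
    has_lucene = b"lucene" in db
    has_indri = b"indri" in db
    # "retrieve the value for this key"
    url = db[b"lucene"]

# The data lives in files on disk, so it survives closing and reopening.
with dbm.dumb.open(path, "r") as db:
    url_after_reopen = db[b"lucene"]

print(has_lucene, has_indri, url_after_reopen)
```

<p>Keys and values are plain byte strings, which matches the description above: anything that can be serialized to bytes can be stored, and what the bytes mean is left to the application.</p>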
<p>5. LaTeX <br />LATEX, written as LaTeX in plain text, is a document preparation system for the TeX typesetting program. It offers programmable desktop publishing features and extensive facilities for automating most aspects of typesetting and desktop publishing, including numbering and cross-referencing, tables and figures, page layout, bibliographies, and much more. LaTeX was originally written in 1984 by Leslie Lamport and has become the dominant method for using TeX; few people write in plain TeX anymore. The current version is LaTeX2ε. <br />The Chinese bundle can be found at <a href="http://www.ctex.org/" target="_blank" rel="nofollow">http://www.ctex.org/</a> <br /><a href="http://learn.tsinghua.edu.cn:8080/2001315450/comp.html" target="_blank" rel="nofollow">http://learn.tsinghua.edu.cn:8080/2001315450/comp.html</a> by Wang Yin </p>
<p>6. EditPlus <br /><a href="http://www.editplus.com/" target="_blank" rel="nofollow">http://www.editplus.com/</a> <br />EditPlus is an Internet-ready 32-bit text editor, HTML editor and programmers editor for Windows. While it can serve as a good replacement for Notepad, it also offers many powerful features for Web page authors and programmers. <br />The latest version of EditPlus is currently 2.21; the British English and American English spell checkers must be downloaded and installed as separate packages. </p>
<p>7. GVim: Vi IMproved <br /><a href="http://www.vim.org/index.php" target="_blank" rel="nofollow">http://www.vim.org/index.php</a> <br />Vim is an advanced text editor that seeks to provide the power of the de-facto Unix editor &#8216;Vi&#8217;, with a more complete feature set. It&#8217;s useful whether you&#8217;re already using vi or using a different editor. Users of Vim 5 should consider upgrading to Vim 6, which is greatly enhanced since Vim 5. Vim is often called a &#8220;programmer&#8217;s editor,&#8221; and so useful for programming that many consider it an entire IDE. It&#8217;s not just for programmers, though. Vim is perfect for all kinds of text editing, from composing email to editing configuration files. <br />Ordinary Windows users can download it from this link: <a href="ftp://ftp.vim.org/pub/vim/pc/gvim64.exe" target="_blank" rel="nofollow">ftp://ftp.vim.org/pub/vim/pc/gvim64.exe</a> </p>
<p>8. Cygwin: GNU + Cygnus + Windows <br /><a href="http://www.cygwin.com/" target="_blank" rel="nofollow">http://www.cygwin.com/</a> <br />Cygwin is a Linux-like environment for Windows. It consists of two parts: A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing substantial Linux API functionality. A collection of tools, which provide Linux look and feel. </p>
<p>9. MinGW: Minimalistic GNU for Windows <br /><a href="http://www.mingw.org/" target="_blank" rel="nofollow">http://www.mingw.org/</a> <br />MinGW: A collection of freely available and freely distributable Windows specific header files and import libraries combined with GNU toolsets that allow one to produce native Windows programs that do not rely on any 3rd-party C runtime DLLs. <br />Both tools are used to compile and port Unix/Linux software on Windows. Cygwin emulates a POSIX-compliant layer on top of Windows (the library is cygwin1.dll), whereas MinGW uses Windows&#8217; own runtime library (msvcrt.dll) to implement some functionality that conforms to the POSIX spec, so it is not fully POSIX-compliant. MinGW is in fact a branch of Cygwin; because it does not implement a Linux API emulation layer, its overhead is somewhat lower than Cygwin&#8217;s. </p>
<p>10. CutePDF Writer <br /><a href="http://www.cutepdf.com/" target="_blank" rel="nofollow">http://www.cutepdf.com/</a> <br />Portable Document Format (PDF) is the de facto standard for the secure and reliable distribution and exchange of electronic documents and forms around the world. CutePDF Writer (formerly CutePDF Printer) is the free version of commercial PDF creation software. CutePDF Writer installs itself as a &#8220;printer subsystem&#8221;. This enables virtually any Windows applications (must be able to print) to create professional quality PDF documents &#8211; with just a push of a button! <br />Compared with Acrobat, one big advantage is that it is free. Word figures, tables, and formulas also generally convert well: what you see is what you get. It may require a ps2pdf converter, for which the site provides a download link. </p>
<p>11. R <br /><a href="http://www.r-project.org/" target="_blank" rel="nofollow">http://www.r-project.org/</a> <br />R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&amp;T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. <br />R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, &#8230;) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. <br />One of R&#8217;s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. <br />R is available as Free Software under the terms of the Free Software Foundation&#8217;s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS. <br />The R statistical software is similar to MATLAB in that both are used for scientific computing; the difference is that R is open source :)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2008/03/14666.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

