<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval Blog &#187; information Retrieval</title>
	<atom:link href="http://blog.zye.me/tag/information-retrieval/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.zye.me</link>
	<description>REAL TIME DATA PROCESSING, DISTRIBUTED COMPUTING, PATTERN DISCOVERY</description>
	<lastBuildDate>Tue, 31 Jan 2012 02:05:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>SIGIR PAPER LIST 2005,2006,2007</title>
		<link>http://blog.zye.me/2011/08/22765.html</link>
		<comments>http://blog.zye.me/2011/08/22765.html#comments</comments>
		<pubDate>Sat, 06 Aug 2011 02:29:28 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[SIGIR]]></category>
		<category><![CDATA[信息检索]]></category>

		<guid isPermaLink="false">http://www.5yiso.cn/2008/03/22765.html</guid>
		<description><![CDATA[The 30th Annual International ACM SIGIR Conference 23-27 July 2007, Amsterdam Accepted Papers Hierarchical Classification for Automatic Image Annotation Jianping Fan Alternatives to Bpref Tetsuya Sakai Laplacian Optimal Design for Image Retrieval Xiaofei He, Deng Cai Federated Text Retrieval From Uncooperative Overlapped Collections Milad Shokouhi, Justin Zobel A New Approach for Evaluating Query Expansion: Query-document <a href='http://blog.zye.me/2011/08/22765.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p class="sigirWebTitle">The 30<sup>th</sup> Annual International ACM SIGIR Conference<br />
23-27 July 2007, Amsterdam</p>
<h2>Accepted Papers</h2>
<p>Hierarchical Classification for Automatic Image Annotation<br />
<em>Jianping Fan </em></p>
<p>Alternatives to Bpref<br />
<em>Tetsuya Sakai </em></p>
<p>Laplacian Optimal Design for Image Retrieval<br />
<em>Xiaofei He, Deng Cai </em></p>
<p>Federated Text Retrieval From Uncooperative Overlapped Collections<br />
<em>Milad Shokouhi, Justin Zobel </em></p>
<p>A New Approach for Evaluating Query Expansion: Query-document  Term Mismatch<br />
<em>Tonya Custis, Khalid Al-Kofahi </em></p>
<p>Fast Generation of Result Snippets in Web Search<br />
<em>Andrew Turpin, Yohannes Tsegay, David Hawking, Hugh E. Williams </em></p>
<p>Updating Collection Representations For Federated Search<br />
<em>Milad Shokouhi, Mark Baillie, Leif Azzopardi </em></p>
<p>HITS hits TREC: Exploring IR evaluation results with network analysis<br />
<em>Stefano Mizzaro, Stephen Robertson </em></p>
<p>Latent Concept Expansion Using Markov Random Fields<br />
<em>Donald Metzler, Bruce Croft </em></p>
<p>Query Performance Prediction in Web Search Environments<br />
<em>Yun Zhou, Bruce Croft </em></p>
<p>Indexing Confusion Networks for Morph-based Spoken Document  Retrieval<br />
<em>Ville Turunen, Mikko Kurimo </em></p>
<p>Reliable Information Retrieval Evaluation with Incomplete and Biased  Judgements<br />
<em>Stefan Buettcher, Charles Clarke, Peter Yeung, Ian Soboroff </em></p>
<p>New Event Detection Based on Indexing-tree and Named Entity<br />
<em>Kuo ZHANG, JuanZi LI, Gang WU </em></p>
<p>A Time Machine for Text Search<br />
<em>Klaus Berberich, Srikanta Bedathur, Thomas Neumann, Gerhard Weikum </em></p>
<p>Compressed Permuterm Index<br />
<em>Paolo Ferragina, Rossano Venturini </em></p>
<p>Detecting, Categorizing and Clustering Entity Mentions in Chinese  Text<br />
<em>Wenjie Li, Donglei Qian, Chunfa Yuan, Qin Lu </em></p>
<p>FRank: A Ranking Method with Fidelity Loss<br />
<em>Ming-Feng Tsai, Tie-Yan Liu, Tao Qin, Hsin-Hsi Chen, Wei-Ying Ma </em></p>
<p>A Regression Framework for Learning Ranking Functions Using  Relative Relevance Judgments<br />
<em>Zhaohui Zheng, Hongyuan Zha, Keke Chen, Gordon Sun </em></p>
<p>History Repeats Itself: Re-Finding Queries in a Major Search  Engine&#8217;s Logs<br />
<em>Jaime Teevan, Eytan Adar, Rosie Jones, Michael Potts </em></p>
<p>Random Walks on the Click Graph<br />
<em>Nick Craswell, Martin Szummer </em></p>
<p>Towards Automatic Extraction of Event and Place Semantics from  Flickr Tags<br />
<em>Tye Rattenbury, Nathaniel Good, Mor Naaman </em></p>
<p>Clustering of Documents with Local and Global Regularization<br />
<em>Fei Wang, Changshui Zhang, Tao Li </em></p>
<p>An Interactive Algorithm For Asking And Incorporating Feature  Feedback into Support Vector Machines<br />
<em>Hema Raghavan, James Allan </em></p>
<p>Efficient Document Retrieval in Main Memory<br />
<em>Trevor Strohman, Bruce Croft </em></p>
<p>A Boosting Algorithm for Information Retrieval<br />
<em>Jun Xu, Hang Li </em></p>
<p>How well does result relevance predict session satisfaction?<br />
<em>Scott Huffman, Michael Hochster </em></p>
<p>A Support Vector Method for Optimizing Average Precision<br />
<em>Yisong Yue, Thomas Finley, Filip Radlinski, Thorsten Joachims </em></p>
<p>Strategic System Comparisons via Targeted Relevance Judgments<br />
<em>Alistair Moffat, William Webber, Justin Zobel </em></p>
<p>Topic Segmentation with Shared Topic Detection and Alignment of  Multiple Documents<br />
<em>Sun Bingjun, Prasenjit Mitra, Lee Giles, Hongyuan Zha, John Yen </em></p>
<p>HITS on the Web: How does it Compare?<br />
<em>Marc Najork, Hugo Zaragoza, Michael Taylor </em></p>
<p>Effective Missing Data Prediction for Collaborative Filtering<br />
<em>Hao Ma, Irwin King, Michael R. Lyu </em></p>
<p>Feature Selection for Ranking<br />
<em>Xiubo Geng, Tie-Yan Liu, Tao Qin </em></p>
<p>Interesting Nuggets and Their Impact on Definitional Question  Answering<br />
<em>Kian-Wei Kor, Tat Seng Chua </em></p>
<p>Ranking with Multiple Hyperplanes<br />
<em>Tao Qin, Tie-Yan Liu, Wei Lai, Xu-Dong Zhang, De-Sheng Wang, Hang Li </em></p>
<p>Building Simulated Queries for Known-Item Topics: An Analysis  using Six European Languages<br />
<em>Leif Azzopardi, Maarten de Rijke, Krisztian Balog </em></p>
<p>CollabSum: Exploiting Multiple Document Clustering for Collaborative  Single Document Summarizations<br />
<em>Xiaojun Wan </em></p>
<p>The Influence of Caption Features on Clickthrough Patterns in Web  Search<br />
<em>Charles Clarke, Eugene Agichtein, Susan Dumais, Ryen White </em></p>
<p>Personalized Query Expansion for the Web<br />
<em>Paul-Alexandru Chirita, Claudiu Firan, Wolfgang Nejdl </em></p>
<p>Principles of Hash-based Text Retrieval<br />
<em>Benno Stein </em></p>
<p>An Outranking Approach for Rank Aggregation in Information  Retrieval<br />
<em>Mohamed Farah, Daniel Vanderpooten </em></p>
<p>Deconstructing Nuggets: The Stability and Reliability of Complex  Question Answering Evaluation<br />
<em>Jimmy Lin, Pengyi Zhang </em></p>
<p>DiffusionRank: A Possible Penicillin for Web Spamming<br />
<em>Haixuan Yang, Irwin King, Michael R. Lyu </em></p>
<p>Investigating the Querying and Browsing Behavior of Advanced  Search Engine Users<br />
<em>Ryen White, Dan Morris </em></p>
<p>Neighborhood Restrictions in Geographic IR<br />
<em>Steven Schockaert, Martine De Cock </em></p>
<p>A Probabilistic Graphical Model for Joint Answer Ranking in Question  Answering<br />
<em>Jeongwoo Ko, Luo Si, Eric Nyberg </em></p>
<p>Towards Task-based PIM Evaluations<br />
<em>David Elsweiler, Ian Ruthven </em></p>
<p>Utility-based Information Distillation Over Temporally Sequenced  Documents<br />
<em>Yiming Yang, Abhimanyu Lad, Ni Lao, Abhay Harpale, Bryan Kisiel, Monica Rogati, Jian Zhang, Jaime Carbonell, Peter Brusilovsky, Daqing He </em></p>
<p>A Semantic Approach to Contextual Advertising<br />
<em>Vanja Josifovski, Andrei Broder, Lance Riedel, Marcus Fontoura </em></p>
<p>Test Theory for Assessing IR Test Collections<br />
<em>David Bodoff, Pu Li </em></p>
<p>Vocabulary Independent Spoken Term Detection<br />
<em>Jonathan Mamou, Bhuvana Ramabhadran, Olivier Siohan </em></p>
<p>ESTER: Efficient Search on Text, Entities, and Relations<br />
<em>Holger Bast, Alexandru Chitea, Fabian Suchanek, Ingmar Weber </em></p>
<p>A Combined Component Approach for Finding Collection-Adapted  Ranking Functions based on Genetic Prog<br />
<em>Humberto Almeida, Marcos Goncalves, Marco Cristo, Pavel Calado </em></p>
<p>Supporting Multiple Information Seeking Strategies in a Single  System Framework<br />
<em>Xiaojun Yuan, Nicholas Belkin </em></p>
<p>Context Sensitive Stemming for Web Search<br />
<em>Fuchun Peng, Nawaaz Ahmed, Xin Li, Yumao Lu </em></p>
<p>Know your Neighbors: Web Spam Detection using the Web Topology<br />
<em>Carlos Castillo, Debora Donato, Aristides Gionis, Vanessa Murdock,  Fabrizio Silvestri </em></p>
<p>Combining Content and Link for Classification using Matrix  Factorization<br />
<em>Shenghuo Zhu, Kai Yu, Yun Chi, Yihong Gong </em></p>
<p>Evaluating sampling methods for uncooperative collections<br />
<em>Paul Thomas, David Hawking </em></p>
<p>An Exploration of Proximity Measures in Information Retrieval<br />
<em>Tao Tao, ChengXiang Zhai </em></p>
<p>Relaxed Online Support Vector Machines for Spam Filtering<br />
<em>D. Sculley, Gabriel Wachman (best student paper)</em></p>
<p>Robust Classification of Rare Queries Using Web Knowledge<br />
<em>Andrei Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja  Josifovski, Tong Zhang </em></p>
<p>Multiple-signal duplicate detection for search evaluation<br />
<em>Scott Huffman, April Lehman, Alexei Stolboushkin, Howard Wong-Toi, Fan  Yang, Hein Roehrig </em></p>
<p>Structured Retrieval for Question Answering<br />
<em>Matthew Bilotti, Paul Ogilvie, Jamie Callan, Eric Nyberg </em></p>
<p>Robust Evaluation of Information Retrieval Systems<br />
<em>Ben Carterette </em></p>
<p>On the Robustness of Relevance Measures with Incomplete  Judgments<br />
<em>Tanuja Bompada, Chi-Chao Chang, John Chen, Ravi Kumar, Rajesh  Shenoy </em></p>
<p>Cross-Lingual Query Suggestion Using Query Logs of Different  Languages<br />
<em>Wei Gao, Cheng Niu, Jian-Yun Nie, Ming Zhou, Jian Hu, Kam-Fai Wong,  Hsiao-Wuen Hon </em></p>
<p>Efficient Bayesian Hierarchical User Modeling for Recommendation  Systems<br />
<em>Yi Zhang, Jonathan Koren </em></p>
<p>Studying the Use of Popular Destinations to Enhance Web Search  Interaction<br />
<em>Ryen White, Mikhail Bilenko, Silviu Cucerzan (best paper)</em></p>
<p>Knowledge-intensive Conceptual Retrieval and Passage Extraction of  Biomedical Literature<br />
<em>Wei Zhou, Clement Yu, Neil Smalheiser, Vetle Torvik, Jie Hong </em></p>
<p>The Impact of Caching on Search Engines<br />
<em>Ricardo Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vanessa Murdock,  Vassilis Plachouras, Fabrizio Silvestri </em></p>
<p>Heavy-Tailed Distributions and Multi-Keyword Queries<br />
<em>Arnd Konig, Surajit Chaudhuri, Liying Sui, Kenneth Church </em></p>
<p>Improving Text Classification for Oral History Archives with Temporal  Domain Knowledge<br />
<em>James Olsson, Douglas Oard </em></p>
<p>Estimation and Use of Uncertainty in Pseudo-relevance Feedback<br />
<em>Kevyn Collins-Thompson, Jamie Callan </em></p>
<p>Term Feedback for Information Retrieval with Language Models<br />
<em>Bin Tan, Atulya Velivelli, Hui Fang, ChengXiang Zhai </em></p>
<p>Enhancing Relevance Scoring With Chronological Term Rank<br />
<em>Adam Troy, Guo-Qiang Zhang </em></p>
<p>Inverted Index Pruning with Correctness Guarantee<br />
<em>Alexandros Ntoulas, Junghoo-John Cho </em></p>
<p>A Study of Poisson Query Generation Model for Information Retrieval<br />
<em>Qiaozhu Mei, Hui Fang, ChengXiang Zhai </em></p>
<p>ARSA: A Sentiment-Aware Model for Predicting Sales Performance  Using Blogs<br />
<em>Yang Liu, Jimmy Huang, Aijun An, Xiaohui Yu </em></p>
<p>A Music Search Engine Built upon Audio-based and Web-based  Similarity Measures<br />
<em>Peter Knees, Tim Pohle, Markus Schedl, Gerhard Widmer </em></p>
<p>Learn from Web Search Logs to Organize Search Results<br />
<em>Xuanhui Wang, ChengXiang Zhai </em></p>
<p>Using Query Contexts in Information Retrieval<br />
<em>Jing Bai, Jian-Yun Nie, Hugue Bouchard, Guihong Cao </em></p>
<p>Measuring the Spatial Correlation of Retrieval Functions for Zero-Judgment Performance Prediction<br />
<em>Fernando Diaz </em></p>
<p>Towards Musical Query-by-Semantic-Description using the CAL500  Data Set<br />
<em>Douglas Turnbull, Luke Barrington, David Torres, Gert Lanckriet </em></p>
<p>Web Text Retrieval with a P2P Query-Driven Index<br />
<em>Gleb Skobeltsyn, Toan Luu, Ivana Podnar, Martin Rajman, Karl Aberer </em></p>
<p>Analyzing Feature Trajectories for Event Detection<br />
<em>Qi He, Kuiyu Chang, Ee-Peng Lim </em></p>
<p>Broad Expertise Retrieval in Sparse Data Environments<br />
<em>Krisztian Balog, Maarten de Rijke, Leif Azzopardi </em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/08/22765.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Papers Written by Googlers</title>
		<link>http://blog.zye.me/2011/06/27952.html</link>
		<comments>http://blog.zye.me/2011/06/27952.html#comments</comments>
		<pubDate>Sat, 25 Jun 2011 14:27:47 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[papers]]></category>

		<guid isPermaLink="false">http://www.5yiso.cn/2008/04/27952.html</guid>
		<description><![CDATA[Google has published many of its research papers, and a lot of them look very good. Here are the links. Below is a partial list of publications by people after joining Google, organized by category. There is also a list  organized by year , and an atom feed is also available. Categories Algorithms and Theory (151) Artificial Intelligence and Data Mining (72) Audio, Video, and Image Processing (79) Distributed Systems and Parallel Computing (117) <a href='http://blog.zye.me/2011/06/27952.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Google has published many of its research papers, and a lot of them look very good. Here are the links.</p>
<p>Below is a partial list of publications by people after joining Google,<br />
 organized by category. There is also a list  <a href="http://research.google.com/pubs/papers.html">organized by year</a> , and an <a href="atom.xml">atom<br />
 feed <img src="http://video.google.com/images/feed.gif" border="0" alt="Atom Feed" width="16" height="15" /></a> is also<br />
 available.</p>
<h3>Categories</h3>
<div>
<ul>
<li>Algorithms and Theory (151)</li>
<li>Artificial Intelligence and Data Mining (72)</li>
<li>Audio, Video, and Image Processing (79)</li>
<li>Distributed Systems and Parallel Computing (117)</li>
<li>Education (5)</li>
<li>General Science (22)</li>
<li>Human-Computer Interaction and Visualization (69)</li>
</ul>
<ul>
<li>Hypertext and the Web (26)</li>
<li>Information Retrieval (59)</li>
<li>Machine Learning (122)</li>
<li>Natural Language Processing (110)</li>
<li>Security, Cryptography, and Privacy (85)</li>
<li>Software Engineering (33)</li>
<li>Systems (10)</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/06/27952.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic Document Indexing in Indri (repost)</title>
		<link>http://blog.zye.me/2011/06/5150.html</link>
		<comments>http://blog.zye.me/2011/06/5150.html#comments</comments>
		<pubDate>Sat, 04 Jun 2011 02:29:04 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[Indri]]></category>
		<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[信息检索]]></category>
		<category><![CDATA[索引]]></category>

		<guid isPermaLink="false">http://www.5yiso.cn/2007/02/5150.html</guid>
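		<description><![CDATA[Dynamic Document Indexing in Indri. Translated into Chinese by 戴维. Abstract: The techniques behind dynamic document indexing in Indri, which let the engine process users' online queries while the index is being updated. Text search engines were once designed to query fixed document collections. That works well for many applications, but news, financial, and desktop search need efficient, frequent index updates. Earlier research on dynamic collections centered on incremental indexing: incremental systems optimize indexing performance by appending large batches of documents to an existing index, but they cannot process user queries while the incremental update is running. Unlike those systems, the latest version of the Indri search engine supports dynamic document collections without relying on large batches for indexing performance, and it indexes and queries concurrently, so users can query while documents are being added. 1. Introduction Although full-text indexing has existed for decades, it only became truly widespread with the arrival of the Internet. Today almost every Internet user is a search engine user, and full-text search is used across many applications, including Web search, news search, and the now-popular desktop search. Searching desktop (hard disk) files and e-mail is a new challenge for most information retrieval systems. Users expect their e-mail to be indexed as soon as it arrives and their files to be indexed the moment they are saved to disk. Desktop search users cannot be expected to tolerate the storage overhead that index updates bring. For better or worse, users have seen the benefits of full-text search, and more and more people find they cannot do without it. However, updating a full-text index is time-consuming. Modern search engines build their indexes by creating inverted lists over large document collections. An inverted list records the documents that contain a particular term, together with additional information about each occurrence (such as the term's positions in the document and its part of speech). Indexing a short document with about 100 distinct terms therefore means updating 100 inverted lists. If those lists are stored on disk, the update needs 100 disk seeks, which takes a second or more on current hardware; longer documents take longer still. A further question follows: while the index is being updated, should the search engine make users wait for the update to finish, or keep processing their queries? The new version of the Indri search engine removes these limitations. Developed as part of the Lemur project [1], Indri supports a structured query language and uses the language modeling approach [2]; to meet the needs of question answering systems, it also supports queries over the individual fields of structured documents. The first version of Indri had no support for incremental indexing, but the latest version allows true real-time indexing of individual documents. An information retrieval system that handles dynamic document collections has to solve three key problems. First, it must add and delete documents quickly, where &#8220;quickly&#8221; is set by how long a desktop search user is willing to wait for indexing to finish; in practice the response for a single document should be instantaneous. Second, the system must be able to answer queries at any time, even while new documents are being added. Third, the system must be practical: its indexing and retrieval performance should be competitive with systems that do not support dynamic collections. In the latest version of Indri we achieve these goals by adopting the following design principles: In-memory structures: to avoid disk reads and writes, data is kept in memory for as long as possible. Locking: the system holds exclusive locks on data for the shortest time possible. Read-only structures: to reduce reliance on mutexes, the system uses read-only data structures. Background I/O: the system uses background threads to interact with slow devices, which improves indexing performance. Multi-version structures: when a long-running operation needs data that may be updated in the meantime, the system keeps multiple versions of that data to reduce mutex use. The rest of the article describes how these principles are applied. 2. Related Work The database research community has spent decades studying concurrent data access. Ramakrishnan and Gehrke present general database principles [3], and Gray and Reuter examine transaction processing systems in more depth [4]. Although accessing document data raises problems similar to those of accessing database data, there are significant differences. Database users care particularly about atomicity. In the classic example, a bank customer transfers d dollars from a savings account to a checking account. If the d dollars are subtracted from savings first, the customer's total is briefly d dollars short; if they are added to checking first, the customer briefly has d dollars extra. Either way, the balances are inconsistent from the moment the first account changes until the second is updated. A main task of transaction concurrency research in databases is therefore to guarantee consistency, even when operations fail. Our system guarantees that inserting or deleting a document is atomic: no user ever sees a document that is partially inserted or deleted. We do not, however, make insertions or deletions of multiple documents atomic as a group. Since documents rarely depend on one another the way database records do, this is not a major limitation. Despite these differences, we can still borrow from databases. Asynchronous I/O and mutual exclusion are widely used in modern database systems, and our indexing system uses a similar multi-version concurrency technique [5]. 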
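The information retrieval community has not ignored dynamic document collections, but it has concentrated on incremental systems, which add large batches of documents to an existing index at once rather than adding single documents efficiently. That work does not consider whether the system can keep processing queries while the index is updated. Brown, Callan, and Croft [6] studied an efficient incremental indexing method. They treat inverted lists smaller than 8 KB differently from larger ones. When a small list needs to grow, it is copied into a larger contiguous region of the inverted file. A large list is not moved; instead a forward pointer to a new segment is added, so a long inverted list becomes a chain of segments within the inverted file. They found that query performance dropped by 6% when an index was built in 7 batches. Small batches were expensive: in their model, indexing text in 1 MB batches took 8 times as long as indexing it in 64 MB batches. In more recent work, Lester, Zobel, and Williams [7] compared three index update strategies: in-place, re-merge, and rebuild. The in-place strategy resembles Brown's method, but without the linked-list optimization: each inverted list is stored contiguously, and when there is not enough room to write new data, the existing data has to be copied elsewhere. In the re-merge strategy, a new batch of documents is built into a separate index, which is then merged with the existing index. The rebuild strategy discards the existing index and re-indexes the original documents from scratch. They found re-merge to be the most efficient way to update an index. Unlike Brown, Callan, and Croft, however, they did not compare different pre-allocation strategies or strategies for handling large inverted lists. Lester, Zobel, and Williams report that with the smallest batch size (10 documents), the best-performing strategy (in-place) updated a 1 GB index in about 7 seconds. That is very fast compared with the other strategies, but it is still far from ideal for updating the index one document at a time. The approach described in this article resembles the Lucene search engine. In normal operation Lucene, like a traditional batch indexer, writes data to disk in segments. Once data has been written to a segment it can be queried, with no need to merge segments first. This is somewhat like the linked-list method of Brown, Callan, and Croft, except that the data is written in batches, which adds overhead: to get good performance, many documents must be written to disk in a single batch. When fast response is required, Lucene provides a RAMDirectory class that builds the index in memory. Adding a document to a RAMDirectory is fast because no disk I/O is involved, and a document added through an object called IndexWriter can be queried as soon as it is in the RAMDirectory. This solves the batching problem for small document collections. For collections larger than the machine's memory, however, a purely in-memory index is no longer viable: data has to be written to disk, and the user has to decide, through where the index data is placed, which data stays in memory and which goes to disk. In our work, Indri uses in-memory indexes rather than batch indexing when documents need to be available quickly. When queries must be processed at the same time, Indri decides on the fly when to merge indexes and when to write data to disk. 3. Strategy 3.1 In-Memory Structures Indri uses two kinds of index: memory indexes and disk indexes. Memory indexes reside in memory and disk indexes are stored on disk. Both can process queries, but only a memory index can accept new documents. Indri's disk indexes are immutable: they can be deleted but not modified. Most information retrieval systems build a single on-disk index for all documents, whereas Indri also generates index files for other purposes while indexing; here we use the term &#8220;repository&#8221; for the index of a document collection together with its associated data structures. When building a repository over a text collection, Indri indexes each incoming document into the active memory index. Whenever a repository is open for writing, there is an active memory index ready to receive documents. For small collections, index data is not written to disk until the repository is closed. When data is written out, a new memory index is created and becomes the active index. Building an index this way resembles how many batch retrieval systems work: documents are added to an in-memory structure, which is written to disk only when the memory limit is reached. In a batch system, however, the disk and memory structures cannot operate independently, and the index needs post-processing before it can serve queries. Because building many small indexes is more efficient than building one large index, it pays to keep many separate small indexes on disk. A large index arises only when small indexes need to be merged; to add documents as fast as possible, simply producing small indexes is better [7]. 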
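However, querying one large index is much faster than searching many small ones, mainly because of disk seek time: the number of disk seeks a query needs grows linearly with the number of indexes. Under heavy query load, Indri therefore reduces the number of disk indexes by merging them. 3.2 Locking To respond quickly, Indri must both process queries and add documents fast. Earlier versions of Indri were already an effective batch system in which data structures could simply be locked, but that also made it easy for queries or document insertions to block for a long time. To stay responsive, mutexes must be held only for very short periods. We shorten lock hold times by allowing only the active memory index to change. Apart from a few caches, the memory index is the only structure the system has to lock. Its size is bounded by the available memory and all of its contents reside in memory, so it responds quickly even to complex queries. Hold times can be reduced further by lowering the memory limit. A mutex is also needed to add a new document to the memory index. To shorten the hold time, we make sure each document is parsed before the lock is taken; the lock is held only while that single document is indexed and is then released so queries can proceed. Most web pages and news documents can be indexed in under 1/100 of a second, so queries wait correspondingly briefly. While the lock is held, whether for an incoming document or for a query, the system performs no disk I/O. This greatly reduces the chance that the main thread is descheduled while holding a mutex. 3.3 Read-Only Structures To reduce the time mutexes are held while the system handles large volumes of data, we try to keep most data immutable; for on-disk data this is a firm rule. If disk data were not read-only, threads would have to lock around disk reads and writes, which could lead to heavy contention on those locks. Since disk data cannot change once written, reading it needs no lock. This locking strategy lets Indri take full advantage of multiprocessor systems. Most of the query code path needs only read-only locks, so query processing is highly parallel, although it appears to be limited by the parallelism of the disk subsystem. In addition, because text parsing and indexing can proceed at the same time, indexing is parallelizable as well. 3.4 <a href='http://blog.zye.me/2011/06/5150.html'>[...]</a>]]></description>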
			<content:encoded><![CDATA[<p style="TEXT-ALIGN: center"><span style="FONT-SIZE: 16pt"><span style="FONT-FAMILY: Times New Roman">Dynamic Document Indexing in Indri</span></span> <span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Translated into Chinese by 戴维</span></span></p>
<p style="TEXT-ALIGN: center"><span style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana">Abstract: The techniques behind dynamic document indexing in Indri, which let the engine process users' online queries while the index is being updated. Text search engines were once designed to query fixed document collections. That works well for many applications, but news, financial, and desktop search need efficient, frequent index updates. Earlier research on dynamic collections centered on incremental indexing: incremental systems optimize indexing performance by appending large batches of documents to an existing index, but they cannot process user queries while the incremental update is running. Unlike those systems, the latest version of the Indri search engine supports dynamic document collections without relying on large batches for indexing performance, and it indexes and queries concurrently, so users can query while documents are being added.</span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">1. Introduction</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Although full-text indexing has existed for decades, it only became truly widespread with the arrival of the Internet. Today almost every Internet user is a search engine user, and full-text search is used across many applications, including Web search, news search, and the now-popular desktop search.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Searching desktop (hard disk) files and e-mail is a new challenge for most information retrieval systems. Users expect their e-mail to be indexed as soon as it arrives and their files to be indexed the moment they are saved to disk. Desktop search users cannot be expected to tolerate the storage overhead that index updates bring. For better or worse, users have seen the benefits of full-text search, and more and more people find they cannot do without it.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">However, updating a full-text index is time-consuming. Modern search engines build their indexes by creating inverted lists over large document collections. An inverted list records the documents that contain a particular term, together with additional information about each occurrence (such as the term's positions in the document and its part of speech). Indexing a short document with about 100 distinct terms therefore means updating 100 inverted lists. If those lists are stored on disk, the update needs 100 disk seeks, which takes a second or more on current hardware; longer documents take longer still. A further question follows: while the index is being updated, should the search engine make users wait for the update to finish, or keep processing their queries?</span></span></p>
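<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The cost argument above can be illustrated with a minimal Python sketch. It is not Indri's code: the class name, the function names, and the 10 ms seek time are assumptions chosen only to show that one document touches one inverted list per distinct term, and that one seek per list adds up to about a second for 100 terms.</span></span></p>

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        """Index one document and return how many inverted lists were touched."""
        touched = set()
        for position, term in enumerate(text.lower().split()):
            self.postings[term].setdefault(doc_id, []).append(position)
            touched.add(term)
        return len(touched)


def estimated_update_seconds(lists_touched, seek_ms=10.0):
    """Rough cost if every touched list needs one disk seek (seek_ms is assumed)."""
    return lists_touched * seek_ms / 1000.0


index = InvertedIndex()
touched = index.add_document(1, "to be or not to be that is the question")
print(touched)                        # 8 distinct terms, so 8 lists are updated
print(estimated_update_seconds(100))  # 1.0 second for 100 lists at 10 ms per seek
```

<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Keeping the lists in memory, as in this toy class, is what removes the per-list seek from the update path.</span></span></p>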
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The new version of the Indri search engine removes these limitations. Developed as part of the Lemur project [1], Indri supports a structured query language and uses the language modeling approach [2]; to meet the needs of question answering systems, it also supports queries over the individual fields of structured documents. The first version of Indri had no support for incremental indexing, but the latest version allows true real-time indexing of individual documents.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">An information retrieval system that handles dynamic document collections has to solve three key problems. First, it must add and delete documents quickly, where &#8220;quickly&#8221; is set by how long a desktop search user is willing to wait for indexing to finish; in practice the response for a single document should be instantaneous. Second, the system must be able to answer queries at any time, even while new documents are being added. Third, the system must be practical: its indexing and retrieval performance should be competitive with systems that do not support dynamic collections.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In the latest version of Indri we achieve these goals by adopting the following design principles:</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In-memory structures: to avoid disk reads and writes, data is kept in memory for as long as possible.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Locking: the system holds exclusive locks on data for the shortest time possible.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Read-only structures: to reduce reliance on mutexes, the system uses read-only data structures.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Background I/O: the system uses background threads to interact with slow devices, which improves indexing performance.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Multi-version structures: when a long-running operation needs data that may be updated in the meantime, the system keeps multiple versions of that data to reduce mutex use.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The rest of the article describes how these principles are applied.</span></span></p>
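<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Three of the principles listed above (in-memory structures, short lock hold times, read-only structures) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Indri's implementation: the DynamicIndex class, its method names, and a memory limit counted in documents are all hypothetical, and the frozen segments stay in memory here, whereas a real system would hand them to a background thread to write to disk.</span></span></p>

```python
import threading
from types import MappingProxyType


class DynamicIndex:
    """Toy index: one mutable in-memory index plus frozen read-only segments."""

    def __init__(self, memory_limit=2):
        self.memory_limit = memory_limit  # max documents in the active index (assumed unit)
        self.lock = threading.Lock()
        self.active = {}       # term -> set of doc ids; the only mutable structure
        self.active_docs = 0
        self.frozen = ()       # tuple of read-only segments

    def add_document(self, doc_id, text):
        terms = set(text.lower().split())  # parse before taking the lock
        with self.lock:                    # hold the lock only for the in-memory update
            for term in terms:
                self.active.setdefault(term, set()).add(doc_id)
            self.active_docs += 1
            if self.active_docs == self.memory_limit:
                self._freeze_active()

    def _freeze_active(self):
        """Turn the active index into a read-only segment and start a fresh one."""
        segment = MappingProxyType(
            {term: frozenset(ids) for term, ids in self.active.items()}
        )
        self.frozen = self.frozen + (segment,)
        self.active = {}
        self.active_docs = 0

    def search(self, term):
        term = term.lower()
        with self.lock:                    # brief lock: copy what the active index holds
            hits = set(self.active.get(term, ()))
            segments = self.frozen         # snapshot of the immutable segment list
        for segment in segments:           # read-only data needs no lock
            hits |= segment.get(term, frozenset())
        return sorted(hits)


index = DynamicIndex(memory_limit=2)
index.add_document(1, "dynamic collections in indri")
index.add_document(2, "indri query language")
index.add_document(3, "dynamic indexing")
print(index.search("indri"))    # [1, 2]
print(index.search("dynamic"))  # [1, 3]
print(len(index.frozen))        # 1
```

<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Because a frozen segment never changes, a query only needs the lock long enough to copy the active index's hits and grab the current tuple of segments.</span></span></p>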
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">2. Related Work</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The database research community has spent decades studying concurrent data access. Ramakrishnan and Gehrke present general database principles [3], and Gray and Reuter examine transaction processing systems in more depth [4].</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Although accessing document data raises problems similar to those of accessing database data, there are significant differences. Database users care particularly about atomicity. In the classic example, a bank customer transfers d dollars from a savings account to a checking account. If the d dollars are subtracted from savings first, the customer's total is briefly d dollars short; if they are added to checking first, the customer briefly has d dollars extra. Either way, the balances are inconsistent from the moment the first account changes until the second is updated. A main task of transaction concurrency research in databases is therefore to guarantee consistency, even when operations fail.</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Our system guarantees that inserting or deleting a document is atomic: no user ever sees a document that is partially inserted or deleted. We do not, however, make insertions or deletions of multiple documents atomic as a group. Since documents rarely depend on one another the way database records do, this is not a major limitation.</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Despite these differences, we can still borrow from databases. Asynchronous I/O and mutual exclusion are widely used in modern database systems, and our indexing system uses a similar multi-version concurrency technique [5].</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The information retrieval community has not ignored dynamic document collections, but it has concentrated on incremental systems, which add large batches of documents to an existing index at once rather than adding single documents efficiently. That work does not consider whether the system can keep processing queries while the index is updated.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Brown, Callan, and Croft [6] studied an efficient incremental indexing method. They treat inverted lists smaller than 8 KB differently from larger ones. When a small list needs to grow, it is copied into a larger contiguous region of the inverted file. A large list is not moved; instead a forward pointer to a new segment is added, so a long inverted list becomes a chain of segments within the inverted file. They found that query performance dropped by 6% when an index was built in 7 batches. Small batches were expensive: in their model, indexing text in 1 MB batches took 8 times as long as indexing it in 64 MB batches.</span></span></p>
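<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The short-list versus long-list treatment described above can be sketched roughly as follows. This is a simplified reading of the paragraph, not the implementation from [6]: the PostingList class is hypothetical, the 8k boundary is read as 8 * 1024 bytes (an assumption), and a Python list of byte arrays stands in for segments connected by forward pointers.</span></span></p>

```python
THRESHOLD = 8 * 1024  # bytes; the 8k boundary from the text, read here as 8 KB (assumed)


class PostingList:
    """Toy inverted list stored as a chain of byte segments."""

    def __init__(self):
        self.segments = [bytearray()]  # later entries stand in for forward-pointer targets
        self.copies = 0                # how many times the data was relocated

    def size(self):
        return sum(len(segment) for segment in self.segments)

    def append(self, data):
        if self.size() >= THRESHOLD:
            # Long list: leave existing data where it is and chain a new segment.
            self.segments.append(bytearray(data))
        else:
            # Short list: relocate everything into one larger contiguous area.
            merged = bytearray(self.segments[0])
            merged.extend(data)
            self.segments = [merged]
            self.copies += 1


plist = PostingList()
plist.append(b"x" * 5000)  # short (0 bytes so far): copied
plist.append(b"x" * 5000)  # still short (5000 bytes so far): copied again
plist.append(b"x" * 5000)  # long now (10000 bytes so far): chained, not moved
print(len(plist.segments), plist.copies, plist.size())  # 2 2 15000
```

<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The trade-off is visible even in the toy version: chaining avoids copying long lists, but a reader then has to follow one extra hop per segment.</span></span></p>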
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In more recent work, Lester, Zobel, and Williams [7] compared three index maintenance strategies: in-place, re-merge, and rebuild. The in-place strategy resembles the approach used by Brown, except that it lacks the linked-list optimization: all inverted lists are stored contiguously, and if there is not enough room to write new data, the existing data must be copied elsewhere. In the re-merge strategy, each new batch of documents is built into a separate index, which is then merged with the existing index. The rebuild strategy discards the existing index and rebuilds it from the original documents. They found re-merge to be the most efficient way to update an index. However, unlike Brown, Callan, and Croft, they did not compare pre-allocation strategies or strategies for handling large lists.</span></span></p>
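<p>The re-merge strategy described above can be sketched in a few lines. This is only an illustration under simple assumptions, not the implementation evaluated in [7]: an index is a plain dictionary from term to a sorted list of document ids, the new batch only contains ids larger than any existing id, and the names <code>build_index</code> and <code>remerge</code> are hypothetical.</p>

```python
def build_index(docs):
    """Build a small inverted index: term -> sorted list of document ids.

    `docs` is an iterable of (doc_id, text) pairs in increasing id order.
    """
    index = {}
    for doc_id, text in docs:
        for term in set(text.split()):
            index.setdefault(term, []).append(doc_id)
    return index


def remerge(existing, batch):
    """Merge a separately built batch index into the existing index.

    Assumes every id in `batch` is larger than every id in `existing`,
    so concatenating the two postings lists keeps them sorted.
    """
    merged = {}
    for term in set(existing) | set(batch):
        merged[term] = existing.get(term, []) + batch.get(term, [])
    return merged


old_index = build_index([(1, "fast index"), (2, "index merge")])
new_batch = build_index([(3, "merge fast")])
merged_index = remerge(old_index, new_batch)
```

<p>The old index is never modified here; a real system would write the merged lists to a new file and then switch over to it.</p>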
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Lester, Zobel, and Williams report that with the smallest batch size (10 documents), the best-performing strategy (in-place) updated a 1G index in about 7 seconds. This is very fast compared with the other strategies, but it is still not ideal for single-document index updates.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">The approach described in this paper is similar to that of the Lucene search engine. Normally, as in traditional batch indexing, Lucene writes data to disk in segments. Once data has been written to a segment it can be queried, and the segments do not have to be merged first. This is somewhat like the linked-list method of Brown, Callan, and Croft, except that writing data in batches carries more overhead. For good performance, many documents must be written to disk in a single batch.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">When fast response is required, Lucene provides a RAMDirectory class that builds the index in memory. Adding a document to a RAMDirectory is fast because no disk I/O is needed. Once a document has been added to the RAMDirectory through an IndexWriter object, it becomes available for querying. This also solves the batching problem for small document collections. However, for a collection larger than the machine's memory, a purely in-memory index is no longer an option: data must be written to disk, and which data stays in memory and which goes to disk has to be decided according to how the user accesses the indexed data.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In our work, Indri uses in-memory indexes rather than batch indexing when documents must be accessible quickly. When queries have to be served at the same time, Indri decides on the fly when to merge indexes and when to write data to disk.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3. Strategy</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3.1 Memory Structures</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Indri uses two kinds of index: memory indexes and disk indexes. A memory index resides in memory, while a disk index is stored on disk. Both can serve queries, but only a memory index can accept new documents. Indri's disk indexes are immutable: they can be deleted, but not modified.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Most information retrieval systems build a single index on disk for all documents, whereas Indri also generates index files that serve different purposes while it builds the index. Here we use the term &#8220;repository&#8221; for the indexes of one document collection together with their related data structures.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">When building a repository over a text collection, Indri indexes each incoming document into the active memory index. As long as a repository is open for writing, there is an active memory index ready to receive documents. For small collections, index data is not written to disk until the repository is closed. Whenever data is written out, a new memory index is created to serve as the new active index.</span></span></p>
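<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Building an index this way resembles how many retrieval systems work: documents are typically added to an in-memory index structure and written to disk only when a memory limit is reached. In a batch system, however, the disk and memory structures cannot be used independently, and the index must be post-processed before it can serve queries.</span></span></p>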
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Because building many small indexes is more efficient than building one large index, it is advantageous to keep many separate small indexes on disk. A large index only arises when small indexes are merged; to add documents to the system as quickly as possible, simply producing small indexes is preferable [7].</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">However, querying a single large index is much faster than searching many small ones. The main reason is disk seek time: the number of disk seeks a query needs grows linearly with the number of indexes. Therefore, under heavy query load, Indri merges indexes to reduce the number of disk indexes.</span></span></p>
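<p>The trade-off in this section can be illustrated with a toy model. The sketch below is not Indri's code: the class name <code>Repository</code>, the <code>memory_limit</code> counted in documents, and the use of frozen in-memory dictionaries as stand-ins for disk indexes are all assumptions made for the example. It shows that only the active memory index is mutable, that a query has to consult every index, and that merging reduces the number of lookups.</p>

```python
class Repository:
    """Toy repository: one mutable memory index plus immutable 'disk' indexes."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # hypothetical limit, counted in documents
        self.active = {}                  # mutable: term -> list of doc ids
        self.active_docs = 0
        self.disk = []                    # immutable snapshots standing in for disk indexes

    def add(self, doc_id, text):
        for term in set(text.split()):
            self.active.setdefault(term, []).append(doc_id)
        self.active_docs += 1
        if self.active_docs >= self.memory_limit:
            self.flush()

    def flush(self):
        """Freeze the active index and start a fresh one."""
        if self.active_docs == 0:
            return
        self.disk.append({term: tuple(ids) for term, ids in self.active.items()})
        self.active, self.active_docs = {}, 0

    def merge(self):
        """Combine all disk indexes into one, leaving the originals untouched."""
        merged = {}
        for index in self.disk:
            for term, ids in index.items():
                merged[term] = merged.get(term, ()) + ids
        self.disk = [merged] if merged else []

    def query(self, term):
        """Return matching doc ids and the number of indexes consulted."""
        hits, lookups = [], 0
        for index in self.disk + [self.active]:
            lookups += 1
            hits.extend(index.get(term, ()))
        return sorted(hits), lookups
```

<p>In this model the lookup count plays the role of disk seeks: it grows by one with every flushed index and drops again after <code>merge()</code>.</p>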
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3.2 Locking</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">To stay responsive, Indri must process queries and add documents quickly. Earlier versions of Indri were already an efficient batch system in which data structures could easily be locked, but this could also block queries or document insertions for long periods. To remain responsive, the system must hold mutex locks only for very short times.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">We reduce the time mutexes are held by allowing only the active memory index to be mutable. Apart from a few caches, the memory index is the only structure the system needs to lock. Because the size of a memory index is bounded by the available memory, and all of its contents reside in memory, it responds quickly even to complex queries. Lock hold times can be reduced further by lowering the memory limit.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">A mutex is also needed to add new documents to the memory index. To reduce the time it is held, we make sure each document is parsed before the lock is taken. The lock is held only while a single document is being indexed and is then released so that queries can proceed. Most web pages and news documents can be indexed in less than 1/100 of a second, so queries wait only briefly.</span></span></p>
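<p>A minimal sketch of this locking discipline follows. It assumes a single mutex around a dictionary-based index; the class <code>MemoryIndex</code> and its methods are hypothetical names for the example and do not reproduce Indri's C++ classes. The point is only that parsing happens outside the lock and the critical section covers one already-parsed document.</p>

```python
import threading


class MemoryIndex:
    """Toy in-memory index guarded by one short-lived mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._postings = {}  # term -> list of doc ids

    def add_document(self, doc_id, text):
        # Parse before taking the lock: tokenizing is the slow part and
        # touches no shared state.
        terms = sorted(set(text.lower().split()))
        # Hold the lock only while this one document is indexed.
        with self._lock:
            for term in terms:
                self._postings.setdefault(term, []).append(doc_id)

    def search(self, term):
        # Copy the postings under the lock so the caller never sees a
        # list that is being appended to.
        with self._lock:
            return list(self._postings.get(term, []))
```

<p>Several indexing threads can call <code>add_document</code> concurrently; each one blocks the others only for the few dictionary updates inside the <code>with</code> block.</p>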
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">When a new document arrives while a query is being processed and a lock is held, the system forbids disk I/O. This significantly reduces the chance that the thread holding the mutex is descheduled while it holds the lock.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3.3 Read-Only Structures</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">To reduce the time mutexes are held when the system handles large volumes of data, we try to keep most data immutable; for on-disk data this is the basic rule. If on-disk data were not read-only, threads would have to take locks around disk reads and writes, which could lead to heavy lock contention.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Since on-disk data cannot be changed once it has been written, reading it requires no lock. This locking strategy lets Indri take full advantage of multiprocessor systems. Most of the query code path needs only read locks, so query processing is highly parallel, although it may still be limited by the parallelism of the disk subsystem. In addition, because text parsing and indexing can proceed at the same time, the indexing process can also be parallelized.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3.4 Background I/O</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Since I/O cannot be performed during querying and indexing, asynchronous I/O is needed. Indri implements asynchronous I/O by keeping the following two threads running at all times:</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">RepositoryMaintenanceThread</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">RepositoryLoadThread</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">These two threads are associated with a particular Repository; if more than one Repository is open, each repository has its own pair of threads.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">RepositoryLoadThread performs two tasks. The first is to maintain load statistics for queries and newly added documents. These statistics are similar to Unix load averages: the thread tracks the number of queries processed and documents added over the past 1, 5, and 15 minutes, which helps the system decide when to write in-memory data to disk.</span></span></p>
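<p>One simple way to keep such statistics is a sliding window of event timestamps. The sketch below is an assumption about how this could be done, not a description of RepositoryLoadThread itself; the class name <code>LoadTracker</code> is hypothetical, and it reports plain event counts per window rather than the exponentially damped averages that Unix uses.</p>

```python
from collections import deque


class LoadTracker:
    """Counts events seen in the last 1, 5, and 15 minutes."""

    WINDOWS = (60, 300, 900)  # window lengths in seconds

    def __init__(self):
        self._events = deque()  # timestamps in increasing order

    def record(self, now):
        """Note that one query (or one document) was handled at time `now`."""
        self._events.append(now)

    def counts(self, now):
        """Return (last 1 min, last 5 min, last 15 min) event counts."""
        # Forget events that are older than the longest window.
        while self._events and now - self._events[0] > self.WINDOWS[-1]:
            self._events.popleft()
        return tuple(
            sum(1 for t in self._events if w >= now - t)
            for w in self.WINDOWS
        )
```

<p>A repository would keep one tracker for queries and one for added documents, and compare the two when deciding what to do next.</p>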
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">RepositoryLoadThread's other task is to monitor the system's memory usage. If memory usage exceeds the user-specified limit by 25%, RepositoryLoadThread suspends all document indexing threads until usage comes back down. This keeps the system from crashing when a large batch of documents is added. For most likely real-time applications, such as news feeds, the system should never run over its memory limit.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">RepositoryMaintenanceThread writes indexes to disk. It is the only thread that can write index data to disk or delete data from disk, so it needs no complex locking. The thread wakes up 5 times per minute to check the system's current memory usage; if the system is using too much memory, it starts writing in-memory data to disk.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">As noted above, creating a new disk index is not always beneficial, because many small disk indexes take longer to query than one large one. For this reason, RepositoryMaintenanceThread checks the query load and the document load before writing to disk. If the query load is high relative to the document load, the maintenance thread merges indexes instead of writing a new index to disk.</span></span></p>
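<p>The decision made by the maintenance thread can be summarized as a small function. This is a hedged sketch of the rule as stated in the text; the function name <code>choose_action</code>, its parameters, and the simple "greater than" comparison of the two loads are assumptions, since the exact thresholds are not given here.</p>

```python
def choose_action(memory_used, memory_limit, query_load, document_load):
    """Decide what the maintenance thread should do on one wake-up.

    Returns "idle" when memory is within the limit, "merge" when queries
    dominate (fewer, larger indexes make queries cheaper), and "write"
    when documents dominate (small indexes are cheapest to produce).
    """
    if memory_limit >= memory_used:
        return "idle"
    if query_load > document_load:
        return "merge"
    return "write"


# Example wake-ups, 12 seconds apart (5 per minute), with made-up loads.
actions = [
    choose_action(memory_used=80, memory_limit=100, query_load=5, document_load=50),
    choose_action(memory_used=120, memory_limit=100, query_load=5, document_load=50),
    choose_action(memory_used=120, memory_limit=100, query_load=40, document_load=3),
]
```

<p>Keeping the policy in one pure function makes it easy to test separately from the thread that acts on it.</p>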
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">3.5 Multi-Version Structures</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">An Indri Repository may contain several indexes. Indri maintains a structure called index_state, which holds pointers to all indexes currently in the repository. Index data is written to disk in two cases:</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">A MemoryIndex has reached its memory limit;</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">There are too many MemoryIndex instances, and they need to be merged.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In both cases Indri writes data to disk that may already exist in the system in another form, so index_state has to be modified to reflect whether the earlier data has been removed.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">One solution is to guard reads and writes of the index_state structure with a mutex. However, this can lead to poor performance under heavy load.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Suppose Indri is running on a parallel system and users are issuing complex queries that each take 10 minutes to process. If users submit queries from two independent threads, two such queries are always running. If the queries are staggered by 5 minutes, thread A starts its queries at minutes 0, 10, 20, 30, 40, and 50 of the hour, and thread B starts its queries at minutes 5, 15, 25, 35, 45, and 55. Suppose that at the start of the hour we request the write lock on index_state. The write cannot proceed until thread B finishes its current query, and thread A is not allowed to start its next query before then, so one processor sits idle for 5 minutes. This is not what we want.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">To avoid this situation, Indri keeps several index_state structures alive at the same time. All new tasks (new queries, document additions) use the newest index_state, while older tasks continue to use the index_state they started with. When an index_state no longer has any users, it is deleted.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">In the example above, this means that thread B finishes its query using the old index_state while thread A starts its next query using the new one. Once thread B has finished its current query, the old index_state has no remaining users and is deleted.</span></span></p>
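<p>The multi-version scheme can be sketched with reference-counted snapshots. The code below is an illustration under assumptions, not Indri's implementation: <code>IndexState</code>, <code>StateManager</code>, and the method names are hypothetical, and the index entries are just strings. Readers pin the current version, a writer publishes a new version without waiting for them, and an old version is dropped when its last user releases it.</p>

```python
import threading


class IndexState:
    """Immutable list of indexes plus a count of tasks still using it."""

    def __init__(self, indexes):
        self.indexes = tuple(indexes)
        self.users = 0


class StateManager:
    def __init__(self, indexes):
        self._lock = threading.Lock()   # held only for pointer and counter updates
        self._current = IndexState(indexes)
        self._retired = []              # old versions that still have users

    def acquire(self):
        """Pin the newest version for the duration of one query or insert."""
        with self._lock:
            state = self._current
            state.users += 1
            return state

    def release(self, state):
        """Unpin a version; drop it if it is old and no one uses it any more."""
        with self._lock:
            state.users -= 1
            if state is not self._current and state.users == 0:
                self._retired.remove(state)

    def publish(self, indexes):
        """Install a new version without waiting for running tasks."""
        with self._lock:
            old = self._current
            self._current = IndexState(indexes)
            if old.users > 0:
                self._retired.append(old)

    def live_versions(self):
        with self._lock:
            return 1 + len(self._retired)
```

<p>In terms of the example, thread B holds the old version while <code>publish</code> installs the new one, thread A immediately acquires the new version, and the old one disappears when B calls <code>release</code>. No task ever waits for a 10-minute query to finish.</p>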
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">4. Deleting Documents</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Indri supports deletion marks. A deletion mark is a weak form of deletion: it simply hides the document from users instead of truly removing it. The index data for the document is not actually removed from the inverted lists, the direct lists, or the compressed collection, which means that the document's term counts remain in the corpus statistics.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Suppose we have a document collection A and a subset B of it. We first build an index I over A and then delete the documents of B from I. Separately, we build a similar index I&#8217; by adding only the documents in A - B. Because it still contains the data for B, index I takes up more disk space than index I&#8217;. Furthermore, because the corpus statistics of I and I&#8217; differ slightly, queries run against the two indexes will return somewhat different results. For these reasons, the document deletion feature should be used with care when searching with Indri.</span></span></p>
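<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Deletion is a useful feature in practice, but deletion on its own is of limited use. It is most often used to update documents (delete the old version, insert the new one). This matters especially for desktop search and news search, where erroneous or outdated versions of existing documents frequently have to be replaced.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">We use a simple bitmap to mark deleted documents. When a document is to be deleted, its bit is set. Any bit that is not present in the bitmap is assumed to be unset, so if no document has been deleted, the bitmap file is empty. The file only grows as far as the last bit that is set to a nonzero value.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">At query time, every document is checked against the bitmap before it is scored; only documents that are not marked as deleted are scored.</span></span></p>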
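<p>A deletion bitmap of this kind is straightforward to sketch. The example below keeps the bitmap in a <code>bytearray</code> rather than a file, and the names <code>DeletedBitmap</code> and <code>filter_candidates</code> are hypothetical; the bit layout (document id divided by 8 selects the byte) is an assumption, not Indri's on-disk format.</p>

```python
class DeletedBitmap:
    """One bit per document id; missing bits count as 'not deleted'."""

    def __init__(self):
        self._bits = bytearray()  # stays empty until something is deleted

    def mark_deleted(self, doc_id):
        byte, bit = divmod(doc_id, 8)
        # Grow only as far as the highest bit that has to be stored.
        if byte >= len(self._bits):
            self._bits.extend(bytes(byte + 1 - len(self._bits)))
        self._bits[byte] |= 2 ** bit

    def is_deleted(self, doc_id):
        byte, bit = divmod(doc_id, 8)
        if byte >= len(self._bits):
            return False  # beyond the end of the bitmap: never deleted
        return (self._bits[byte] // 2 ** bit) % 2 == 1

    def size_in_bytes(self):
        return len(self._bits)


def filter_candidates(candidates, bitmap):
    """Drop deleted documents before scoring; `candidates` holds (doc_id, score)."""
    return [(doc_id, score) for doc_id, score in candidates
            if not bitmap.is_deleted(doc_id)]
```

<p>Note that the deleted documents still contribute to the corpus statistics; the bitmap only keeps them out of the result list.</p>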
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">5. Conclusion</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">Indri can now index a document and make it available for querying in less than 1 second. The cost of fast, concurrent access to newly indexed documents is low enough that Indri does not need separate batch and incremental modes. Indri can index about 15G of Web data per hour, including compressing and storing each original document.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">With this level of performance, we have built a retrieval system suitable for news filtering and desktop search applications. We believe it is the first open-source system with such high performance.</span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman"><em>Copyright@戴维, August 2005, Beijing</em></span></span></p>
<p><span style="FONT-SIZE: 12pt"><span style="FONT-FAMILY: Times New Roman">References:</span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[1] Trevor Strohman, Donald Metzler, Howard Turtle, and W. Bruce Croft,</span> <span style="FONT-FAMILY: CMTI8">Indri: A language model-based search engine for complex queries</span><span style="FONT-FAMILY: CMR8">, IA 2005: Proceedings of the 2nd International Conference on Intelligence Analysis (to appear), 2005.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[2] Donald Metzler, Victor Lavrenko, and W. Bruce Croft,</span> <span style="FONT-FAMILY: CMTI8">Formal multiple-bernoulli models for language modeling</span><span style="FONT-FAMILY: CMR8">, Proceedings of ACM SIGIR 2004, 2004, pp. 540-541.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[3] Raghu Ramakrishnan and Johannes Gehrke,</span> <span style="FONT-FAMILY: CMTI8">Database management systems</span><span style="FONT-FAMILY: CMR8">, McGraw-Hill Higher Education, 2000.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[4] Jim Gray and Andreas Reuter,</span> <span style="FONT-FAMILY: CMTI8">Transaction processing: Concepts and techniques</span><span style="FONT-FAMILY: CMR8">, Morgan Kaufmann, 1993.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[5] Philip A. Bernstein and Nathan Goodman,</span> <span style="FONT-FAMILY: CMTI8">Multiversion concurrency control - theory and algorithms</span><span style="FONT-FAMILY: CMR8">, ACM Trans. Database Syst.</span> <span style="FONT-FAMILY: CMBX8">8</span> <span style="FONT-FAMILY: CMR8">(1983), no. 4, 465-483.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[6] Eric W. Brown,</span> <span style="FONT-FAMILY: CMTI8">Fast evaluation of structured queries for information retrieval</span><span style="FONT-FAMILY: CMR8">, SIGIR &#8217;95: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval (New York, NY, USA), ACM Press, 1995, pp. 30-38.</span></span></span></p>
<p style="TEXT-ALIGN: left"><span style="FONT-SIZE: 0.9em"><span style="FONT-FAMILY: Times New Roman"><span style="FONT-FAMILY: CMR8">[7] Nicholas Lester, Justin Zobel, and Hugh E. Williams,</span> <span style="FONT-FAMILY: CMTI8">In-place versus re-build versus re-merge: index maintenance strategies for text retrieval systems</span><span style="FONT-FAMILY: CMR8">, Proceedings of the 27th conference on Australasian computer science, Australian Computer Society, Inc., 2004, pp. 15-23.</span></span></span></p>
<p>Related links:<br /><a href="http://newhaven.lti.cs.cmu.edu/indri/" rel="nofollow"><span style="COLOR: #0e61b2">http://newhaven.lti.cs.cmu.edu/indri/</span></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/06/5150.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IJCNLP 2008 Accepted Papers &#8211; Main Conference</title>
		<link>http://blog.zye.me/2011/05/3352.html</link>
		<comments>http://blog.zye.me/2011/05/3352.html#comments</comments>
		<pubDate>Wed, 11 May 2011 14:29:21 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[2008]]></category>
		<category><![CDATA[IJCNLP]]></category>
		<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[papers]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[信息检索]]></category>

		<guid isPermaLink="false">http://jeffye.yo2.cn/articles/ijcnlp-2008-accepted-papers-main-conference.html</guid>
		<description><![CDATA[Oral Presentation (75 papers) Context-Sensitive Convolution Tree Kernel for Pronoun Resolution GuoDong ZHOU and Fang KONG 苏州大学 周国栋 孔芳 (by cyy98) Semi-Supervised Learning for Relation Extraction GuoDong ZHOU, LongHua QIAN and QiaoMing ZHU 苏州大学 周国栋 钱龙华 朱巧明 (by cyy98) Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification Jingbo Zhu, <a href='http://blog.zye.me/2011/05/3352.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Oral Presentation (75 papers)</p>
<p>Context-Sensitive Convolution Tree Kernel for Pronoun Resolution</p>
<p>GuoDong ZHOU and Fang KONG<br />
苏州大学 周国栋 孔芳 (by cyy98)</p>
<p>Semi-Supervised Learning for Relation Extraction</p>
<p>GuoDong ZHOU, LongHua QIAN and QiaoMing ZHU<br />
苏州大学 周国栋 钱龙华 朱巧明 (by cyy98)</p>
<p>Learning a Stopping Criterion for Active Learning for Word Sense Disambiguation and Text Classification</p>
<p>Jingbo Zhu, Huizhen Wang and Eduard Hovy</p>
<p>A Semantic Feature for Relation Recognition Using a Web-based Corpus</p>
<p>ChenMing Hung</p>
<p>Formalising Multi-layer Corpora in OWL DL &#8211; Lexicon Modelling, Querying and Consistency Control</p>
<p>Aljoscha Burchardt, Sebastian Pado, Dennis Spohr, Anette Frank and Ulrich Heid</p>
<p>A Lemmatization Method for Modern Mongolian and its Application to Information Retrieval</p>
<p>Badam-Osor Khaltar and Atsushi Fujii</p>
<p>An Empirical Comparison of Goodness Measures for Unsupervised Chinese Word Segmentation with a Unified Framework</p>
<p>Hai Zhao and Chunyu Kit<br />
香港城市大学</p>
<p>Memory-Inductive Categorial Grammar: An Approach to Gap Resolution in Analytic-Language Translation</p>
<p>Prachya Boonkwan and Thepchai Supnithi</p>
<p>Identifying Cross-Document Relations between Sentences</p>
<p>Yasunari Miyabe, Hiroya Takamura and Manabu Okumura</p>
<p>Answering Definition Questions via Temporally-Anchored Text Snippets</p>
<p>Marius Pasca</p>
<p>Projection-based Acquisition of a Temporal Labeller</p>
<p>Kathrin Spreyer and Anette Frank</p>
<p>Story Link Detection based on Dynamic Information Extending</p>
<p>Xiaoyan Zhang, Ting Wang and Huowang Chen<br />
National University of Defense Technology: 陈火旺</p>
<p>Bootstrapping Both Product Features and Opinion Words from Chinese Reviews with Cross-Inducing</p>
<p>Bo Wang and Houfeng Wang<br />
Institute of Computational Linguistics, Peking University: 王厚峰</p>
<p>Identify Temporal Websites Based on User Behavior Analysis</p>
<p>Yong Wang, Yiqun Liu, Min Zhang, Shaoping Ma and Liyun Ru<br />
Tsinghua University: 刘奕群, 张敏, 马少平</p>
<p>Effective Compositional Model for Lexical Alignment</p>
<p>Béatrice Daille and Emmanuel Morin</p>
<p>Orthographic Disambiguation Incorporating Transliterated Probability</p>
<p>Eiji ARAMAKI, Takeshi IMAI, Kengo MIYO and Kazuhiko OHE</p>
<p>Automatic Estimation of Word Significance oriented for Speech-based Information Retrieval</p>
<p>Takashi SHICHIRI, Hiroaki NANJO and Takehiko YOSHIMI</p>
<p>Determining the Unithood of Word Sequences using a Probabilistic Approach</p>
<p>Wilson Wong, Wei Liu and Mohammed Bennamoun</p>
<p>Multi-View Co-Training of Transliteration Model</p>
<p>Jin-Shea Kuo and Haizhou Li</p>
<p>Computing Paraphrasability of Syntactic Variants using Web Snippets</p>
<p>Atsushi Fujita and Satoshi Sato</p>
<p>Context Feature Selection for Distributional Similarity</p>
<p>Masato Hagiwara, Yasuhiro Ogawa and Katsuhiko Toyama</p>
<p>Lexical Chains as Document Features</p>
<p>Dinakar Jayarajan, Dipti Deodhare and Ravindran Balaraman</p>
<p>Combining Resources with Confidence Measures for Cross Language Information Retrieval</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/05/3352.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Survey of Cross-Language Information Retrieval</title>
		<link>http://blog.zye.me/2011/05/3259.html</link>
		<comments>http://blog.zye.me/2011/05/3259.html#comments</comments>
		<pubDate>Wed, 11 May 2011 02:30:29 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[信息检索]]></category>
		<category><![CDATA[跨语言检索]]></category>

		<guid isPermaLink="false">http://jeffye.yo2.cn/articles/%e8%b7%a8%e8%af%ad%e8%a8%80%e4%bf%a1%e6%81%af%e6%a3%80%e7%b4%a2%e7%bb%bc%e8%bf%b0-%e4%b8%8d%e6%96%ad%e6%9b%b4%e6%96%b0%e4%b8%ad.html</guid>
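		<description><![CDATA[1. An English-language survey (2005) focusing on current approaches to CLIR systems: literature-review-of-cross-language-information-retrieval.pdf 2. A paper on the dictionary-based approach. It records a series of problems the authors ran into while taking part in the CLEF 2000-2002 evaluations (mainly cross-language retrieval between European languages) and while developing a real system. Some of the experimental results are worth reading. The paper's descriptions are not very clear, but it still gives me some credible reference points. Paper title: Dictionary-Based Cross-Language Information Retrieval: Learning Experiences from CLEF 2000-2002. In progress.]]></description>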
			<content:encoded><![CDATA[<p>1. An English-language survey (2005) focusing on current approaches to CLIR systems. <a title="literature-review-of-cross-language-information-retrieval.pdf" href="http://jeffye.yo2.cn/wp-content/uploads/192/19263/2008/02/literature-review-of-cross-language-information-retrieval.pdf">literature-review-of-cross-language-information-retrieval.pdf</a></p>
<p>2. A paper on the dictionary-based approach. It records a series of problems the authors ran into while taking part in the CLEF 2000-2002 evaluations (mainly cross-language retrieval between European languages) and while developing a real system. Some of the experimental results are worth reading. The paper's descriptions are not very clear, but it still gives me some credible reference points. Paper title: Dictionary-Based Cross-Language Information Retrieval: Learning Experiences from CLEF 2000-2002</p>
<p class="zoundry_raven_tags"><span class="ztags">in progress</span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/05/3259.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Academic Talks: Retrieval-Related</title>
		<link>http://blog.zye.me/2011/04/9313.html</link>
		<comments>http://blog.zye.me/2011/04/9313.html#comments</comments>
		<pubDate>Wed, 27 Apr 2011 14:27:56 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[信息检索]]></category>
		<category><![CDATA[报告]]></category>

		<guid isPermaLink="false">http://www.5yiso.cn/2007/02/9313.html</guid>
		<description><![CDATA[Academic Talks: Retrieval-Related 2007 2007-10-30-黄玉兰- Work Summary and Research on Meaningful Strings 2007-10-30-王小磊- Semi-supervised 2007-6-5- Christos Faloutsos &#8211; Data Mining using Fractals and Power laws (Carnegie Mellon University) 2007-3-23-李明- Information Distance From a Question to an Answer (University of Waterloo, Canada) 2007-3-22-米海涛-Conditional Random Fields (CRF) 2006 2006-11-30-吴高巍-Overview of Supervised Learning 2006-11-9- 程学旗 -Social Information Network Models and Reflections on Their Applications (YOCSEF invited talk) 2006-9-29-XindongWu- How to Write and Publish Research Papers ( University of Vermont USA <a href='http://blog.zye.me/2011/04/9313.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<h1>Academic Talks: Retrieval-Related</h1>
<div class="u_bd">
<table cellpadding="0" border="0">
<tbody>
<tr>
<td style="BACKGROUND: green 0% 50%" height="27" valign="top">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2007</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li>2007-10-30-黄玉兰- <a href="http://www.searchforum.org.cn/seminar/lectures/2007_10_30_huangyulan.ppt"><span style="COLOR: #22148d">Work Summary and Research on Meaningful Strings</span></a></li>
<li>2007-10-30-王小磊- <a href="http://www.searchforum.org.cn/seminar/lectures/2007_10_30_wangxiaolei_OverviewSemiSup.ppt"><span style="COLOR: #22148d">Semi-supervised</span></a></li>
<li>2007-6-5- Christos Faloutsos &#8211; <a href="http://www.searchforum.org.cn/seminar/lectures/ict-ac.ppt"><span style="COLOR: #22148d">Data Mining using Fractals and Power laws</span></a> (Carnegie Mellon University)</li>
<li>2007-3-23-李明- <a href="http://www.searchforum.org.cn/seminar/lectures/2007_3_23_Dist-QU-AN.ppt"><span style="COLOR: #22148d">Information Distance From a Question to an Answer</span></a> (University of Waterloo, Canada)</li>
<li><span>2007-3-22-米海涛-<a style="COLOR: rgb(208,82,68); TEXT-DECORATION: underline" href="http://www.searchforum.org.cn/seminar/lectures/2007.3.22_CRFs_Mihaitao.ppt">Conditional Random Fields (CRF)</a></span></li>
</ul>
</td>
</tr>
<tr>
<td style="BACKGROUND: green 0% 50%" height="27" valign="top">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2006</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span>2006-11-30-吴高巍-<a style="COLOR: rgb(208,82,68); TEXT-DECORATION: underline" href="http://www.searchforum.org.cn/seminar/lectures/2006-11-24WuGaowei-Overview%20of%20Supervised%20Learning.ppt">Overview of Supervised Learning</a></span></li>
<li><span>2006-11-9-</span> <span>程学旗</span> <span><span style="FONT-SIZE: 0.75em">-<a style="COLOR: rgb(208,82,68); TEXT-DECORATION: underline" href="http://www.searchforum.org.cn/seminar/lectures/2006-11-9ChengXueqi-YOCSEF.pdf">Social Information Network Models and Reflections on Their Applications</a> (YOCSEF</span></span> <span style="FONT-SIZE: 0.75em">invited talk)</span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-29-XindongWu- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-9-29-XindongWu-How%20to%20Write%20and%20Publish%20Research%20Papers.ppt"><span style="COLOR: #22148d">How to Write and Publish Research Papers</span></a> ( University of Vermont USA )</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-25-HangLi- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-9-25-HangLi-Statistical%20Learning%20Methods%20for%20Information%20Retrieval.ppt"><span style="COLOR: #22148d">Statistical Learning Methods for Information Retrieval</span></a></span> <span>(MSRA)</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-25-JirongWen- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-9-25-JirongWen-Search%20Engine%20Overview.PDF"><span style="COLOR: #22148d">Search Engine Overview</span></a></span> <span>(MSRA)</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-8-17</span> <span>-</span> <span>程学旗- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-8-17-XueqiCheng-Web%20Search%20and%20Web%20Mining--Thinking%20about%20Large%20content%20computing%20in%20the%20Web.ppt"><span style="COLOR: #22148d">Web Search and Web Mining&#8211;Thinking about Large content computing in the Web (ICT)</span></a></span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-7</span> <span>-</span> <span>刘奕群</span> <span>- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-9-7%C3%83%C2%83%C3%82%C2%A5%C3%83%C2%82%C3%82%C2%88%C3%83%C2%82%C3%82%C2%98%C3%83%C2%83%C3%82%C2%A5%C3%83%C2%82%C3%82%C2%A5%C3%83%C2%82%C3%82%C2%95%C3%83%C2%83%C3%82%C2%A7%C3%83%C2%82%C3%82%C2%BE%C3%83%C2%82%C3%82%C2%A4-QueryTypeIdentification.ppt"><span style="COLOR: #22148d">Uncovering the Intent Behind User Queries</span></a> (</span>Tsinghua University<span>)</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-7</span> <span>-</span>岑荣伟<span>- <a href="file:."><span style="COLOR: #22148d">Software Search</span></a> (</span>Tsinghua University<span>)</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-9-7</span> <span>-</span>富羽鹏<span>- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-9-7FuYupeng-A%20PDD%20Approach%20for%20Expert%20Finding.ppt"><span style="COLOR: #22148d">A PDD Approach for Expert Finding</span></a> (</span>Tsinghua University<span>)</span></span></li>
<li><span style="FONT-SIZE: 0.75em"><span>2006-7-06</span> <span>-</span>王思力<span>- <a href="file:."><span style="COLOR: #22148d"><span><span>Research on Chinese Word Segmentation for Large-Scale Information Retrieval</span></span></span></a></span></span></li>
<li><span>2006-6-23</span> <span>-</span>曹冬林<span>- <a href="file:."><span style="COLOR: #22148d"><span><span>Text Compression</span></span></span></a></span></li>
<li><span>2006-6-21</span> <span>-</span>郭瑞杰<span>- <a href="file:."><span style="COLOR: #22148d">FirteX: A High-Performance Full-Text Indexing and Retrieval Platform</span></a> (</span>Tsinghua exchange talk<span>)</span></li>
<li><span>2006-5-24</span> <span>-</span>张俊、彭朝晖<span>- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-5-24-Jun%20Zhang-High%20Performance%20Database%20Lab(Renmin%20University%20of%20China).ppt"><span style="COLOR: #22148d">High Performance Database Lab(Renmin University of China)</span></a></span></li>
<li><span>2006-5-18</span> <span>-</span>唐慧丰<span>- <a href="file:."><span><span><span style="COLOR: #22148d">Principles and Applications of Genetic Algorithms</span></span></span></a></span></li>
<li><span>2006-4-20</span> <span>-</span>张瑾<span>- <a href="http://www.searchforum.org.cn/seminar/lectures/2006-4-20%20zhangjin-Research%20on%20Chinese%20Automatic%20Summarization.ppt"><span style="COLOR: #22148d">Research on Chinese Automatic Summarization</span></a></span></li>
<li><span>2006-1-11</span> <span>-</span>张刚<span>- <a href="file:."><span><span><span style="COLOR: #22148d">Research on Distributed Web Information Retrieval Techniques</span></span></span></a></span></li>
<li><span>2006-1-05</span> <span>-</span>段建国<span>- <a href="file:."><span><span><span style="COLOR: #22148d">An Information-Theoretic Model of Text Classification</span></span></span></a></span></li>
</ul>
</td>
</tr>
<tr>
<td style="BACKGROUND: green 0% 50%" valign="top">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2005</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span style="FONT-SIZE: 10pt">2005-12-8</span> <span style="FONT-SIZE: 10pt">郭嘉丰</span>- <a href="file:."><span><span><span style="COLOR: #22148d">Improving Web Page Retrieval Quality</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">2005-11-16</span> <span style="FONT-SIZE: 10pt">-</span> <span style="FONT-SIZE: 10pt">谭松波</span>- <a href="file:."><span style="COLOR: #22148d">CIKM</span> <span>Conference Summary</span></a></li>
<li><span style="FONT-SIZE: 10pt">2005-11-10</span> <span style="FONT-SIZE: 10pt">-</span> <span style="FONT-SIZE: 10pt">龚才春</span>- <a href="file:."><span><span><span style="COLOR: #22148d">Pushing the Limits of Indexing</span></span></span></a></li>
</ul>
</td>
</tr>
<tr>
<td style="BACKGROUND: green 0% 50%" valign="top">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2004</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span style="FONT-SIZE: 10pt">20041025-</span> <span style="FONT-SIZE: 10pt">于满全</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20041025-TDT5_Report.ppt" target="_blank"><span style="COLOR: #22148d">TDT5_Report</span></a></li>
<li><span style="FONT-SIZE: 10pt">20041018-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20041018-TREC2004%C3%82%C2%B1%C3%82%C2%A8%C3%82%C2%B8%C3%83%C2%A6-Novelty.ppt" target="_blank"><span style="COLOR: #22148d">TREC2004</span> <span>Report-Novelty</span></a></li>
<li><span style="FONT-SIZE: 10pt">20041011-</span> <span style="FONT-SIZE: 10pt">胡吉祥</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20041011-On%20Finding%20Repeats%20in%20Strings.ppt" target="_blank"><span style="COLOR: #22148d">On Finding Repeats in Strings</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040920-</span> <span style="FONT-SIZE: 10pt">王小飞</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040920-%C3%82%C2%BB%C3%83%C2%B9%C3%83%C2%93%C3%83%C2%9A%C3%83%C2%8B%C3%82%C2%AB%C3%83%C2%8A%C3%83%C2%BD%C3%83%C2%97%C3%83%C2%A9Trie%C3%83%C2%8A%C3%83%C2%B7%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%B4%C3%83%C2%8A%C3%82%C2%B5%C3%83%C2%A4%C3%82%C2%B2%C3%83%C2%A9%C3%83%C2%91%C3%82%C2%AF.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>基于双数组Trie</span></span> <span><span>树的词典查询</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040906-</span> <span style="FONT-SIZE: 10pt">张丙奇</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040906-%C3%82%C2%BB%C3%83%C2%B9%C3%83%C2%93%C3%83%C2%9A%C3%82%C2%BD%C3%83%C2%A1%C3%82%C2%B9%C3%82%C2%B9%C3%82%C2%BB%C3%82%C2%AF%C3%82%C2%B9%C3%83%C2%A6%C3%83%C2%94%C3%83%C2%B2%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%B8%C3%83%C2%B6%C3%83%C2%90%C3%83%C2%94%C3%82%C2%BB%C3%82%C2%AF%C3%83%C2%8D%C3%83%C2%86%C3%82%C2%BC%C3%83%C2%B6%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8.ppt" target="_blank"><span><span><span style="COLOR: #22148d">基于结构化规则的个性化推荐算法</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040827-</span> <span style="FONT-SIZE: 10pt">赵章界</span>- <a href="file:." target="_blank"><span style="COLOR: #22148d"><span><span>Phrase-Structure-Guided Categorial Expression Calculus&#8211;</span></span> <span><span>Development and Problems</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040728-</span> <span style="FONT-SIZE: 10pt">于满泉</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040728-%C3%83%C2%8D%C3%83%C2%B8%C3%83%C2%92%C3%82%C2%B3%C3%82%C2%B2%C3%83%C2%A9%C3%83%C2%96%C3%83%C2%98%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span><span><span style="COLOR: #22148d">网页查重技术综述</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040724-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040724-Noovel%20System%20Design.pdf" target="_blank"><span style="COLOR: #22148d">Noovel System Design</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040720-</span> <span style="FONT-SIZE: 10pt">谭松波</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040720-%C3%83%C2%97%C3%83%C2%AE%C3%82%C2%B6%C3%83%C2%8C%C3%83%C2%82%C3%82%C2%B7%C3%82%C2%BE%C3%82%C2%B6%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8.ppt" target="_blank"><span><span><span style="COLOR: #22148d">最短路径算法</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040617-</span> <span style="FONT-SIZE: 10pt">谭松波</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040617-%C3%83%C2%8E%C3%83%C2%84%C3%82%C2%B1%C3%82%C2%BE%C3%83%C2%8C%C3%83%C2%98%C3%83%C2%95%C3%83%C2%B7%C3%83%C2%8C%C3%83%C2%A1%C3%83%C2%88%C3%82%C2%A1%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8%C3%82%C2%B8%C3%83%C2%85%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span><span><span style="COLOR: #22148d">文</span>本特征提取算法概述</span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040607-</span> <span style="FONT-SIZE: 10pt">丁国栋</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040607-Language%20Modeling%20for%20Information%20Retrieval.ppt" target="_blank"><span style="COLOR: #22148d">Language Modeling for Information Retrieval</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040527-</span> <span style="FONT-SIZE: 10pt">赵章界</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040527-%C3%83%C2%8E%C3%83%C2%84%C3%82%C2%B5%C3%82%C2%B5%C3%82%C2%B8%C3%83%C2%B1%C3%83%C2%8A%C3%82%C2%BD%C3%82%C2%B7%C3%83%C2%96%C3%83%C2%8E%C3%83%C2%B6.ppt" target="_blank"><span><span><span style="COLOR: #22148d">文档格式分析</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040520-</span> <span style="FONT-SIZE: 10pt">吴丽辉</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040520-%C3%82%C2%B8%C3%83%C2%B6%C3%83%C2%90%C3%83%C2%94%C3%82%C2%BB%C3%82%C2%AF%C3%82%C2%B7%C3%83%C2%BE%C3%83%C2%8E%C3%83%C2%B1%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5.zip" target="_blank"><span><span><span style="COLOR: #22148d">个性化服务技术</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040520-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040520-%C3%83%C2%96%C3%83%C2%90%C3%83%C2%8E%C3%83%C2%84%C3%82%C2%B4%C3%83%C2%8A%C3%82%C2%B7%C3%82%C2%A8%C3%82%C2%B7%C3%83%C2%96%C3%83%C2%8E%C3%83%C2%B6%C3%83%C2%97%C3%82%C2%A8%C3%83%C2%8C%C3%83%C2%A2%C3%83%C2%8C%C3%83%C2%96%C3%83%C2%82%C3%83%C2%9B.ppt" target="_blank"><span><span><span style="COLOR: #22148d">中文词法分析专题讨论</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040520-</span> <span style="FONT-SIZE: 10pt">邹刚</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040520-%C3%82%C2%BB%C3%83%C2%B9%C3%83%C2%93%C3%83%C2%9AInternet%C3%82%C2%B5%C3%83%C2%84%C3%83%C2%90%C3%83%C2%82%C3%82%C2%B4%C3%83%C2%8A%C3%83%C2%93%C3%83%C2%AF%C3%83%C2%97%C3%83%C2%94%C3%82%C2%B6%C3%82%C2%AF%C3%82%C2%BC%C3%83%C2%AC%C3%82%C2%B2%C3%83%C2%A2.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>基于Internet</span></span> <span><span>的新词语自动检测</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040517-</span> <span style="FONT-SIZE: 10pt">王树西</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040517-%C3%83%C2%8E%C3%83%C2%8A%C3%82%C2%B4%C3%83%C2%B0%C3%83%C2%8F%C3%82%C2%B5%C3%83%C2%8D%C3%82%C2%B3%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6%C3%83%C2%93%C3%83%C2%AB%C3%83%C2%86%C3%83%C2%80%C3%82%C2%B2%C3%83%C2%A2.ppt" target="_blank"><span><span><span style="COLOR: #22148d">问答系统综述与评测</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040510-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040510-A%20Special%20Reading%20on%20Web%20as%20Corpus%20and%20English%20Writing%20Assistant.ppt" target="_blank"><span style="COLOR: #22148d">A Special Reading on Web as Corpus and English Writing Assistant</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040426-</span> <span style="FONT-SIZE: 10pt">张刚</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040426-Distributed%20Information%20Retrieval.zip" target="_blank"><span style="COLOR: #22148d">Distributed Information Retrieval</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040422-</span> <span style="FONT-SIZE: 10pt">于满泉</span>-Thesis proposal <span>demo</span></li>
<li><span style="FONT-SIZE: 10pt">20040421-</span> <span style="FONT-SIZE: 10pt">吕建明</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040421-P2P%C3%83%C2%89%C3%83%C2%A8%C3%82%C2%BC%C3%83%C2%86%C3%83%C2%8B%C3%82%C2%BC%C3%83%C2%8F%C3%83%C2%AB%C3%82%C2%BC%C3%82%C2%B0%C3%83%C2%86%C3%83%C2%A4%C3%83%C2%94%C3%83%C2%9A%C3%82%C2%B4%C3%83%C2%A6%C3%82%C2%B4%C3%82%C2%A2%C3%82%C2%BA%C3%83%C2%8D%C3%82%C2%B9%C3%82%C2%B2%C3%83%C2%8F%C3%83%C2%AD%C3%83%C2%96%C3%83%C2%90%C3%82%C2%B5%C3%83%C2%84%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83.ppt" target="_blank"><span style="COLOR: #22148d">P2P</span> <span>设计思想及其在存储和共享中的应用</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040419-</span> <span style="FONT-SIZE: 10pt">张丙奇</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040419-XML%C3%83%C2%8D%C3%83%C2%9A%C3%82%C2%BE%C3%83%C2%B2%C3%82%C2%BD%C3%83%C2%A9%C3%83%C2%89%C3%83%C2%9C.ppt" target="_blank"><span style="COLOR: #22148d">XML</span> <span>Mining: An Introduction</span></a></li>
<li><span style="FONT-SIZE: 10pt">20040414-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040414-%C3%82%C2%BE%C3%82%C2%B2%C3%83%C2%8C%C3%82%C2%AC%C3%83%C2%8B%C3%83%C2%91%C3%83%C2%8B%C3%83%C2%B7%C3%82%C2%BD%C3%83%C2%A1%C3%82%C2%B9%C3%82%C2%B9%C3%83%C2%93%C3%83%C2%AB%C3%83%C2%8D%C3%83%C2%AA%C3%83%C2%83%C3%83%C2%80Hash%C3%82%C2%BA%C3%82%C2%AF%C3%83%C2%8A%C3%83%C2%BD.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>静态搜索结构与完美Hash</span></span> <span><span>函数</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040412-</span> <span style="FONT-SIZE: 10pt">丁国栋</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20040412-%C3%83%C2%8D%C3%82%C2%B3%C3%82%C2%BC%C3%83%C2%86%C3%83%C2%93%C3%83%C2%AF%C3%83%C2%91%C3%83%C2%94%C3%82%C2%BD%C3%82%C2%A8%C3%83%C2%84%C3%82%C2%A3%C3%83%C2%96%C3%83%C2%90%C3%82%C2%B5%C3%83%C2%84%C3%83%C2%86%C3%82%C2%BD%C3%82%C2%BB%C3%82%C2%AC%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5.ppt" target="_blank"><span><span><span style="COLOR: #22148d">统计语言建模中的平滑技术</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040408-</span> <span style="FONT-SIZE: 10pt">周昭涛</span>-Thesis proposal</li>
<li><span style="FONT-SIZE: 10pt">20040318-</span> <span style="FONT-SIZE: 10pt">谭松波</span>- <a href="file:." target="_blank"><span><span><span style="COLOR: #22148d">A General Introduction to Neural Networks and Local-Search Training Algorithms</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20040301-</span> <span style="FONT-SIZE: 10pt">张华平</span>-Thesis proposal<span>-Novelty</span></li>
</ul>
</td>
</tr>
<tr style="HEIGHT: 7.5pt">
<td style="BACKGROUND: green 0% 50%; HEIGHT: 7.5pt">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2003</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span style="FONT-SIZE: 10pt">20031215-</span> <span style="FONT-SIZE: 10pt">谭松波</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031215-%C3%83%C2%87%C3%83%C2%B3%C3%82%C2%BD%C3%83%C2%A2NP%C3%83%C2%8E%C3%83%C2%8A%C3%83%C2%8C%C3%83%C2%A2%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%BC%C3%82%C2%B8%C3%83%C2%96%C3%83%C2%96%C3%83%C2%93%C3%83%C2%90%C3%83%C2%90%C3%82%C2%A7%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%BF%C3%83%C2%AC%C3%83%C2%8B%C3%83%C2%99%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>求解NP</span></span> <span><span>问题的几种有效的快速算法</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20031208-</span> <span style="FONT-SIZE: 10pt">谢丰</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031208-%C3%83%C2%8D%C3%83%C2%B8%C3%83%C2%82%C3%83%C2%A7%C3%82%C2%B0%C3%82%C2%B2%C3%83%C2%88%C3%82%C2%AB%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5%C3%82%C2%B8%C3%83%C2%85%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span><span><span style="COLOR: #22148d">网络安全技术概述</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20031202-</span> <span style="FONT-SIZE: 10pt">于满全</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031202-%C3%83%C2%90%C3%83%C2%85%C3%83%C2%8F%C3%82%C2%A2%C3%82%C2%B3%C3%83%C2%A9%C3%83%C2%88%C3%82%C2%A1%C3%82%C2%BD%C3%83%C2%A9%C3%83%C2%89%C3%83%C2%9C%C3%83%C2%93%C3%83%C2%AB%C3%83%C2%88%C3%83%C2%8B%C3%83%C2%8E%C3%83%C2%AF%C3%83%C2%97%C3%82%C2%B7%C3%83%C2%97%C3%83%C2%99%C3%82%C2%B3%C3%83%C2%B5%C3%83%C2%8C%C3%82%C2%BD.ppt" target="_blank"><span><span><span style="COLOR: #22148d">信息抽取介绍与人物追踪初探</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20031200-</span> <span style="FONT-SIZE: 10pt">周昭涛</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031200-%C3%82%C2%B8%C3%83%C2%85%C3%83%C2%82%C3%83%C2%8A%C3%82%C2%BC%C3%83%C2%AC%C3%83%C2%8B%C3%83%C2%B7%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span><span><span style="COLOR: #22148d">概率检索综述</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20031123-</span> <span style="FONT-SIZE: 10pt">丁凡</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031123-%C3%82%C2%B7%C3%83%C2%96%C3%83%C2%90%C3%83%C2%8E%C3%82%C2%B7%C3%82%C2%BD%C3%82%C2%B7%C3%82%C2%A8%C3%83%C2%94%C3%83%C2%9A%C3%83%C2%8A%C3%83%C2%BD%C3%82%C2%BE%C3%83%C2%9D%C3%83%C2%81%C3%83%C2%B7%C3%82%C2%B4%C3%82%C2%A6%C3%83%C2%80%C3%83%C2%AD%C3%83%C2%96%C3%83%C2%90%C3%82%C2%B5%C3%83%C2%84%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83.ppt" target="_blank"><span><span><span style="COLOR: #22148d">分形方法在数据流</span>处理中的应用</span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20031117-</span> <span style="FONT-SIZE: 10pt">常毅</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031117-TREC2003_QA_report.ppt" target="_blank"><span style="COLOR: #22148d">TREC2003_QA_report</span></a></li>
<li><span style="FONT-SIZE: 10pt">20031114-</span> <span style="FONT-SIZE: 10pt">卜东波</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031104-Bayesian%C3%83%C2%8D%C3%82%C2%B3%C3%82%C2%BC%C3%83%C2%86%C3%83%C2%8D%C3%83%C2%86%C3%82%C2%B6%C3%83%C2%8F%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5.ppt" target="_blank"><span style="COLOR: #22148d">Bayesian</span> <span>统计推断技术</span></a></li>
<li><span style="FONT-SIZE: 10pt">20031114-</span> <span style="FONT-SIZE: 10pt">卜东波</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031104-Our%20achievement%20on%20BioInformatics.ppt" target="_blank"><span style="COLOR: #22148d">Our achievement on BioInformatics</span></a></li>
<li><span style="FONT-SIZE: 10pt">20031027-</span> <span style="FONT-SIZE: 10pt">赵章界</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031027-Parsing%C3%83%C2%91%C3%83%C2%9D%C3%83%C2%92%C3%83%C2%A5.ppt" target="_blank"><span style="COLOR: #22148d">Parsing</span> <span>演义</span></a></li>
<li><span style="FONT-SIZE: 10pt">20031020-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031020-Sentence%20Semantic%20Distance%20and%20Novelty%20Detection.ppt" target="_blank"><span style="COLOR: #22148d">Sentence Semantic Distance and Novelty Detection</span></a></li>
<li><span style="FONT-SIZE: 10pt">20031013-</span> <span style="FONT-SIZE: 10pt">张丙奇</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20031013-%C3%82%C2%B8%C3%83%C2%B6%C3%83%C2%90%C3%83%C2%94%C3%82%C2%BB%C3%82%C2%AF%C3%82%C2%B7%C3%83%C2%BE%C3%83%C2%8E%C3%83%C2%B1%C3%83%C2%91%C3%83%C2%90%C3%82%C2%BE%C3%82%C2%BF1011.ppt" target="_blank"><span><span><span style="COLOR: #22148d">个性化服务研究1011</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030929-</span> <span style="FONT-SIZE: 10pt">杨哲</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030929-%C3%83%C2%90%C3%83%C2%85%C3%83%C2%8F%C3%82%C2%A2%C3%82%C2%BC%C3%83%C2%AC%C3%83%C2%8B%C3%83%C2%B7%C3%83%C2%96%C3%83%C2%90%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%B7%C3%82%C2%B4%C3%83%C2%80%C3%82%C2%A1.ppt" target="_blank"><span><span><span style="COLOR: #22148d">信息检索中的反馈</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030929-</span> <span style="FONT-SIZE: 10pt">Summary of the Large-Scale Content Processing Group's Work in Recent Years</span></li>
<li><span style="FONT-SIZE: 10pt">20030928-</span> <span style="FONT-SIZE: 10pt">杨哲</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030928-a%20survey%20of%20feedback%20in%20IR.doc" target="_blank"><span style="COLOR: #22148d">a survey of feedback in IR</span></a></li>
<li><span style="FONT-SIZE: 10pt">20030923-</span> <span style="FONT-SIZE: 10pt">ICT</span> 973 Project Final Summary Report <span>ver0.5</span></li>
<li><span style="FONT-SIZE: 10pt">20030922- <a href="http://www.searchforum.org.cn/seminar/lectures/20030922-WEB%C3%83%C2%8D%C3%82%C2%BC%C3%82%C2%BD%C3%83%C2%A1%C3%82%C2%B9%C3%82%C2%B9.zip" target="_blank"><span style="COLOR: #22148d">WEB</span> <span>Graph Structure</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20030917-</span> <span style="FONT-SIZE: 10pt">郭岩</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030917-%C3%83%C2%92%C3%83%C2%B2%C3%83%C2%97%C3%83%C2%93%C3%82%C2%B7%C3%83%C2%96%C3%83%C2%8E%C3%83%C2%B6.rar" target="_blank"><span><span><span style="COLOR: #22148d">因子分析</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030917-</span> <span style="FONT-SIZE: 10pt">Problems in Protocol Processing and Analysis</span></li>
<li><span style="FONT-SIZE: 10pt">20030917-</span> <span style="FONT-SIZE: 10pt">Some Notes on Program Optimization</span></li>
<li><span style="FONT-SIZE: 10pt">20030915-</span> <span style="FONT-SIZE: 10pt">Progress in Information Collection Technology at the Software Lab</span></li>
<li><span style="FONT-SIZE: 10pt">20030915-</span> <span style="FONT-SIZE: 10pt">王树西</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030915-%C3%83%C2%84%C3%82%C2%A3%C3%83%C2%8A%C3%82%C2%BD%C3%82%C2%BA%C3%83%C2%8F%C3%83%C2%92%C3%82%C2%BB%C3%82%C2%B5%C3%83%C2%84%C3%82%C2%A1%C3%82%C2%B0%C3%83%C2%95%C3%82%C2%B6%C3%83%C2%8A%C3%83%C2%97%C3%82%C2%A1%C3%82%C2%B1%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8%C3%82%C2%BC%C3%82%C2%B0%C3%83%C2%86%C3%83%C2%A4%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83%20.ppt" target="_blank"><span><span><span style="COLOR: #22148d">模式合一的&#8221;斩首&#8221;算法及其应用</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030911-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030911-DSMS_%C3%83%C2%91%C3%83%C2%90%C3%82%C2%BE%C3%82%C2%BF%C3%83%C2%8E%C3%83%C2%8A%C3%83%C2%8C%C3%83%C2%A2%C3%82%C2%BA%C3%83%C2%8D%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83%C3%83%C2%87%C3%82%C2%B0%C3%82%C2%BE%C3%82%C2%B0.ppt" target="_blank"><span style="COLOR: #22148d">DSMS_</span> <span>研究问题和应用前景</span></a></li>
<li><span style="FONT-SIZE: 10pt">20030909-</span> <span style="FONT-SIZE: 10pt">Platform Plan for Large-Scale Content Computing</span></li>
<li><span style="FONT-SIZE: 10pt">20030909-</span> <span style="FONT-SIZE: 10pt">Crawler Concealment Strategies</span></li>
<li><span style="FONT-SIZE: 10pt">20030908-</span> <span style="FONT-SIZE: 10pt">潘文锋</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030908-%C3%83%C2%97%C3%83%C2%94%C3%83%C2%97%C3%83%C2%A9%C3%83%C2%96%C3%82%C2%AF%C3%83%C2%93%C3%82%C2%B3%C3%83%C2%89%C3%83%C2%A4(SOM)%C3%83%C2%89%C3%83%C2%B1%C3%82%C2%BE%C3%82%C2%AD%C3%83%C2%8D%C3%83%C2%B8%C3%83%C2%82%C3%83%C2%A7%C3%82%C2%BC%C3%82%C2%B0%C3%83%C2%86%C3%83%C2%A4%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>自组织映射(SOM)</span></span> <span><span>神经网络及其应用</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030901-</span> <span style="FONT-SIZE: 10pt">于满全</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030901-%C3%83%C2%8D%C3%83%C2%B8%C3%83%C2%92%C3%82%C2%B3%C3%83%C2%84%C3%83%C2%9A%C3%82%C2%B2%C3%82%C2%BF%C3%82%C2%BD%C3%83%C2%A1%C3%82%C2%B9%C3%82%C2%B9%C3%83%C2%90%C3%83%C2%85%C3%83%C2%8F%C3%82%C2%A2%C3%82%C2%B5%C3%83%C2%84%C3%83%C2%8D%C3%83%C2%9A%C3%82%C2%BE%C3%83%C2%B2%C3%82%C2%BC%C3%82%C2%B0%C3%83%C2%86%C3%83%C2%A4%C3%83%C2%93%C3%82%C2%A6%C3%83%C2%93%C3%83%C2%83.ppt" target="_blank"><span><span><span style="COLOR: #22148d">网页内部结构信息的挖掘及其应用</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030722-</span> <span style="FONT-SIZE: 10pt">张华平</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030722-ACL2003-%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span style="COLOR: #22148d">ACL2003-</span> <span>综述</span></a></li>
<li><span style="FONT-SIZE: 10pt">20030416-</span> <span style="FONT-SIZE: 10pt">张凯</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030416-Feature%20Selection%20Using%20Wordnet.ppt" target="_blank"><span style="COLOR: #22148d">Feature Selection Using Wordnet</span></a></li>
<li><span style="FONT-SIZE: 10pt">20030410-R.M.Wagner- <a href="http://www.searchforum.org.cn/seminar/lectures/20030410-a%20composition%20approach%20for%20services%20chaining.ppt" target="_blank"><span style="COLOR: #22148d">a composition approach for services chaining</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20030400-</span> <span style="FONT-SIZE: 10pt">常毅</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030400-%C3%83%C2%8E%C3%83%C2%84%C3%82%C2%B1%C3%82%C2%BE%C3%82%C2%B1%C3%83%C2%AD%C3%83%C2%8A%C3%82%C2%BE%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6%C3%82%C2%BC%C3%82%C2%B0%C3%83%C2%86%C3%83%C2%A4%C3%82%C2%B8%C3%83%C2%84%C3%82%C2%BD%C3%83%C2%B8.ppt" target="_blank"><span><span><span style="COLOR: #22148d">A Survey of Text Representation and Its Improvements</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20030218-</span> <span style="FONT-SIZE: 10pt">骆卫华</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20030218-Technical%20Profiles%20of%20TDT%20and%20Its%20Analysis.ppt" target="_blank"><span style="COLOR: #22148d">Technical Profiles of TDT and Its Analysis</span></a></li>
<li><span style="FONT-SIZE: 10pt">20030113-A.Ratnaparkhi <a href="http://www.searchforum.org.cn/seminar/lectures/20030113-19970500-A%20Simple%20Introduction%20to%20Maximum%20Entropy%20Models%20for%20Natural%20Language%20Processing.ps" target="_blank"><span style="COLOR: #22148d">-A Simple Introduction to Maximum Entropy Models for Natural Language Processing</span></a></span></li>
</ul>
</td>
</tr>
<tr style="HEIGHT: 7.5pt">
<td style="BACKGROUND: green 0% 50%; HEIGHT: 7.5pt">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2002</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span style="FONT-SIZE: 10pt">20021226-</span> <span style="FONT-SIZE: 10pt">骆卫华</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021226-Maximum%20Entropy%20Model%20&amp;%20Its%20Application%20in%20NLP.ppt" target="_blank"><span style="COLOR: #22148d">Maximum Entropy Model &amp; Its Application in NLP</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021218-</span> <span style="FONT-SIZE: 10pt">张浩</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021218-Report%20on%20Semi-supervised%20Training%20for%20Statistical%20Parsing.ppt" target="_blank"><span style="COLOR: #22148d">Report on Semi-supervised Training for Statistical Parsing</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021200-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021211-data%20stream%20manager%20system.ppt" target="_blank"><span style="COLOR: #22148d">Data Stream Management System</span></a></li>
<li><span>20021211-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <span><a href="http://www.searchforum.org.cn/seminar/lectures/20021211-Architectures,Models%20and%20Issues%20in%20Data%20Stream%20Systems.ppt" target="_blank"><span style="COLOR: #22148d">Architectures,Models and Issues in Data Stream Systems</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20021206-</span> <span style="FONT-SIZE: 10pt">刘群</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021206-%C3%83%C2%8D%C3%82%C2%B3%C3%82%C2%BC%C3%83%C2%86%C3%82%C2%BB%C3%83%C2%BA%C3%83%C2%86%C3%83%C2%B7%C3%82%C2%B7%C3%82%C2%AD%C3%83%C2%92%C3%83%C2%AB%C3%82%C2%BC%C3%83%C2%B2%C3%82%C2%BD%C3%83%C2%A9.ppt" target="_blank"><span><span><span style="COLOR: #22148d">An Introduction to Statistical Machine Translation</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20021204-</span> <span style="FONT-SIZE: 10pt">王斌</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021204-TREC%C3%82%C2%B8%C3%83%C2%85%C3%82%C2%BF%C3%83%C2%B6%C3%82%C2%BC%C3%82%C2%B0TREC-11%C3%82%C2%BC%C3%83%C2%B2%C3%82%C2%BD%C3%83%C2%A9.ppt" target="_blank"><span style="COLOR: #22148d">TREC</span> <span>Overview and TREC-11</span> <span><span>Introduction</span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20021129-LiHang- <a href="http://www.searchforum.org.cn/seminar/lectures/20021129-Statistical%20Learning%20Methods%20in%20Natural%20Language%20Processing.ppt" target="_blank"><span style="COLOR: #22148d">Statistical Learning Methods in Natural Language Processing</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20021121-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021121-AC%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8.ppt" target="_blank"><span style="COLOR: #22148d">AC</span> <span>Algorithm</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021121-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021121-ACBM%C3%83%C2%8B%C3%83%C2%A3%C3%82%C2%B7%C3%82%C2%A8.ppt" target="_blank"><span style="COLOR: #22148d">ACBM</span> <span>Algorithm</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021121-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021121-multi%20keyword%20matching.ppt" target="_blank"><span style="COLOR: #22148d">multi keyword matching</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021121-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021121-multi%20keyword%20matching2.ppt" target="_blank"><span style="COLOR: #22148d">multi keyword matching2</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021121-</span> <span style="FONT-SIZE: 10pt">谭建龙</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021121-suffix_tree.ppt" target="_blank"><span style="COLOR: #22148d">suffix_tree</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021113-******-Text Categorization( <a href="http://www.searchforum.org.cn/seminar/lectures/20021123--19990220-Text%20Classification%20From%20Labeled%20and%20Unlabeled%20Documents%20Using%20EM.pdf" target="_blank"><span style="COLOR: #22148d">1</span></a> , <a href="http://www.searchforum.org.cn/seminar/lectures/20021123-20010500-Using%20Unlabeled%20Data%20to%20Improve%20Text%20Classification.pdf" target="_blank"><span style="COLOR: #22148d">2</span></a> )</span></li>
<li><span style="FONT-SIZE: 10pt">20021000-</span> <span style="FONT-SIZE: 10pt">姜吉发</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20021000-Information%20extraction%20from%20text-part%20architecture.ppt" target="_blank"><span style="COLOR: #22148d">Information extraction from text-part architecture</span></a></li>
<li><span style="FONT-SIZE: 10pt">20020821-M.Steedman- <a href="http://www.searchforum.org.cn/seminar/lectures/20020821-Semi-Supervised%20Training%20for%20Statistical%20Parsing.pdf" target="_blank"><span style="COLOR: #22148d">Semi-Supervised Training for Statistical Parsing</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20020723-</span> <span style="FONT-SIZE: 10pt">曹存根</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20020723-%C3%83%C2%96%C3%82%C2%AA%C3%83%C2%8A%C3%82%C2%B6%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5(KT)%C3%82%C2%A1%C3%82%C2%A2%C3%83%C2%96%C3%83%C2%87%C3%83%C2%84%C3%83%C2%9C%C3%82%C2%B1%C3%82%C2%BE%C3%83%C2%96%C3%83%C2%8A.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>Knowledge Technology (KT)</span></span> <span><span>and the Nature of Intelligence</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20020600-</span> <span style="FONT-SIZE: 10pt">朱茂盛</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20020600-XML%20%C3%82%C2%BD%C3%83%C2%A9%C3%83%C2%89%C3%83%C2%9C.ppt" target="_blank"><span style="COLOR: #22148d">XML</span> <span>Introduction</span></a></li>
<li><span style="FONT-SIZE: 10pt">20020400-WenJiRong- <a href="http://www.searchforum.org.cn/seminar/lectures/20020400-Mining%20for%20enhanced%20web%20search.ppt" target="_blank"><span style="COLOR: #22148d">Mining for enhanced web search.ppt</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20020325-</span> <span style="FONT-SIZE: 10pt">曹存根</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20020325-Resource%20Description%20Framework%20and%20Ontology.ppt" target="_blank"><span style="COLOR: #22148d">Resource Description Framework and Ontology</span></a></li>
<li><span style="FONT-SIZE: 10pt">20021200-</span> <span style="FONT-SIZE: 10pt">白硕</span>- <a href="file:." target="_blank"><span><span><span style="COLOR: #22148d">Machine Learning and Text Mining Problems in Information Security and Securities Markets</span></span></span></a></li>
</ul>
</td>
</tr>
<tr style="HEIGHT: 7.5pt">
<td style="BACKGROUND: green 0% 50%; HEIGHT: 7.5pt">
<p><a href="http://www.searchforum.org.cn/seminar/lectures/index.html#index"><span><strong><span style="FONT-SIZE: 10pt; COLOR: yellow">2001</span></strong></span></a></p>
</td>
</tr>
<tr>
<td valign="top">
<ul>
<li><span style="FONT-SIZE: 10pt">20011219-LiHang- <a href="http://www.searchforum.org.cn/seminar/lectures/20011219-Some%20Notes%20on%20English%20Technical%20Writing.ppt" target="_blank"><span style="COLOR: #22148d">Some Notes on English Technical Writing</span></a></span></li>
<li><span style="FONT-SIZE: 10pt">20011210-</span> <span style="FONT-SIZE: 10pt">王斌</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20011210-%C3%83%C2%90%C3%83%C2%85%C3%83%C2%8F%C3%82%C2%A2%C3%82%C2%B9%C3%83%C2%BD%C3%83%C2%82%C3%83%C2%8B(Information%20Filtering,IF)%C3%83%C2%97%C3%83%C2%9B%C3%83%C2%8A%C3%83%C2%B6.ppt" target="_blank"><span style="COLOR: #22148d"><span><span>Information Filtering (IF):</span></span> <span><span>A Survey</span></span></span></a></li>
<li><span style="FONT-SIZE: 10pt">20010200-</span> <span style="FONT-SIZE: 10pt">王斌</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20010200-TREC%C3%83%C2%96%C3%82%C2%AE%C3%83%C2%8E%C3%83%C2%84%C3%82%C2%B1%C3%82%C2%BE%C3%82%C2%B9%C3%83%C2%BD%C3%83%C2%82%C3%83%C2%8B%C3%82%C2%BC%C3%82%C2%BC%C3%83%C2%8A%C3%83%C2%B5.ppt" target="_blank"><span style="COLOR: #22148d">TREC</span> <span>Text Filtering Techniques</span></a></li>
<li><span style="FONT-SIZE: 10pt">20010000-</span> <span style="FONT-SIZE: 10pt">张凯</span>- <a href="http://www.searchforum.org.cn/seminar/lectures/20010000-Web%20Usage%20Mining.ppt" target="_blank"><span style="COLOR: #22148d">Web Usage Mining</span></a></li>
</ul>
</td>
</tr>
</tbody>
</table></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2011/04/9313.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What&#8217;s Google doing in search?</title>
		<link>http://blog.zye.me/2010/01/55471.html</link>
		<comments>http://blog.zye.me/2010/01/55471.html#comments</comments>
		<pubDate>Mon, 25 Jan 2010 18:09:20 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[information Retrieval]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/2010/01/55471.html</guid>
		<description><![CDATA[1. Interesting highlighting in search results or snippets. 2. Synonym expansion &#8212; query expansion 3. Social Search in Google Labs 4. Google Squared extracts interesting facts from web pages and presents them to you in a meaningful way 5. Real-time search]]></description>
			<content:encoded><![CDATA[<p>1.<a href="http://feedproxy.google.com/~r/blogspot/MKuf/~3/EA7Lp_AtT1E/understanding-web-to-make-search-more.html"> </a><a href="http://googleblog.blogspot.com/2010/01/this-week-in-search-12210.html"><strong>Interesting highlighting in search results or snippets</strong></a><strong>. </strong></p>
<p><strong>2. </strong><a href="http://googleblog.blogspot.com/2010/01/helping-computers-understand-language.html"><strong>synonym expansion</strong></a><strong> &#8212; query expansion</strong></p>
<p><strong>3. </strong><a href="Social Search"><strong>Social Search</strong></a><strong> in Google labs</strong></p>
<p><strong>4. Google Squared </strong></p>
<p><strong>It extracts interesting facts from web pages and presents them to you in a meaningful way.</strong></p>
<p><strong>5. </strong><strong><a href="http://googleblog.blogspot.com/2010/02/this-week-in-search-22110.html?utm_source=feedburner&amp;utm_medium=feed&amp;utm_campaign=Feed:+blogspot/MKuf+(Official+Google+Blog)">real-time search</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2010/01/55471.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>content based image retrieval (CBIR) toolkits and package</title>
		<link>http://blog.zye.me/2009/05/52060.html</link>
		<comments>http://blog.zye.me/2009/05/52060.html#comments</comments>
		<pubDate>Sat, 23 May 2009 01:13:07 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[content based image retrieval]]></category>
		<category><![CDATA[image information retrieval]]></category>
		<category><![CDATA[Toolkits]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/?p=52060</guid>
		<description><![CDATA[This post presents several tools for CBIR. Unfortunately, all of them lack documentation. I chose LIRE because I am familiar with Lucene and Java. Do you know of any other good options? If so, please leave a comment. Thanks. The LIRE (Lucene Image REtrieval) library (Java based) It is a CBIR system based on Lucene (Java-based) <a href='http://blog.zye.me/2009/05/52060.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>This post presents several tools for CBIR. Unfortunately, all of them lack documentation. I chose LIRE because I am familiar with Lucene and Java.</p>
<p>Do you know of any other good options? If so, please leave a comment. Thanks.</p>
<h1>The <a href="http://www.semanticmetadata.net/lire/" target="_blank">LIRE </a>(Lucene Image REtrieval) library (Java based)</h1>
<p>It is a CBIR system based on Lucene <strong>(Java-based)</strong> and <a href="http://sourceforge.net/projects/caliph-emir/">Caliph and Emir</a>. Caliph and Emir are <strong>Java</strong> &amp; MPEG-7 based tools for annotating and retrieving digital photos and images; they support semantic annotation as well as content-based, metadata-based, and semantic image retrieval. Lucene is a full-text retrieval package. Given these two cores, it is easy to imagine what LIRE can do.</p>
<p>It is also easy to incorporate context-based image retrieval into LIRE, since Lucene handles that naturally. I think Lucene itself is not well suited to academic research, but at least you can use Caliph and Emir to extract low-level image features, which can then be used in other IR systems.</p>
<p>LIRE can be used out of the box.</p>
<h1>The GNU Image-Finding Tool (<a href="http://www.gnu.org/software/gift/" target="_blank">GIFT</a>, C based)</h1>
<p>The GIFT (the GNU Image-Finding Tool) is a Content Based Image Retrieval System (CBIRS: <a href="http://en.wikipedia.org/wiki/CBIR" target="_blank">http://en.wikipedia.org/wiki/CBIR</a>). It enables you to do Query By Example (QBE: <a href="http://en.wikipedia.org/wiki/QBE" target="_blank">http://en.wikipedia.org/wiki/QBE</a>) on images, giving you the opportunity to improve query results by relevance feedback. For processing your queries the program relies entirely on the content of the images, freeing you from the need to annotate all images before querying the collection.</p>
<p>The GIFT comes with a tool which lets you index whole directory trees containing images in one go. You can then use the GIFT server and its <a href="http://www.gnu.org/software/gift/#clients">clients</a> to browse your own image collections.</p>
<h1><a href="http://www-i6.informatik.rwth-aachen.de/~deselaers/fire/" target="_self">FIRE</a> (C++ based)</h1>
<p>FIRE is an image retrieval system developed as part of the diploma thesis of Thomas Deselaers. Later, large parts were rewritten to make it more easily maintainable.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2009/05/52060.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Global Ranking</title>
		<link>http://blog.zye.me/2009/05/51966.html</link>
		<comments>http://blog.zye.me/2009/05/51966.html#comments</comments>
		<pubDate>Sun, 17 May 2009 23:20:58 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>
		<category><![CDATA[learning to rank]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/2009/05/51966.html</guid>
		<description><![CDATA[Backup Links &#160; &#160; Sent to you by Jeffye via Google Reader: &#160; &#160; Global Ranking via Research on Search by Dell Zhang on 5/17/09 Global Ranking looks a promising direction in the research area of Learning to Rank for Information Retrieval. [1] Global Ranking Using Continuous Conditional Random Fields[2] Global Ranking by Exploiting User <a href='http://blog.zye.me/2009/05/51966.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p>Backup Links</p>
<div style="font-family:sans-serif;overflow:auto;width:100%;margin: 0px 10px">
<h2>
<div class=""><a href="http://feedproxy.google.com/~r/researchonsearch/~3/bjg53MXPtNo/global-ranking.html">Global Ranking</a></div>
</h2>
<div style="margin-bottom: 0.5em">via <a href="http://researchonsearch.blogspot.com/" class="f">Research on Search</a> by Dell Zhang on 5/17/09</div>
<p>
Global Ranking looks like a promising direction in the research area of <a href="http://learningtorank.spaces.live.com/">Learning to Rank for Information Retrieval</a>.</p>
<p>[1] <a href="http://research.microsoft.com/en-us/people/taoqin/qin-nips08.pdf">Global Ranking Using Continuous Conditional Random Fields</a><br />[2] <a href="http://apex.sjtu.edu.cn/apex_wiki/2009_apexlab_papers">Global Ranking by Exploiting User Clicks</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2009/05/51966.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Major Journals and Conferences in Information Retrieval</title>
		<link>http://blog.zye.me/2009/04/51385.html</link>
		<comments>http://blog.zye.me/2009/04/51385.html#comments</comments>
		<pubDate>Mon, 27 Apr 2009 16:14:07 +0000</pubDate>
		<dc:creator>yezheng</dc:creator>
				<category><![CDATA[information Retrieval]]></category>

		<guid isPermaLink="false">http://blog.so8848.com/?p=51385</guid>
		<description><![CDATA[Journals TOIS &#8211; ACM Transactions on Information Systems Publication: 467  Citation: 13992 IPM &#8211; Information Processing and Management Publication: 2142  Citation: 9622 JASIS &#8211; Journal of the American Society for Information Science and Technology Publication: 2559  Citation: 10127 SIGIR Forum Publication: 900  Citation: 3378 IR &#8211; Information Retrieval Publication: 238  Citation: 1400 Conferences SIGIR &#8211; Research and Development in Information <a href='http://blog.zye.me/2009/04/51385.html'>[...]</a>]]></description>
			<content:encoded><![CDATA[<p><span style="font-size: 14pt; font-weight: bold;">Journals</span></p>
<p><a href="http://libra.msra.cn/JournalDetail.aspx?id=51"><strong>TOIS</strong> &#8211; ACM Transactions on Information Systems</a> Publication: 467  Citation: 13992<br />
<a href="http://libra.msra.cn/JournalDetail.aspx?id=45"><strong>IPM</strong> &#8211; Information Processing and Management</a> Publication: 2142  Citation: 9622</p>
<p><a href="http://libra.msra.cn/JournalDetail.aspx?id=141"><strong>JASIS</strong> &#8211; Journal of the American Society for Information Science and Technology</a> Publication: 2559  Citation: 10127<br />
<a href="http://libra.msra.cn/JournalDetail.aspx?id=250">SIGIR Forum</a> Publication: 900  Citation: 3378<br />
<a href="http://libra.msra.cn/JournalDetail.aspx?id=47"><strong>IR</strong> &#8211; Information Retrieval</a> Publication: 238  Citation: 1400</p>
<p><span style="font-size: 14pt; font-weight: bold;">Conferences</span></p>
<p><a href="http://libra.msra.cn/ConferenceDetail.aspx?id=368"><strong>SIGIR</strong> &#8211; Research and Development in Information Retrieval</a> Publication: 2304  Citation: 28040<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=422"><strong>TREC</strong> &#8211; Text REtrieval Conference</a> Publication: 743  Citation: 5040</p>
<p><a href="http://libra.msra.cn/ConferenceDetail.aspx?id=572"><strong>CIKM</strong> &#8211; International Conference on Information and Knowledge Management</a> Publication: 1735  Citation: 8611<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=635"><strong>DL</strong> &#8211; Digital Libraries</a> Publication: 895  Citation: 4807<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=658"><strong>ECDL</strong> &#8211; European Conference on Digital Libraries</a> Publication: 687  Citation: 1747<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=384"><strong>SPIRE</strong> &#8211; String Processing and Information Retrieval</a> Publication: 353  Citation: 755<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=909"><strong>ECIR</strong> &#8211; European Colloquium on IR Research</a> Publication: 439  Citation: 746</p>
<p><a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1357">Multimedia Information Retrieval</a> Publication: 140  Citation: 42<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=2107"><strong>INEX</strong> &#8211; INitiative for the Evaluation of XML Retrieval</a> Publication: 184  Citation: 205<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1773"><strong>AIRS</strong> &#8211; Asia Information Retrieval Symposium</a> Publication: 246  Citation: 47<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=578"><strong>CLEF</strong> &#8211; Cross-Language Evaluation Forum</a> Publication: 622  Citation: 489<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1786"><strong>AMR</strong> &#8211; Adaptive Multimedia Retrieval</a> Publication: 76  Citation: 18<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=115"><strong>JCDL</strong> &#8211; ACM/IEEE Joint Conference on Digital Libraries</a> Publication: 373  Citation: 237<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1069"><strong>ICADL</strong> &#8211; International Conference on Asian Digital Libraries</a> Publication: 555  Citation: 135<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1673"><strong>IRAL</strong> &#8211; International Workshop on Information Retrieval with Asian Languages</a> Publication: 64  Citation: 105<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=2097"><strong>IIIX</strong> &#8211; Information Interaction in Context</a> Publication: 52  Citation: 7<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=737"><strong>HIM</strong> &#8211; Hypertext, Information Retrieval, Multimedia</a> Publication: 123  Citation: 127<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1880"><strong>CORIA</strong> &#8211; Conférence en Recherche d&#8217;Information et Applications</a> Publication: 46  Citation: 2<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=853">Dublin Core Conference</a> Publication: 46  Citation: 65<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=2129"><strong>ISIWI</strong> &#8211; Internationales Symposium für Informationswissenschaft</a> Publication: 136  Citation: 22<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=71">Information Retrieval</a> Publication: 51  Citation: 98<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1759"><strong>ADCS</strong> &#8211; Australasian Document Computing Symposium</a> Publication: 41  Citation: 33<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=757">New Developments in Digital Libraries</a> Publication: 36  Citation: 62<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1877"><strong>COLIS</strong> &#8211; Conference on Conceptions of Library and Information Sciences</a> Publication: 56  Citation: 37<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=104"><strong>IuK</strong> &#8211; Information and Communication of the Learned Societies in Germany</a> Publication: 47  Citation: 0<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=2193"><strong>MIRA</strong> &#8211; Multimedia Information Retrieval Applications</a> Publication: 28  Citation: 75<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1404">Networked Information Retrieval</a> Publication: 8  Citation: 38<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=944"><strong>ESSIR</strong> &#8211; European Summer School in Information Retrieval</a> Publication: 12  Citation: 18<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=684">Essen Symposium</a> Publication: 247  Citation: 13<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=929">ELSNET Summer School</a> Publication: 7  Citation: 0<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1576">Current Trends in SNePS &#8211; Semantic Network Processing System</a> Publication: 12  Citation: 4<br />
<a href="http://libra.msra.cn/ConferenceDetail.aspx?id=1646"><strong>CAW</strong> &#8211; Computer Architecture Workshop</a> Publication: 1  Citation: 0</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zye.me/2009/04/51385.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

