Information Retrieval Blog » Ranking (排序)
http://blog.zye.me
ANTI-GFW
Feed last updated: Sun, 29 Aug 2010 03:59:54 +0000

IBM Haifa Team turns the Lucene ranking system into a state-of-the-art one: TREC 2007 Million Queries Track – IBM Haifa Team
http://blog.zye.me/2009/02/49240.html
Wed, 18 Feb 2009 16:54:05 +0000, jeffye

TREC 2007 Million Queries Track – IBM Haifa Team

The Million Queries Track ran for the first time in 2007.

Quoting from the track home page:

“The goal of this track is to run a retrieval task similar to standard ad-hoc retrieval, but to evaluate large numbers of queries incompletely, rather than a small number more completely. Participants will run 10,000 queries and [...]

Discussion on modifying/adding Lucene ranking algorithms
http://blog.zye.me/2008/07/36256.html
Tue, 01 Jul 2008 14:53:55 +0000, jeffye

http://www.mail-archive.com/general@lucene.apache.org/msg00431.html

http://www.mail-archive.com/general@lucene.apache.org/msg00432.html

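I started learning Lucene right when I began graduate school, so I have always had a soft spot for it. I now want to do some research on ranking, and I think it would be best to first spend some time implementing a few basic ranking algorithms, such as TF-IDF and BM25. I would like to do this on top of the Lucene framework, but after studying it for a while I have found it difficult. Today I saw that the posts linked above ran into the same problem. If anyone has worked on this, I would be grateful for some pointers!

My original idea was actually to build a Lab-Lucene for running all kinds of IR experiments, but I got stuck on this problem right at the start, which is rather frustrating. When I make progress, I will be sure to share it.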

————————————–

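PS: 2010-02-28

NEWS: Development of Lab-Lucene is now mostly complete. It already includes most of the basic weighting models (LM, BM25, DFR, TFIDF) as well as various query expansion models. On a series of TREC ad hoc datasets, its performance is at least comparable to published experimental results.

I plan to open-source it once the Hadoop-based distributed indexing and retrieval are finished, together with documentation, of course. Discussion is welcome!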

Sorting methods in Java (reposted) – memo series
http://blog.zye.me/2007/02/6006.html
Thu, 22 Feb 2007 07:05:00 +0000, jeffye

Sorting simple types

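Simple types are just the primitive data types byte, char, short, int, long, float, and double. These types cannot be placed in collections directly; they can only be stored in arrays. The java.util.Arrays class provides sort methods for these types (along with many other useful methods). The following sorts a simple int array: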

int[] arr = {2, 3, 1, 10, 7, 4};
System.out.print("before sort: ");
for (int i = 0; i < arr.length; i++) [...]
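The excerpt breaks off inside the loop. As a minimal, self-contained sketch of the same idea (not the original post's code), the following sorts the same int array with java.util.Arrays.sort. The class name SortIntArray and the helper sortedCopy are hypothetical names chosen for illustration, and sorting a copy rather than the array itself is a design choice of this sketch.

```java
import java.util.Arrays;

public class SortIntArray {
    // Hypothetical helper: returns an ascending sorted copy,
    // leaving the caller's array unchanged.
    static int[] sortedCopy(int[] values) {
        int[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy); // sorts the copy in place, ascending
        return copy;
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 1, 10, 7, 4};
        System.out.println("before sort: " + Arrays.toString(arr));
        System.out.println("after sort: " + Arrays.toString(sortedCopy(arr)));
    }
}
```

Calling Arrays.sort(arr) directly would sort the original array in place, which is usually what a short snippet like the one above wants; the copy is only there so both states can be printed.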

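http://www.5yiso.cn/articles/java-hashmap-%e5%a6%82%e4%bd%95%e7%ae%80%e5%8d%95%e5%9c%b0%e6%8e%92%e5%ba%8f%e5%90%8e%e8%be%93%e5%87%ba-%e5%a4%87%e5%bf%98.html

At work, I expect everyone uses a HashMap to record some mappings, and afterwards may need to sort the entries before outputting them. Below is a simple example, noted here as a memo.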

HashMap<Term, Float> termMap = new HashMap<Term, Float>();
List entries = new ArrayList(termMap.entrySet());
Comparator cmp = new Comparator() {
    public int compare(Object o1, Object o2) {
        Map.Entry e1 = (Map.Entry) o1;
        Map.Entry e2 = (Map.Entry) o2;
        Comparable v1 = (Comparable) e1.getValue();
        Comparable v2 = (Comparable) e2.getValue();
        return [...]
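The comparator in the excerpt is cut off at its return statement, so the intended sort order is unknown. The following is a hedged, generics-based sketch of the same approach: copy the entry set into a list and sort it by value. It assumes descending order, uses String keys in place of the Term keys so that it runs without Lucene, and the names SortMapByValue and sortByValueDescending are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortMapByValue {
    // Hypothetical helper: returns the map's entries ordered by value, largest first.
    // Descending order is an assumption; the excerpt does not show the direction.
    static List<Map.Entry<String, Float>> sortByValueDescending(Map<String, Float> map) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // String keys stand in for the Term keys used in the excerpt.
        Map<String, Float> termMap = new HashMap<>();
        termMap.put("lucene", 2.5f);
        termMap.put("ranking", 4.0f);
        termMap.put("bm25", 1.0f);

        for (Map.Entry<String, Float> entry : sortByValueDescending(termMap)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Dropping the call to reversed() gives ascending order instead. Typed entries avoid the casts to Map.Entry and Comparable that the raw-type version needs.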