The Million Queries Track ran for the first time in 2007.
Quoting from the track home page:
“The goal of this track is to run a retrieval task similar to standard ad-hoc retrieval,
but to evaluate large numbers of queries incompletely, rather than a small number more completely. Participants will run 10,000 queries and [...]
http://www.mail-archive.com/general@lucene.apache.org/msg00432.html
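I started learning Lucene right at the beginning of my graduate studies, so I have always had a soft spot for it. I now want to do some research on ranking, and it seems best to first spend some time implementing a few basic ranking algorithms such as tfidf and bm25. I would like to do this on top of the Lucene framework, but after looking into it for a while I have found it difficult. Today I saw that the post linked above runs into the same problem. If anyone has worked on this, I would appreciate some pointers!
My original goal was to build a Lab-Lucene for running all kinds of IR experiments, but I got stuck on this problem right at the start, which is rather frustrating. I will share any progress I make.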
————————————–
PS: 2010-02-28
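NEWS: Lab-Lucene is now essentially complete. It includes most of the basic weighting models (LM, BM25, DFR, TFIDF) as well as various Query Expansion Models, and its performance on a range of TREC ad hoc datasets is at least comparable to published experimental results.
I plan to open-source it, together with documentation, once the Hadoop-based distributed indexing and retrieval are finished. Discussion is welcome!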
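As a point of reference for one of the weighting models named above, here is a minimal standalone sketch of the per-term Okapi BM25 score. It does not use any Lucene API and is not Lab-Lucene's code; the class name `Bm25Sketch`, the method names, and the choice of a non-negative IDF variant are all assumptions made for illustration.

```java
// Hypothetical illustration only: a plain-Java BM25 term score, independent of Lucene.
class Bm25Sketch {
    // Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)),
    // where N is the number of documents and n is the term's document frequency.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score contribution of a single query term for a single document.
    // tf: term frequency in the document; docLen / avgDocLen: length normalisation;
    // k1 and b: the usual BM25 free parameters (k1 = 1.2 and b = 0.75 are common defaults).
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq, double k1, double b) {
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Made-up numbers, just to show the call shape.
        double score = termScore(3, 120, 100, 10_000, 50, 1.2, 0.75);
        System.out.println("BM25 term score: " + score);
    }
}
```

A document's score for a query would be the sum of `termScore` over the query terms; plugging this into Lucene's own scoring pipeline is a separate problem that this sketch does not address.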
The simple types are byte, char, short, int, long, float, double, and so on. These types cannot be placed in collections; only arrays can hold them. java.util.Arrays provides sort methods for these types (along with many other useful methods). Here is how to sort a simple int array:
int[] arr = {2, 3, 1, 10, 7, 4};
System.out.print("before sort: ");
for (int i = 0; i < arr.length; i++)
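For reference, a minimal self-contained sketch of sorting an int array with `java.util.Arrays.sort` might look like the following. The class name `PrimitiveSortDemo` and the helper `sortedCopy` are hypothetical names introduced here; the original post may have sorted the array in place instead.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are not from the original post.
class PrimitiveSortDemo {
    // Returns a sorted copy so the caller's array is left untouched.
    static int[] sortedCopy(int[] values) {
        int[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy); // sorts the copy in ascending order
        return copy;
    }

    public static void main(String[] args) {
        int[] arr = {2, 3, 1, 10, 7, 4};
        System.out.println("before sort: " + Arrays.toString(arr));
        System.out.println("after sort:  " + Arrays.toString(sortedCopy(arr)));
    }
}
```

Calling `Arrays.sort(arr)` directly would sort in place; the copy is used here only so that both the before and after states can be printed.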
http://www.5yiso.cn/articles/java-hashmap-%e5%a6%82%e4%bd%95%e7%ae%80%e5%8d%95%e5%9c%b0%e6%8e%92%e5%ba%8f%e5%90%8e%e8%be%93%e5%87%ba-%e5%a4%87%e5%bf%98.html
HashMap<Term, Float> termMap = new HashMap<Term, Float>();
List entries = new ArrayList(termMap.entrySet());
Comparator cmp = new Comparator() {
    public int compare(Object o1, Object o2) {
        Map.Entry e1 = (Map.Entry) o1;
        Map.Entry e2 = (Map.Entry) o2;
        Comparable v1 = (Comparable) e1.getValue();
        Comparable v2 = (Comparable) e2.getValue();
        return [...]
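The same idea, copying the entry set into a list and sorting it by value, can be sketched with generics on Java 8 or later. This is an illustration under stated assumptions rather than the original post's code: `String` keys stand in for Lucene's `Term` so the example has no external dependency, the class and method names are hypothetical, and descending order is assumed because the original `return` expression is cut off.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; String keys stand in for Lucene's Term.
class TermWeightSortDemo {
    // Returns the map's entries ordered by value, largest first (assumed order).
    static List<Map.Entry<String, Float>> sortByValueDescending(Map<String, Float> weights) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(weights.entrySet());
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Float> termMap = new HashMap<>();
        termMap.put("lucene", 0.5f);
        termMap.put("bm25", 2.0f);
        termMap.put("tfidf", 1.25f);
        for (Map.Entry<String, Float> e : sortByValueDescending(termMap)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Dropping `.reversed()` gives ascending order. The HashMap itself stays unordered; only the list of entries is sorted.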