Lucene: get the matched terms in a query
What's the best way to find out which terms in a query matched a given document that was returned as a hit in Lucene?
I've tried a hacky method that uses the highlighter package in Lucene contrib, as well as a method that searches for every word of the query in each of the top documents ("docId: xy AND description: each_word_in_query").
Neither gives satisfactory results. The highlighter fails to report some of the words that matched for documents other than the first hit, and I'm not sure the second approach is the best alternative.
IndexSearcher's explain method is a good way to see which parts of the query matched a document and how each part contributes to the overall score.
Example taken from the book Lucene In Action 2nd Edition:
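import java.io.File;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Explainer {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Explainer <index dir> <query>");
            System.exit(1);
        }
        String indexDir = args[0];
        String queryExpression = args[1];
        Directory directory = FSDirectory.open(new File(indexDir));
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT,
                "contents", new SimpleAnalyzer());
        Query query = parser.parse(queryExpression);
        System.out.println("Query: " + queryExpression);
        IndexSearcher searcher = new IndexSearcher(directory);
        TopDocs topDocs = searcher.search(query, 10);
        // Iterate over the returned hits only: totalHits can exceed the 10 requested.
        for (ScoreDoc match : topDocs.scoreDocs) {
            // Ask the searcher how this document's score was computed.
            Explanation explanation = searcher.explain(query, match.doc);
            System.out.println("----------");
            Document doc = searcher.doc(match.doc);
            System.out.println(doc.get("title"));
            System.out.println(explanation.toString());
        }
        searcher.close();
        directory.close();
    }
}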
This prints the score explanation for each document that matches the query.
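If you still want to try the per-word approach from the question, the probe queries could be generated along the lines below. This is only a rough sketch in plain Java with no Lucene calls: the class name TermProbes and its build method are hypothetical, the field names docId and description are taken from the question, and splitting on whitespace is an assumption that stands in for whatever analyzer the index really uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class TermProbes {

    // Builds one query string per distinct word of the user query, each
    // restricted to a single document: "<idField>:<id> AND <textField>:<word>".
    public static List<String> build(String idField, String id,
                                     String textField, String userQuery) {
        // Naive tokenization (assumption): lowercase and split on whitespace.
        // LinkedHashSet drops duplicate words while keeping their order.
        Set<String> words = new LinkedHashSet<String>();
        for (String word : userQuery.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        List<String> probes = new ArrayList<String>();
        for (String word : words) {
            probes.add(idField + ":" + id + " AND " + textField + ":" + word);
        }
        return probes;
    }

    public static void main(String[] args) {
        // Print the probe queries for a sample document id and query.
        for (String probe : build("docId", "42", "description", "quick brown Quick fox")) {
            System.out.println(probe);
        }
    }
}
```

Each resulting string can be passed to parser.parse and then to searcher.explain for the document in question; a word whose explanation reports a match is one of the matching terms. Words containing query syntax characters would need escaping first, for example with QueryParser.escape.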