Tuesday, 15 March 2011

Lucene: Payloads and Similarity Function - Always the same payload value




Overview

I want to implement a Lucene indexer/searcher that uses the new payload feature, which allows attaching meta information to terms. In my specific case, I add weights (which can be understood as probabilities between 0 and 100) to conceptual tags, in order to use them to override the standard Lucene TF-IDF weighting. I am puzzled by the resulting behaviour and believe there is something wrong with the Similarity class I overrode, but I cannot figure out what.
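As a minimal sketch of the convention involved (the `term|weight` syntax is what the question's DelimitedPayloadTokenFilter is configured to parse with the '|' delimiter; the helper class and method names here are made up for illustration):

```java
// Hypothetical sketch of the "term|weight" convention parsed by
// DelimitedPayloadTokenFilter: the text before '|' becomes the indexed
// term, the text after it becomes a per-term float payload.
public class DelimitedPayloadSketch {

    // Split a "term|weight" token into its term and payload weight.
    static Object[] parse(String token) {
        int sep = token.indexOf('|');
        String term = token.substring(0, sep);
        float weight = Float.parseFloat(token.substring(sep + 1));
        return new Object[] { term, weight };
    }

    public static void main(String[] args) {
        // Field values like those built for the three images below.
        for (String token : new String[] { "red|100.0", "red|50.0", "red|1.0" }) {
            Object[] parsed = parse(token);
            System.out.println(parsed[0] + " -> payload " + parsed[1]);
        }
    }
}
```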

Example

When I run a search query (e.g. "concept:red"), I find that each payload equals the first number passed through MyPayloadSimilarity (in the code example, 1.0) and not 1.0, 50.0 and 100.0. As a result, all documents get the same payload and the same score. However, the results should feature image #1, with a payload of 100.0, followed by image #2, followed by image #3, each with a different score. I can't get my head around it.
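The expected ranking follows directly from the payload function: with AveragePayloadFunction and a single matching "red" occurrence per document, the average payload is just that document's one payload value. A back-of-the-envelope sketch (the class and method names are hypothetical; the weights are the ones loaded for "red" in the question's loadData()):

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ExpectedRankingSketch {

    // Rank documents by descending payload. With AveragePayloadFunction and
    // one matching occurrence per document, the average payload equals the
    // document's single payload value, so sorting by payload gives the
    // expected result order.
    static List<String> rank(Map<String, Float> payloads) {
        return payloads.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Payloads for the "red" concept, per the question's test data.
        Map<String, Float> red = new LinkedHashMap<>();
        red.put("1.jpg", 100.0f);
        red.put("2.jpg", 50.0f);
        red.put("3.jpg", 1.0f);

        System.out.println(rank(red)); // prints [1.jpg, 2.jpg, 3.jpg]
    }
}
```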

Here are the results of a run:

query: concept:red
===> docID: 0 payload: 1.0
===> docID: 1 payload: 1.0
===> docID: 2 payload: 1.0
number of results: 3
-> docID: 3.jpg score: 0.2518424
-> docID: 2.jpg score: 0.2518424
-> docID: 1.jpg score: 0.2518424

What is wrong? Did I misunderstand payloads?

Code

Enclosed, I share the code as a self-contained illustration to make it as easy as possible to run, should you consider that option.

public class PayloadShowcase {

    public static void main(String s[]) {
        PayloadShowcase p = new PayloadShowcase();
        p.run();
    }

    public void run() {
        // Step 1: indexing
        MyPayloadIndexer indexer = new MyPayloadIndexer();
        indexer.index();
        // Step 2: searching
        MyPayloadSearcher searcher = new MyPayloadSearcher();
        searcher.search("red");
    }

    public class MyPayloadAnalyzer extends Analyzer {
        private PayloadEncoder encoder;

        MyPayloadAnalyzer(PayloadEncoder encoder) {
            this.encoder = encoder;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new WhitespaceTokenizer(reader);
            TokenStream filter = new LowerCaseFilter(source);
            filter = new DelimitedPayloadTokenFilter(filter, '|', encoder);
            return new TokenStreamComponents(source, filter);
        }
    }

    public class MyPayloadIndexer {

        public MyPayloadIndexer() {}

        public void index() {
            try {
                Directory dir = FSDirectory.open(new File("D:/data/indices/sandbox"));
                Analyzer analyzer = new MyPayloadAnalyzer(new FloatEncoder());
                IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
                iwConfig.setSimilarity(new MyPayloadSimilarity());
                iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

                // load mappings and classifiers
                HashMap<String, String> mappings = this.loadDataMappings();
                HashMap<String, HashMap> cMaps = this.loadData();

                IndexWriter writer = new IndexWriter(dir, iwConfig);
                indexDocuments(writer, mappings, cMaps);
                writer.close();
            } catch (IOException e) {
                System.out.println("Exception while indexing: " + e.getMessage());
            }
        }

        private void indexDocuments(IndexWriter writer, HashMap<String, String> fileMappings, HashMap<String, HashMap> concepts) throws IOException {
            Set fileSet = fileMappings.keySet();
            Iterator<String> iterator = fileSet.iterator();

            while (iterator.hasNext()) {
                // unique file information
                String fileID = iterator.next();
                String filePath = fileMappings.get(fileID);

                // create a new, empty document
                Document doc = new Document();

                // path of the indexed file
                Field pathField = new StringField("path", filePath, Field.Store.YES);
                doc.add(pathField);

                // look up the concept probabilities for this fileID
                Iterator<String> conceptIterator = concepts.keySet().iterator();
                while (conceptIterator.hasNext()) {
                    String conceptName = conceptIterator.next();
                    HashMap conceptMap = concepts.get(conceptName);
                    doc.add(new TextField("concept", ("" + conceptName + "|").trim() + (conceptMap.get(fileID) + "").trim(), Field.Store.YES));
                }
                writer.addDocument(doc);
            }
        }

        public HashMap<String, String> loadDataMappings() {
            HashMap<String, String> h = new HashMap<>();
            h.put("1", "1.jpg");
            h.put("2", "2.jpg");
            h.put("3", "3.jpg");
            return h;
        }

        public HashMap<String, HashMap> loadData() {
            HashMap<String, HashMap> h = new HashMap<>();

            HashMap<String, String> green = new HashMap<>();
            green.put("1", "50.0");
            green.put("2", "1.0");
            green.put("3", "100.0");

            HashMap<String, String> red = new HashMap<>();
            red.put("1", "100.0");
            red.put("2", "50.0");
            red.put("3", "1.0");

            HashMap<String, String> blue = new HashMap<>();
            blue.put("1", "1.0");
            blue.put("2", "50.0");
            blue.put("3", "100.0");

            h.put("green", green);
            h.put("red", red);
            h.put("blue", blue);
            return h;
        }
    }

    class MyPayloadSimilarity extends DefaultSimilarity {

        @Override
        public float scorePayload(int docID, int start, int end, BytesRef payload) {
            float pload = 1.0f;
            if (payload != null) {
                pload = PayloadHelper.decodeFloat(payload.bytes);
            }
            System.out.println("===> docID: " + docID + " payload: " + pload);
            return pload;
        }
    }

    public class MyPayloadSearcher {

        public MyPayloadSearcher() {}

        public void search(String queryString) {
            try {
                IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("D:/data/indices/sandbox")));
                IndexSearcher searcher = new IndexSearcher(reader);
                searcher.setSimilarity(new MyPayloadSimilarity());
                PayloadTermQuery query = new PayloadTermQuery(new Term("concept", queryString), new AveragePayloadFunction());
                System.out.println("query: " + query.toString());
                TopDocs topDocs = searcher.search(query, 999);
                ScoreDoc[] hits = topDocs.scoreDocs;
                System.out.println("number of results: " + hits.length);

                // output
                for (int i = 0; i < hits.length; i++) {
                    Document doc = searcher.doc(hits[i].doc);
                    System.out.println("-> docID: " + doc.get("path") + " score: " + hits[i].score);
                }
                reader.close();
            } catch (Exception e) {
                System.out.println("Exception while searching: " + e.getMessage());
            }
        }
    }
}

At MyPayloadSimilarity, the PayloadHelper.decodeFloat call is incorrect. In this case, it's necessary to also pass the payload.offset parameter, like this:

pload = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
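To see why the offset matters: a BytesRef may point into a shared, reused buffer, so the payload's four bytes do not necessarily start at index 0 of the array. The sketch below is an assumption-laden illustration, re-implementing the big-endian float decoding that PayloadHelper performs so it runs without Lucene on the classpath (the class and helper names are made up):

```java
import java.nio.ByteBuffer;

public class PayloadOffsetDemo {

    // Mimics Lucene's PayloadHelper.decodeFloat(bytes, offset): read a
    // big-endian 4-byte IEEE float starting at the given offset.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset]     & 0xFF) << 24)
                 | ((bytes[offset + 1] & 0xFF) << 16)
                 | ((bytes[offset + 2] & 0xFF) << 8)
                 |  (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Simulate a BytesRef whose payload (100.0f) starts at offset 3,
        // preceded by unrelated bytes in a shared buffer.
        byte[] buffer = new byte[7];
        ByteBuffer.wrap(buffer, 3, 4).putFloat(100.0f);
        int offset = 3;

        // Ignoring the offset decodes the wrong bytes entirely.
        System.out.println(decodeFloat(buffer, 0));
        // Honouring the offset recovers the stored weight.
        System.out.println(decodeFloat(buffer, offset)); // prints 100.0
    }
}
```

This is exactly why decodeFloat(payload.bytes) alone appears to return the same stale value for every document.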

I hope this helps.

java lucene
