Tuesday, 15 March 2011

Lucene: Payloads and Similarity Function - Always the same payload value




Overview

I want to implement a Lucene indexer/searcher that uses the new payload feature, which allows attaching meta information to terms. In my specific case, I add weights (which can be understood as probabilities between 0 and 100) to conceptual tags, in order to use them to override the standard Lucene TF-IDF weighting. I am puzzled by the resulting behaviour and believe there is something wrong with the Similarity class I overrode, but I cannot figure out what.
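As a minimal sketch of the convention involved (the `term|weight` syntax is what the question's DelimitedPayloadTokenFilter is configured to parse with the '|' delimiter; the helper class and method names here are made up for illustration):

```java
// Hypothetical sketch of the "term|weight" convention parsed by
// DelimitedPayloadTokenFilter: the text before '|' becomes the indexed
// term, the text after it becomes a per-term float payload.
public class DelimitedPayloadSketch {

    // Split a "term|weight" token into its term and payload weight.
    static Object[] parse(String token) {
        int sep = token.indexOf('|');
        String term = token.substring(0, sep);
        float weight = Float.parseFloat(token.substring(sep + 1));
        return new Object[] { term, weight };
    }

    public static void main(String[] args) {
        // Field values like those built for the three images below.
        for (String token : new String[] { "red|100.0", "red|50.0", "red|1.0" }) {
            Object[] parsed = parse(token);
            System.out.println(parsed[0] + " -> payload " + parsed[1]);
        }
    }
}
```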

Example

When I run a search query (e.g. "concept:red"), I find that each payload equals the first number passed through MyPayloadSimilarity (in the code example, 1.0) and not 1.0, 50.0 and 100.0. As a result, all documents get the same payload and the same score. However, the results should feature image #1, with a payload of 100.0, followed by image #2, followed by image #3, each with a different score. I can't get my head around it.
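The expected ranking follows directly from the payload function: with AveragePayloadFunction and a single matching "red" occurrence per document, the average payload is just that document's one payload value. A back-of-the-envelope sketch (the class and method names are hypothetical; the weights are the ones loaded for "red" in the question's loadData()):

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ExpectedRankingSketch {

    // Rank documents by descending payload. With AveragePayloadFunction and
    // one matching occurrence per document, the average payload equals the
    // document's single payload value, so sorting by payload gives the
    // expected result order.
    static List<String> rank(Map<String, Float> payloads) {
        return payloads.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Payloads for the "red" concept, per the question's test data.
        Map<String, Float> red = new LinkedHashMap<>();
        red.put("1.jpg", 100.0f);
        red.put("2.jpg", 50.0f);
        red.put("3.jpg", 1.0f);

        System.out.println(rank(red)); // prints [1.jpg, 2.jpg, 3.jpg]
    }
}
```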

Here are the results of a run:

query: concept:red
===> docID: 0 payload: 1.0
===> docID: 1 payload: 1.0
===> docID: 2 payload: 1.0
number of results: 3
-> docID: 3.jpg score: 0.2518424
-> docID: 2.jpg score: 0.2518424
-> docID: 1.jpg score: 0.2518424

What is wrong? Did I misunderstand payloads?

Code

Enclosed, I share the code as a self-contained illustration to make it as easy as possible to run, should you consider that option.

public class PayloadShowcase {

    public static void main(String s[]) {
        PayloadShowcase p = new PayloadShowcase();
        p.run();
    }

    public void run() {
        // Step 1: indexing
        MyPayloadIndexer indexer = new MyPayloadIndexer();
        indexer.index();
        // Step 2: searching
        MyPayloadSearcher searcher = new MyPayloadSearcher();
        searcher.search("red");
    }

    public class MyPayloadAnalyzer extends Analyzer {
        private PayloadEncoder encoder;

        MyPayloadAnalyzer(PayloadEncoder encoder) {
            this.encoder = encoder;
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new WhitespaceTokenizer(reader);
            TokenStream filter = new LowerCaseFilter(source);
            filter = new DelimitedPayloadTokenFilter(filter, '|', encoder);
            return new TokenStreamComponents(source, filter);
        }
    }

    public class MyPayloadIndexer {

        public MyPayloadIndexer() {}

        public void index() {
            try {
                Directory dir = FSDirectory.open(new File("D:/data/indices/sandbox"));
                Analyzer analyzer = new MyPayloadAnalyzer(new FloatEncoder());
                IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
                iwConfig.setSimilarity(new MyPayloadSimilarity());
                iwConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

                // load mappings and classifiers
                HashMap<String, String> mappings = this.loadDataMappings();
                HashMap<String, HashMap> cMaps = this.loadData();

                IndexWriter writer = new IndexWriter(dir, iwConfig);
                indexDocuments(writer, mappings, cMaps);
                writer.close();
            } catch (IOException e) {
                System.out.println("Exception while indexing: " + e.getMessage());
            }
        }

        private void indexDocuments(IndexWriter writer, HashMap<String, String> fileMappings, HashMap<String, HashMap> concepts) throws IOException {
            Set fileSet = fileMappings.keySet();
            Iterator<String> iterator = fileSet.iterator();

            while (iterator.hasNext()) {
                // unique file information
                String fileID = iterator.next();
                String filePath = fileMappings.get(fileID);

                // create a new, empty document
                Document doc = new Document();

                // path of the indexed file
                Field pathField = new StringField("path", filePath, Field.Store.YES);
                doc.add(pathField);

                // look up the concept probabilities for this fileID
                Iterator<String> conceptIterator = concepts.keySet().iterator();
                while (conceptIterator.hasNext()) {
                    String conceptName = conceptIterator.next();
                    HashMap conceptMap = concepts.get(conceptName);
                    doc.add(new TextField("concept", ("" + conceptName + "|").trim() + (conceptMap.get(fileID) + "").trim(), Field.Store.YES));
                }
                writer.addDocument(doc);
            }
        }

        public HashMap<String, String> loadDataMappings() {
            HashMap<String, String> h = new HashMap<>();
            h.put("1", "1.jpg");
            h.put("2", "2.jpg");
            h.put("3", "3.jpg");
            return h;
        }

        public HashMap<String, HashMap> loadData() {
            HashMap<String, HashMap> h = new HashMap<>();

            HashMap<String, String> green = new HashMap<>();
            green.put("1", "50.0");
            green.put("2", "1.0");
            green.put("3", "100.0");

            HashMap<String, String> red = new HashMap<>();
            red.put("1", "100.0");
            red.put("2", "50.0");
            red.put("3", "1.0");

            HashMap<String, String> blue = new HashMap<>();
            blue.put("1", "1.0");
            blue.put("2", "50.0");
            blue.put("3", "100.0");

            h.put("green", green);
            h.put("red", red);
            h.put("blue", blue);
            return h;
        }
    }

    class MyPayloadSimilarity extends DefaultSimilarity {

        @Override
        public float scorePayload(int docID, int start, int end, BytesRef payload) {
            float pload = 1.0f;
            if (payload != null) {
                pload = PayloadHelper.decodeFloat(payload.bytes);
            }
            System.out.println("===> docID: " + docID + " payload: " + pload);
            return pload;
        }
    }

    public class MyPayloadSearcher {

        public MyPayloadSearcher() {}

        public void search(String queryString) {
            try {
                IndexReader reader = DirectoryReader.open(FSDirectory.open(new File("D:/data/indices/sandbox")));
                IndexSearcher searcher = new IndexSearcher(reader);
                searcher.setSimilarity(new MyPayloadSimilarity());
                PayloadTermQuery query = new PayloadTermQuery(new Term("concept", queryString), new AveragePayloadFunction());
                System.out.println("query: " + query.toString());
                TopDocs topDocs = searcher.search(query, 999);
                ScoreDoc[] hits = topDocs.scoreDocs;
                System.out.println("number of results: " + hits.length);

                // output
                for (int i = 0; i < hits.length; i++) {
                    Document doc = searcher.doc(hits[i].doc);
                    System.out.println("-> docID: " + doc.get("path") + " score: " + hits[i].score);
                }
                reader.close();
            } catch (Exception e) {
                System.out.println("Exception while searching: " + e.getMessage());
            }
        }
    }
}

At MyPayloadSimilarity, the PayloadHelper.decodeFloat call is incorrect. In this case, it's necessary to also pass the payload.offset parameter, like this:

pload = PayloadHelper.decodeFloat(payload.bytes, payload.offset);
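To see why the offset matters: a BytesRef may point into a shared, reused buffer, so the payload's four bytes do not necessarily start at index 0 of the array. The sketch below is an assumption-laden illustration, re-implementing the big-endian float decoding that PayloadHelper performs so it runs without Lucene on the classpath (the class and helper names are made up):

```java
import java.nio.ByteBuffer;

public class PayloadOffsetDemo {

    // Mimics Lucene's PayloadHelper.decodeFloat(bytes, offset): read a
    // big-endian 4-byte IEEE float starting at the given offset.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset]     & 0xFF) << 24)
                 | ((bytes[offset + 1] & 0xFF) << 16)
                 | ((bytes[offset + 2] & 0xFF) << 8)
                 |  (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Simulate a BytesRef whose payload (100.0f) starts at offset 3,
        // preceded by unrelated bytes in a shared buffer.
        byte[] buffer = new byte[7];
        ByteBuffer.wrap(buffer, 3, 4).putFloat(100.0f);
        int offset = 3;

        // Ignoring the offset decodes the wrong bytes entirely.
        System.out.println(decodeFloat(buffer, 0));
        // Honouring the offset recovers the stored weight.
        System.out.println(decodeFloat(buffer, offset)); // prints 100.0
    }
}
```

This is exactly why decodeFloat(payload.bytes) alone appears to return the same stale value for every document.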

I hope this helps.

java lucene
