lucene - Design optimal Solr Schema
Hello, I have a problem with the design of a schema in Solr. I have a transcript of a telephone conversation in the format shown below, and I parse it into individual fields. This is what I have:
<?xml version="1.0"?>
<add>
  <doc>
    <field name="id">01.cn</field>
    <field name="t">0<br /> 1<br /> 2<br /> 2 <br /> 3 <br /> ....</field>
    <field name="st">0.00<br /> 1.54<br /> 1.54<br /> 1.54 <br /> 1.57 <br /> ....</field>
    <field name="et">1.54<br /> 1.54<br /> 1.57<br /> 1.57 <br /> 1.7 <br /> ....</field>
    <field name="w">_silence_<br /> <s><br /> hello<br /> hallo <br /> _delete_ <br /> ....</field>
    <field name="p">0.000000<br /> 1<br /> 1<br /> 2.06115e-009 <br /> 1 <br /> ....</field>
    <field name="c">0<br /> 0<br /> 0<br /> 0 <br /> 0 <br /> ....</field>
  </doc>
</add>

I display it in an HTML document, which is why I used <br />.
This is the original document:
t=0 st=0.00 et=1.54 w=_silence_ p=0.000000 c=0
t=1 st=1.54 et=1.54 w=<s> p=1 c=0
t=2 st=1.54 et=1.57 w=hello p=1 c=0
t=2 st=1.54 et=1.57 w=hallo p=2.06115e-009 c=0
t=3 st=1.57 et=1.70 w=_delete_ p=1 c=0
t=3 st=1.57 et=1.70 w=no p=2.06115e-009 c=0
t=4 st=1.70 et=2.12 w=how p=1 c=0
t=5 st=2.12 et=2.18 w=are_ p=0.25 c=0
t=5 st=2.12 et=2.18 w=_delete_ p=0.25 c=0
..........................................
..........................................

id = filename
t = segment
st = start time
et = end time
w = word
p = probability
c = channel

I want to search, for example, for a word up to time 1.57: (w:hello) AND (t:[0 TO 1.57]). If I keep all the values of a file in one field each (t, st, et, ...), it doesn't work: the query also finds files where hello occurs later than time 1.57.
Do you have any ideas on how to design this? Thanks a lot for your help.
Have a separate core with one document for each (word, time) combination, and query that core instead.
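A rough sketch of that flattening in Python, in case it helps. It assumes every transcript line has exactly the six key=value pairs shown in the question. The `id` scheme (file id plus line index) and the `file` field name are made up for illustration and are not part of the original schema. `find_files` is only a local stand-in for a Solr query along the lines of `w:hello AND st:[0 TO 1.57]` against the per-word core.

```python
import re

# One "key=value" pair; values in the sample transcript contain no whitespace.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Turn 't=2 st=1.54 et=1.57 w=hello p=1 c=0' into a dict of typed fields."""
    raw = dict(PAIR.findall(line))
    return {
        "t": int(raw["t"]),
        "st": float(raw["st"]),
        "et": float(raw["et"]),
        "w": raw["w"],
        "p": float(raw["p"]),
        "c": int(raw["c"]),
    }


def to_word_docs(file_id, lines):
    """Build one document per transcript line instead of one per file."""
    docs = []
    for n, line in enumerate(lines):
        if not line.strip():
            continue
        doc = parse_line(line)
        doc["id"] = f"{file_id}-{n}"  # assumed unique key: file id + line index
        doc["file"] = file_id         # assumed field naming the source transcript
        docs.append(doc)
    return docs


def find_files(docs, word, max_time):
    """Local stand-in for the query  w:<word> AND st:[0 TO <max_time>]."""
    return sorted(
        {d["file"] for d in docs if d["w"] == word and 0 <= d["st"] <= max_time}
    )


transcript = [
    "t=0 st=0.00 et=1.54 w=_silence_ p=0.000000 c=0",
    "t=2 st=1.54 et=1.57 w=hello p=1 c=0",
    "t=4 st=1.70 et=2.12 w=how p=1 c=0",
]
docs = to_word_docs("01.cn", transcript)
print(find_files(docs, "hello", 1.57))  # ['01.cn']
print(find_files(docs, "how", 1.57))    # []
```

Because the word and its times now live in the same document, the word match and the time range can only be satisfied by the same transcript entry, which is what the single-document-per-file layout could not guarantee.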
solr lucene full-text-search solarium