Errors and strange behaviour when compiling a Perl script
I have a Perl script, and when I compile it, Perl2Exe gives the following message:
Can't locate unicore/Heavy.pl in @INC (@INC contains: PERL2EXE_STORAGE c:\tools\hunalign\scripts\sentence_splitter c:\users\euride~1\appdata\local\temp/p2xtmp-5936) at PERL2EXE_STORAGE/utf8_heavy.pl line 176.
I do have that file in the unicore directory, so I don't know what the problem is. However, I temporarily worked around it by inserting this line:
#perl2exe_include "unicore/Heavy.pl";
Now it compiles without error, but the exe works a bit differently from the original exe compiled by the maker of the program. (It's a parser that segments sentences, and my exe doesn't segment at full stops.) Might this be due to a limitation of the free version of Perl2Exe? Any ideas?
This is the code (the same for both the original build and my own):
#!/usr/bin/perl -w

use Encode::Unicode;
use utf8;
#perl2exe_include "unicore/Heavy.pl";

# Based on Preprocessor written by Philipp Koehn

binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

use FindBin qw($Bin);
use strict;

my $mydir = "$Bin/nonbreaking_prefixes";

my %nonbreaking_prefix = ();
my $language = "en";
my $quiet = 0;
my $help = 0;

while (@ARGV) {
    $_ = shift;
    /^-l$/ && ($language = shift, next);
    /^-q$/ && ($quiet = 1, next);
    /^-h$/ && ($help = 1, next);
}

if ($help) {
    print "Usage ./split-sentences.perl (-l [en|de|...]) < textfile > splitfile\n";
    exit;
}
if (!$quiet) {
    print STDERR "Sentence Splitter v3\n";
    print STDERR "Language: $language\n";
}

my $prefixfile = "$mydir/nonbreaking_prefix.$language";

#default back to English if we don't have a language-specific prefix file
if (!(-e $prefixfile)) {
    $prefixfile = "$mydir/nonbreaking_prefix.en";
    print STDERR "WARNING: No known abbreviations for language '$language', attempting fall-back to English version...\n";
    die ("ERROR: No abbreviations files found in $mydir\n") unless (-e $prefixfile);
}

if (-e "$prefixfile") {
    open(PREFIX, "<:utf8", "$prefixfile");
    while (<PREFIX>) {
        my $item = $_;
        chomp($item);
        if (($item) && (substr($item,0,1) ne "#")) {
            if ($item =~ /(.*)[\s]+(\#NUMERIC_ONLY\#)/) {
                $nonbreaking_prefix{$1} = 2;
            } else {
                $nonbreaking_prefix{$item} = 1;
            }
        }
    }
    close(PREFIX);
}

##loop through text, add lines together until we get a blank line or a <p>
my $text = "";
while(<STDIN>) {
    chop;
    if (/^<.+>$/ || /^\s*$/) {
        #time to process this block, we've hit a blank or <p>
        &do_it_for($text,$_);
        print "<P>\n" if (/^\s*$/ && $text); ##if we have text followed by <P>
        $text = "";
    }
    else {
        #append the text, with a space
        $text .= $_. " ";
    }
}
#do the leftover text
&do_it_for($text,"") if $text;

sub do_it_for {
    my($text,$markup) = @_;
    print &preprocess($text) if $text;
    print "$markup\n" if ($markup =~ /^<.+>$/);
    #chop($text);
}

sub preprocess {
    # clean up spaces at head and tail of each line as well as any double-spacing
    $text =~ s/ +/ /g;
    $text =~ s/\n /\n/g;
    $text =~ s/ \n/\n/g;
    $text =~ s/^ //g;
    $text =~ s/ $//g;

    #this is one paragraph
    my($text) = @_;

    #####add sentence breaks as needed#####

    #non-period end of sentence markers (?!) followed by sentence starters.
    $text =~ s/([?!]) +([\'\"\(\[\¿\¡\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;

    #multi-dots followed by sentence starters
    $text =~ s/(\.[\.]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\p{IsUpper}])/$1\n$2/g;

    # add breaks for sentences that end with some sort of punctuation inside a quote or parenthetical and are followed by a possible sentence starter punctuation and upper case
    $text =~ s/([?!\.][\ ]*[\'\"\)\]\p{IsPf}]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g;

    # add breaks for sentences that end with some sort of punctuation and are followed by a sentence starter punctuation and upper case
    $text =~ s/([?!\.]) +([\'\"\(\[\¿\¡\p{IsPi}]+[\ ]*[\p{IsUpper}])/$1\n$2/g;

    # special punctuation cases are covered. Check all remaining periods.
    my $word;
    my $i;
    my @words = split(/ /,$text);
    $text = "";
    for ($i=0;$i<(scalar(@words)-1);$i++) {
        if ($words[$i] =~ /([\p{IsAlnum}\.\-]*)([\'\"\)\]\%\p{IsPf}]*)(\.+)$/) {
            #check if $1 is a known honorific and $2 is empty, never break
            my $prefix = $1;
            my $starting_punct = $2;
            if($prefix && $nonbreaking_prefix{$prefix} && $nonbreaking_prefix{$prefix} == 1 && !$starting_punct) {
                #not breaking;
            } elsif ($words[$i] =~ /(\.)[\p{IsUpper}\-]+(\.+)$/) {
                #not breaking - upper case acronym
            } elsif($words[$i+1] =~ /^([ ]*[\'\"\(\[\¿\¡\p{IsPi}]*[ ]*[\p{IsUpper}0-9])/) {
                #the next word has a bunch of initial quotes, maybe a space, then either upper case or a number
                $words[$i] = $words[$i]."\n" unless ($prefix && $nonbreaking_prefix{$prefix} && $nonbreaking_prefix{$prefix} == 2 && !$starting_punct && ($words[$i+1] =~ /^[0-9]+/));
                #we always add a return for these unless we have a numeric non-breaker and a number start
            }
        }
        $text = $text.$words[$i]." ";
    }

    #we stopped one token from the end to allow for easy look-ahead. Append it now.
    $text = $text.$words[$i];

    # clean up spaces at head and tail of each line as well as any double-spacing
    $text =~ s/ +/ /g;
    $text =~ s/\n /\n/g;
    $text =~ s/ \n/\n/g;
    $text =~ s/^ //g;
    $text =~ s/ $//g;

    #add trailing break
    $text .= "\n" unless $text =~ /\n$/;

    return $text;
}
"Can't locate XYZ in @INC" means that Perl is unable to find the XYZ module in any of the directories listed in @INC. Either the module is not installed, or its directory is not on the search path, in which case you need to add the module's path to @INC.
See the threads below:
How is Perl's @INC constructed? (aka What are all the ways of affecting where Perl modules are searched for?)
How do I change @INC to find Perl modules in non-standard locations?
Using #perl2exe_include "unicore/Heavy.pl"; solves the issue because it names the module explicitly, so Perl knows where to look. (Perl2Exe adds the module to the executable.)
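To see which @INC directory, if any, actually holds a file such as unicore/Heavy.pl, a short script can walk @INC itself. This is only a minimal sketch: find_in_inc is a hypothetical helper name, not part of Perl or Perl2Exe, and it reports what the interpreter running it sees, which may differ from what a Perl2Exe build sees.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every directory in @INC that contains the
# given relative file. Code-reference hooks in @INC are skipped.
sub find_in_inc {
    my ($relative) = @_;
    return grep { !ref($_) && -e File::Spec->catfile($_, $relative) } @INC;
}

# strict.pm is a sanity check; unicore/Heavy.pl is the file from the error.
for my $file ('strict.pm', 'unicore/Heavy.pl') {
    my @hits = find_in_inc($file);
    print @hits
        ? "$file found in: @hits\n"
        : "$file not found in any \@INC directory\n";
}
```

If the file turns up in a directory that the error message does not list, that directory is the one missing from the search path.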
"the exe works a bit differently from the original exe"
You should share the code; otherwise, how would we know how the original and the current build differ?
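One way to narrow down such a difference is to compile a tiny test script with the same tool and compare its output with a plain perl run. The sketch below assumes, without confirmation from the question, that the Unicode properties used in the script's regexes are what behaves differently in the build. The @checks table and failing_properties are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

# Each entry pairs a property used by the script's regexes with a check
# against a character that should match it.
my @checks = (
    [ 'IsUpper', sub { "A"      =~ /^\p{IsUpper}$/ } ],
    [ 'IsAlnum', sub { "a"      =~ /^\p{IsAlnum}$/ } ],
    [ 'IsPi',    sub { "\x{AB}" =~ /^\p{IsPi}$/    } ],  # left guillemet
    [ 'IsPf',    sub { "\x{BB}" =~ /^\p{IsPf}$/    } ],  # right guillemet
);

# Return the names of the properties whose sample character did not match.
sub failing_properties {
    return map { $_->[0] } grep { !$_->[1]->() } @checks;
}

my @failed = failing_properties();
print @failed ? "not matching: @failed\n" : "all properties match\n";
```

If the plain perl run and the compiled exe print different results, or the exe stops with an error naming a property, the build is probably missing Unicode data rather than hitting a limit in the script itself.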
perl perl-module perl2exe