Sunday, 15 May 2011

python - nltk NER word extraction -




I have checked the previous related threads, but they did not solve my issue. I have written code to extract named entities (NER) from text.

    import nltk

    text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
    tokenized = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokenized)
    namedent = nltk.ne_chunk(tagged, binary=True)
    print(namedent)
    namedent = nltk.ne_chunk(tagged, binary=False)

which gives this sort of result:

(S (NE Stallone/NNP) jason/NN 's/POS film/NN (NE Rocky/NNP) was/VBD inducted/VBN into/IN the/DT (NE National/NNP Film/NNP Registry/NNP) as/IN well/RB as/IN having/VBG its/PRP$ film/NN props/NNS placed/VBN in/IN the/DT (NE Smithsonian/NNP Museum/NNP) ./.)

while I expect only the NEs as the result, like:

Stallone Rocky National Film Registry Smithsonian Museum

How can I accomplish this?

Update

    result = ' '.join([y[0] for y in x.leaves()]) for x in namedent.subtrees() if x.node == "NE"
    print(result)

This gives a syntax error. What is the right way to write this?

Update 2

    import nltk

    text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
    tokenized = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokenized)
    namedent = nltk.ne_chunk(tagged, binary=True)
    print(namedent)
    np = [' '.join([y[0] for y in x.leaves()]) for x in namedent.subtrees() if x.node == "NE"]
    print(np)

Error:

    np = [' '.join([y[0] for y in x.leaves()]) for x in namedent.subtrees() if x.node == "NE"]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tree.py", line 198, in _get_node
    raise NotImplementedError("Use label() to access a node label.")
NotImplementedError: Use label() to access a node label.

So I tried:

    np = [' '.join([y[0] for y in x.leaves()]) for x in namedent.subtrees() if x.label() == "NE"]

which gives an empty result.

The namedent returned is actually a Tree object, which is a subclass of list. You can do the following to parse it:

    [' '.join([y[0] for y in x.leaves()]) for x in namedent.subtrees() if x.node == "NE"]

Output:

['Stallone', 'Rocky', 'National Film Registry', 'Smithsonian Museum']
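For newer NLTK releases, where the `.node` attribute has been replaced by the `label()` method (as the error in the question says), the same extraction can be sketched as below. To keep it runnable without downloading the tokenizer, tagger and chunker models, the tree is built by hand as a stand-in for what `nltk.ne_chunk(tagged, binary=True)` returns. The helper name `extract_entities` and the shortened sentence are assumptions made for illustration. Note that the comparison is case-sensitive, so the label has to be spelled "NE".

```python
from nltk.tree import Tree

# Hand-built stand-in for nltk.ne_chunk(tagged, binary=True).
# Leaves are (word, pos_tag) tuples; named entities are subtrees labelled "NE".
chunked = Tree('S', [
    Tree('NE', [('Stallone', 'NNP')]),
    ('jason', 'NN'), ("'s", 'POS'), ('film', 'NN'),
    Tree('NE', [('Rocky', 'NNP')]),
    ('was', 'VBD'), ('inducted', 'VBN'), ('into', 'IN'), ('the', 'DT'),
    Tree('NE', [('National', 'NNP'), ('Film', 'NNP'), ('Registry', 'NNP')]),
    ('and', 'CC'), ('the', 'DT'),
    Tree('NE', [('Smithsonian', 'NNP'), ('Museum', 'NNP')]),
    ('.', '.'),
])


def extract_entities(tree, label='NE'):
    """Return the words of every subtree carrying the given label, joined by spaces."""
    return [
        ' '.join(word for word, _tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() == label
    ]


entities = extract_entities(chunked)
print(entities)
```

With a real pipeline, `chunked` would simply be the value returned by `nltk.ne_chunk(tagged, binary=True)`.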

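The binary flag, when set to True, indicates only whether a subtree is an NE or not, which is what we need above. When set to False, it gives more information, such as whether the NE is an Organization, Person, etc. For some reason, the results with the flag on and off don't seem to agree with one another.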
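If the entity type matters, the subtrees from a `binary=False` run can be grouped by their label. The following is only a sketch: the tree is hand-built, and its PERSON and ORGANIZATION labels are assigned by hand for illustration, so they are not actual chunker output for this sentence. The name `by_type` is likewise an assumption.

```python
from collections import defaultdict

from nltk.tree import Tree

# Hand-built stand-in for nltk.ne_chunk(tagged, binary=False).
# The entity labels below are assumed for illustration, not real chunker output.
typed = Tree('S', [
    Tree('PERSON', [('Stallone', 'NNP')]),
    ('jason', 'NN'), ("'s", 'POS'), ('film', 'NN'),
    ('placed', 'VBN'), ('in', 'IN'), ('the', 'DT'),
    Tree('ORGANIZATION', [('Smithsonian', 'NNP'), ('Museum', 'NNP')]),
    ('.', '.'),
])

# Map each entity type to the list of entity strings found under it.
by_type = defaultdict(list)
for subtree in typed.subtrees():
    if subtree.label() == 'S':
        continue  # skip the root sentence node, which subtrees() also yields
    by_type[subtree.label()].append(
        ' '.join(word for word, _tag in subtree.leaves())
    )

print(dict(by_type))
```

Grouping by label this way makes it easier to compare what the chunker finds with the flag on and off.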

python regex nlp nltk
