Journal of Beijing Institute of Technology
Volume 15, Issue 2
ZHANG Feng, FAN Xiao-zhong, XU Yun. Chinese Term Extraction Based on PAT Tree[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2006, 15(2): 162-166.

Chinese Term Extraction Based on PAT Tree

  • Received Date: 2004-11-15
  • A new method of automatic Chinese term extraction based on the Patricia (PAT) tree is proposed. Mutual information is computed by prefix searching in a PAT tree built from a domain corpus, in order to estimate the internal associative strength between the Chinese characters in a string. Compared with methods that operate on the domain corpus directly, this greatly speeds up term candidate extraction. Banks of common collocation suffixes and prefixes are constructed, and part-of-speech (POS) composition rules for terms are summarized, to improve the precision of term extraction. Experimental results show an F-measure of 74.97%.
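The idea in the abstract, estimating how strongly the characters of a string cohere from prefix-search counts, can be illustrated with a minimal sketch. Several things here are assumptions rather than the authors' method: a sorted list of truncated suffixes (essentially a PAT array) stands in for the PAT tree, the score f(s) / (f(left) + f(right) - f(s)) is one form commonly used in PAT-tree-based phrase extraction and may differ from the formula in the paper, and the names `build_suffix_index`, `prefix_count`, `association` and the sample `corpus` are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_suffix_index(text, max_len=8):
    """Sorted list of every suffix of `text`, truncated to max_len characters.

    Assumption: a simple stand-in for a PAT tree; it supports the same
    prefix-count query, only with a different data structure.
    """
    return sorted(text[i:i + max_len] for i in range(len(text)))

def prefix_count(index, s):
    """Number of positions in the text at which `s` occurs (len(s) <= max_len)."""
    lo = bisect_left(index, s)
    # Every suffix that starts with s sorts below s + highest code point.
    hi = bisect_right(index, s + "\U0010ffff")
    return hi - lo

def association(index, s):
    """Assumed association score: f(s) / (f(left) + f(right) - f(s)).

    left  = s without its last character
    right = s without its first character
    """
    f_s = prefix_count(index, s)
    if len(s) < 2 or f_s == 0:
        return 0.0
    f_left = prefix_count(index, s[:-1])
    f_right = prefix_count(index, s[1:])
    # f_left >= f_s and f_right >= f_s, so the denominator is at least f_s > 0.
    return f_s / (f_left + f_right - f_s)

# Hypothetical toy corpus, for illustration only.
corpus = "数据库系统与数据库管理与数据结构"
index = build_suffix_index(corpus)

for candidate in ("数据", "数据库", "据库"):
    print(candidate, prefix_count(index, candidate), association(index, candidate))
```

In this toy corpus "数据" always occurs as a unit, so it scores higher than "数据库", which occurs in only two of the three places where "数据" appears. Because each count is a pair of binary searches over a prebuilt index rather than a scan of the raw corpus, the sketch also reflects, in spirit, the speed advantage the abstract claims over working on the corpus directly.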
