Welcome to Journal of Beijing Institute of Technology
Volume 12, Issue 4
ZHANG Yang-sen, CAO Yuan-da. Statistical Language Model for Chinese Text Proofreading[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2003, 12(4): 441-445.

Statistical Language Model for Chinese Text Proofreading

Funds: the Youth Fund of Science and Technology of Shanxi Province (20021015)
  • Received Date: 2003-05-30
  • Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located before and after the target word $w_i$, and of the relationship between non-adjacent words $w_i$ and $w_j$ in the linguistic environment (LE). First, the word association degree between $w_i$ and $w_j$ is defined using a distance-weighted factor, where $w_j$ is $l$ words apart from $w_i$ in the LE. Then the Bayes formula is used to calculate the LE related degree of word $w_i$. Lastly, the LE related degree is taken as the criterion for predicting the reasonability of word $w_i$ as it appears in context. The proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, and the experimental results show that the error detection recall rate and precision rate of the system have been improved.
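The idea in the abstract, scoring a target word by its distance-weighted association with words on both sides of it, can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the association formula, the weighting function, or the Bayes-based computation, so everything below is an assumption made for illustration. In particular, the weight `1 / distance`, the relative co-occurrence frequency used as the association, the summed score, the threshold, and all function names are hypothetical. Input is assumed to be already word-segmented.

```python
from collections import Counter


def train_counts(sentences, window=2):
    """Count words and co-occurring pairs within `window` positions on either side."""
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        word_counts.update(words)
        for i, w_i in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(w_i, words[j])] += 1
    return word_counts, pair_counts


def association(w_i, w_j, distance, word_counts, pair_counts):
    """Hypothetical association: relative co-occurrence frequency scaled by 1/distance."""
    if word_counts[w_i] == 0:
        return 0.0  # unseen target word: no evidence of association
    return (pair_counts[(w_i, w_j)] / word_counts[w_i]) / distance


def le_related_degree(words, i, word_counts, pair_counts, window=2):
    """Sum the associations between words[i] and its neighbours before and after it."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return sum(
        association(words[i], words[j], abs(i - j), word_counts, pair_counts)
        for j in range(lo, hi)
        if j != i
    )


def flag_suspicious(words, word_counts, pair_counts, threshold=0.1, window=2):
    """Return positions whose score falls below an (assumed) threshold."""
    return [
        i
        for i in range(len(words))
        if le_related_degree(words, i, word_counts, pair_counts, window) < threshold
    ]


# Toy, pre-segmented corpus (placeholder tokens, not real data).
corpus = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"]]
word_counts, pair_counts = train_counts(corpus)
suspects = flag_suspicious(["a", "x", "c"], word_counts, pair_counts)
```

Unlike a plain n-gram, which conditions only on the preceding words, this score uses context on both sides and lets non-adjacent words contribute with a smaller weight, which is the property the abstract emphasizes.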
