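NLTK itself does not include readability formulas. A common approach is to use NLTK for text preprocessing (tokenization, stopword removal) and the separate textstat package (`pip install textstat`) to compute readability scores. The following simple example shows how to combine the two to evaluate the readability of a text: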
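```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.text import Text
import textstat  # separate package, not part of NLTK

# Download the NLTK data used below (skipped if already installed)
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # needed by word_tokenize in newer NLTK releases
nltk.download("stopwords", quiet=True)

# Load the text
text = "This is a sample text to test readability using NLTK library."

# Tokenize
tokens = word_tokenize(text)

# Remove stopwords (NLTK preprocessing only; the scores below use the full text)
stop_words = set(stopwords.words("english"))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]

# Create an NLTK Text object for further analysis (e.g. concordance, frequency counts)
text_nltk = Text(filtered_tokens)

# Compute readability metrics on the original, unfiltered text
flesch_reading_ease = textstat.flesch_reading_ease(text)
automated_readability_index = textstat.automated_readability_index(text)
coleman_liau_index = textstat.coleman_liau_index(text)

# Print the results
print("Flesch Reading Ease Score:", flesch_reading_ease)
print("Automated Readability Index:", automated_readability_index)
print("Coleman-Liau Index:", coleman_liau_index)
```

Running the code above prints three readability metrics: the Flesch Reading Ease Score, the Automated Readability Index (ARI), and the Coleman-Liau Index. A higher Flesch Reading Ease score means the text is easier to read, while ARI and Coleman-Liau approximate the U.S. school grade level needed to understand the text, so lower values mean easier text. These formulas are intended for longer passages, so scores for a one-sentence sample like this one should be read loosely.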
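To make clear what the three scores measure, here is a minimal sketch of the standard English formulas implemented by hand. The function names and the example counts are hypothetical; textstat derives word, sentence, syllable, and character counts from the text using its own tokenization and syllable-counting rules, so its results can differ from scores computed on hand-supplied counts. The Flesch bands in `describe_flesch` follow the commonly cited interpretation table and are an illustration, not textstat output.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher score = easier text; most prose falls roughly between 0 and 100
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    # Approximate U.S. grade level; 'characters' counts letters and digits
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    # Approximate U.S. grade level based on letters rather than syllables
    avg_letters = letters / words * 100       # L: average letters per 100 words
    avg_sentences = sentences / words * 100   # S: average sentences per 100 words
    return 0.0588 * avg_letters - 0.296 * avg_sentences - 15.8


def describe_flesch(score: float) -> str:
    # Commonly cited Flesch interpretation bands (hypothetical helper)
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for threshold, label in bands:
        if score >= threshold:
            return label
    return "very difficult"


# Hypothetical counts for a 100-word, 5-sentence passage
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
ari = automated_readability_index(characters=470, words=100, sentences=5)
cli = coleman_liau_index(letters=470, words=100, sentences=5)

print(f"Flesch Reading Ease: {fre:.1f} ({describe_flesch(fre)})")
print(f"ARI: {ari:.1f}")
print(f"Coleman-Liau: {cli:.1f}")
```

With these counts the sketch prints a Flesch score of about 59.6 ("fairly difficult"), an ARI of about 10.7, and a Coleman-Liau Index of about 10.4, i.e. roughly a 10th-grade reading level. Passing counts in explicitly keeps the formulas separate from the counting step, which is where library implementations tend to differ.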


