

Feng Zhiwei

Back in 1928, R. V. L. Hartley took up the problem of how to measure the amount of information. He reasoned that if a device has D possible positions or physical states, then two such devices working in combination have D^2 states, three such devices have D^3 states, and so on: as the number of devices increases, the number of possible states of the whole system increases accordingly. For a sensible measure of information capacity, two such devices should have exactly twice the capacity of one. Hartley therefore defined the information capacity of a device as log D, where D is the number of different states the whole system can enter.

Third, consider a composite trial made up of two random trials, one with m possible outcomes and the other with n possible outcomes (for example, tossing a coin, m = 2, and rolling a die, n = 6). The composite trial then has mn equally probable outcomes, so its entropy should be equal to log₂ mn. On the other hand, the entropy of the composite trial should be equal to the sum of the entropies of the two random trials that make it up, that is, log₂ m + log₂ n. And indeed we know that log₂ mn = log₂ m + log₂ n.
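The additivity of entropy for the coin and die example can be checked numerically. The following is a minimal sketch, not part of the original argument; the helper name `uniform_entropy` is chosen here for illustration, and it assumes that all outcomes of each trial are equally probable.

```python
import math
from itertools import product


def uniform_entropy(outcomes):
    """Entropy in bits of a trial whose outcomes are all equally probable."""
    n = len(outcomes)
    return -sum((1 / n) * math.log2(1 / n) for _ in outcomes)


coin = ["H", "T"]              # m = 2 outcomes
die = [1, 2, 3, 4, 5, 6]       # n = 6 outcomes
composite = list(product(coin, die))  # every (coin, die) pair: m * n outcomes

h_coin = uniform_entropy(coin)
h_die = uniform_entropy(die)
h_composite = uniform_entropy(composite)

print("coin:", h_coin, "die:", h_die, "sum:", h_coin + h_die)
print("composite:", h_composite)
```

Up to floating-point rounding, the entropy of the composite trial agrees with the sum of the two separate entropies.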

In 1951, Shannon first calculated the entropy H of English letters treated as an independent chain with unequal probabilities, obtaining 4.03 bits.

Horse 1: 1/2      Horse 5: 1/64

Horse 2: 1/4      Horse 6: 1/64

Horse 3: 1/8      Horse 7: 1/64

Horse 4: 1/16     Horse 8: 1/64

The prior probabilities of the horses and the entropy are closely related to the concept of "perplexity". If we take the entropy H as an exponent of 2, the value 2^H is called the perplexity. Intuitively, perplexity can be understood as the weighted average number of choices that the random variable has to make in a random trial. Thus, when choosing among 8 horses with equal probability (in this case, the entropy H = 3 bits), the perplexity is 2^3, that is, 8. When choosing among the 8 horses with unequal probabilities (in this case, the entropy H = 2 bits), the perplexity is 2^2, that is, 4. Clearly, the greater the entropy of a random trial, the greater its perplexity.
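The two horse-race cases can be worked through directly. This is a small sketch for illustration; the function names `entropy` and `perplexity` are chosen here, and the unequal probabilities are the ones listed for the eight horses (1/2, 1/4, 1/8, 1/16 and four times 1/64).

```python
import math


def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def perplexity(probs):
    """Perplexity is 2 raised to the entropy."""
    return 2 ** entropy(probs)


equal = [1 / 8] * 8
unequal = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 64, 1 / 64, 1 / 64, 1 / 64]

for name, probs in [("equal", equal), ("unequal", unequal)]:
    print(name, "entropy:", entropy(probs), "perplexity:", perplexity(probs))
```

The equal case gives an entropy of 3 bits and a perplexity of 8; the unequal case gives 2 bits and a perplexity of 4, matching the figures in the text.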

The actual experiment was designed as follows: the subjects are shown a piece of English text and asked to guess the next letter. Using their knowledge of the language, they first guess the letter most likely to occur, then the next most likely letter, and so on. The number of guesses each subject needs is recorded. Shannon pointed out that the entropy of the sequence of guess counts is the same as the limiting entropy of English letters. His intuitive explanation is this: if the subject needed n guesses, then, given the sequence of guess counts, we can reconstruct the original text by choosing the n-th most likely letter at each position. Because this method requires guessing letters rather than words, the subject sometimes has to search exhaustively through all the letters. Shannon therefore calculated the limiting entropy per letter of English, not the limiting entropy per word. The result he reported is that the limiting entropy of English letters is 1.3 bits (for 27 characters: 26 letters plus the space). Shannon's estimate is probably too low, because it is based on a single text (Dumas Malone's "Jefferson the Virginian"). Shannon also noted that for other kinds of text (news reports, scientific writing, poetry) his subjects often guessed wrong, so the entropy in those cases is higher.
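The reconstruction argument can be illustrated with a toy program in which a fixed predictor stands in for the human subject. This is only a sketch under stated assumptions: the predictor is a simple letter-bigram count table trained on the text itself, and the names `train`, `ranking`, `encode` and `decode` are hypothetical. Shannon's subjects were people, not a program.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus the space


def train(text):
    """Count, for each character, which characters follow it."""
    counts = {}
    for prev, nxt in zip(text, text[1:]):
        counts.setdefault(prev, Counter())[nxt] += 1
    return counts


def ranking(prev, model):
    """All characters ordered from most to least likely after `prev`."""
    counts = model.get(prev, Counter())
    # Ties are broken alphabetically so the order is fully deterministic.
    return sorted(ALPHABET, key=lambda c: (-counts[c], c))


def encode(text, model):
    """Replace each character by the number of guesses needed to hit it."""
    prev, guesses = " ", []
    for ch in text:
        guesses.append(ranking(prev, model).index(ch) + 1)
        prev = ch
    return guesses


def decode(guesses, model):
    """Rebuild the text by taking the n-th most likely character each time."""
    prev, out = " ", []
    for n in guesses:
        ch = ranking(prev, model)[n - 1]
        out.append(ch)
        prev = ch
    return "".join(out)


text = "the cat sat on the mat"
model = train(text)
guesses = encode(text, model)
print(guesses)
print(decode(guesses, model))
```

Because the same predictor is used in both directions, the sequence of guess counts carries exactly the information needed to recover the text, which is the intuition behind treating the two entropies as equal.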

Then they used a word trigram grammar to assign a probability to the Brown corpus, treated the corpus as a sequence of characters, and from this calculated the per-character entropy of the Brown corpus. The result they obtained is a limiting entropy of 1.75 bits per character (here the character set contains the 95 printable ASCII characters). This is the conditional entropy of English characters under a trigram grammar. This measure is clearly larger than Shannon's 1.3 bits, and the character set used is the 95-character printable ASCII set, many of whose characters lie outside the 26 letters of English.

In the 1970s, following Shannon's approach, Feng Zhiwei counted character frequencies by hand and made the first estimate of the entropy of Chinese characters, H1 = 9.65 bits, by analogy with the entropy of English letters, and proposed the "Chinese character capacity limit theorem". Using Zipf's law, he showed mathematically that the entropy H1 per character increases as the number of distinct Chinese characters in the statistical sample grows, until that number reaches 12,366; after that, H1 no longer increases. This means that, for the purpose of measuring the entropy H1, the capacity of Chinese characters in the statistical sample has a limit. This limit is 12,366 characters, and beyond it the measured entropy of Chinese characters does not increase. Of these 12,366 characters, more than 4,000 are commonly used, more than 4,000 are less commonly used, and more than 4,000 are rarely used. He believes that these 12,366 characters can represent the basic picture of the Chinese characters found in ancient and modern literature. Thus he concluded that, considering written Chinese as a whole (including both modern and classical Chinese), the entropy H1 per character is 9.65 bits. At that time Feng Zhiwei had no access to a computer for counting frequencies; all the work was done by hand, and its accuracy is hard to guarantee. Feng Zhiwei has therefore always regarded this figure as only a very immature guess.

Entropy is a measure of the amount of information. In natural language processing, entropy is valuable data for characterizing the mathematical profile of a language. Entropy can be used to measure how much information a particular grammar contains, how well a given grammar matches a given language, and how well a given N-gram grammar predicts the next word. Given two grammars and a corpus, we can use entropy to estimate which grammar matches the corpus better. We can also use entropy to compare the difficulty of two speech recognition tasks, and to measure how well a given probabilistic grammar matches human grammar.
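As a rough illustration of comparing two grammars against one corpus, the sketch below scores a test text under two toy character models and reports bits per character for each; the lower figure indicates the better match. Everything here is an assumption made for illustration: the models are add-one-smoothed unigram character models rather than real grammars, the training strings are invented, and the names `train_unigram` and `cross_entropy` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(text, alphabet, alpha=1.0):
    """Character probabilities with add-alpha smoothing over a fixed alphabet."""
    counts = Counter(text)
    total = len(text) + alpha * len(alphabet)
    return {c: (counts[c] + alpha) / total for c in alphabet}


def cross_entropy(model, text):
    """Average number of bits per character the model needs for `text`.

    Every character of `text` must belong to the model's alphabet.
    """
    return -sum(math.log2(model[c]) for c in text) / len(text)


alphabet = "ab "
model_a = train_unigram("aaab aaab", alphabet)  # toy model favouring 'a'
model_b = train_unigram("abbb abbb", alphabet)  # toy model favouring 'b'

corpus = "aaab aab"
h_a = cross_entropy(model_a, corpus)
h_b = cross_entropy(model_b, corpus)
print("model_a:", h_a, "bits/char")
print("model_b:", h_b, "bits/char")
print("better match:", "model_a" if h_a < h_b else "model_b")
```

On this corpus, which consists mostly of the letter a, the first model needs fewer bits per character and is therefore the better match.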
