RE: Century Old Mystery of The Zipf's Law (And A Cool Experiment)

in #steemstem, 7 years ago (edited)

I think a Zipf analysis of word counts for a language could be useful when attempting to build an artificial intelligence that handles semantic meaning. The Zipf distribution gives insight into which words and concepts the culture using that language treats as most important.
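To make "a Zipf analysis of word counts" concrete, here is a minimal sketch in Python. The function name `rank_frequency` and the sample sentence are made up for illustration; a real analysis would run over a large corpus, and the tokenizer here is deliberately crude.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]

# Hypothetical sample text, far too small for a real Zipf fit.
sample = "the cat and the dog and the bird saw the wall"
table = rank_frequency(sample)

for rank, word, count in table[:2]:
    # Under Zipf's law, rank * count stays roughly constant across ranks.
    print(rank, word, count, rank * count)
```

On a real corpus you would plot log(rank) against log(count) and look for a roughly straight line, rather than eyeballing the product as done here.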

For example, many languages have a particular word that represents the concept of a "wall." However, at least one that I know of uses two different words to distinguish between an "inside wall" and an "outside wall." For the culture using that language, the distinction was important for some reason.

Language is more than a means of communication. It provides a way of thinking and solving problems, especially those problems that require more than one individual to solve.

Let's look at your first list of important words, by decreasing frequency: The, Of, And, To, A, In, Is, I, That, It, For, You, Was, With, On, As, Have, But, Be, They. From this I would hypothesize that this culture (the one using English words) collectively thinks that it is very important to represent the distinction between the definite ("the") and the indefinite ("a," "an"). So if I were to implement an AI that actually understood semantics, I would want it to learn the following as general concepts:

  • Definite vs. indefinite: the, a, an
  • Association based on ownership: of, for, have
  • Creating or adding to a collective: and, with
  • Subtracting from a collective: but, except
  • Location of objects: to, toward, at, on
  • Hierarchies based on containership: in, inside, out, outside
  • Association based on equivalence or relevance: is, as, am, be
  • Abstractions representing people and things: I, you, they, it
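As a rough illustration of the grouping above, the concept list can be written down as a lookup table and checked against the frequency list quoted earlier. This is only a sketch: the names `CONCEPTS`, `TOP_WORDS`, and `concept_of` are made up here, and a hand-written table is obviously not how an AI would learn these concepts.

```python
# Concept groups copied from the list above (labels shortened).
CONCEPTS = {
    "definite vs. indefinite": {"the", "a", "an"},
    "ownership": {"of", "for", "have"},
    "adding to a collective": {"and", "with"},
    "subtracting from a collective": {"but", "except"},
    "location": {"to", "toward", "at", "on"},
    "containership": {"in", "inside", "out", "outside"},
    "equivalence or relevance": {"is", "as", "am", "be"},
    "people and things": {"i", "you", "they", "it"},
}

# The top-20 list from the article, lowercased, in decreasing frequency.
TOP_WORDS = (
    "the of and to a in is i that it for you was "
    "with on as have but be they"
).split()

def concept_of(word):
    """Return the concept label for a word, or None if it is not covered."""
    for concept, members in CONCEPTS.items():
        if word in members:
            return concept
    return None

# Which of the most frequent words are not covered by any concept yet?
uncategorized = [w for w in TOP_WORDS if concept_of(w) is None]
print(uncategorized)
```

Running a check like this shows which high-frequency words the concept list still leaves out, which is a quick way to see where it needs extending.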

The current state of the art cannot do this. Hinton's "thought vectors" (from NLP research) tend to place words in close "proximity" within the network when statistical usage permits one word to be substituted for another, such that the result is still a sentence that has "some" meaning. But there is no guarantee that the result of the substitution has the same meaning.

Thank you for the insights. Yes, your take on this could open interesting doors, particularly with the crazy amount of AI progress we are seeing now.
It might already be in the works somewhere, but if not, you might have an amazing idea to pursue.
Thanks again!

You're welcome, and thanks for the kind response. I really enjoyed reading your article. It was very well written. I almost forgot ... I wanted to throw @originalworks at this to snag another upvote for you. ;)