Now, a University of Texas at Austin linguistics researcher, Katrin Erk, is using supercomputers to develop a new method for helping computers learn natural language. (Agencies)
Instead of hard-coding human logic or deciphering dictionaries to try to teach computers language, Erk decided to try a different tactic: feed computers a vast body of texts (which are a reflection of human knowledge) and use the implicit connections between the words to create a map of relationships.
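As a rough illustration of what "using the implicit connections between the words" can mean, the sketch below counts which words appear near each other in a tiny made-up corpus. It is a minimal sketch, not Erk's actual method: the function name `cooccurrence_vectors`, the two-word window and the toy sentences are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not counted as its own neighbour
                    vectors[word][tokens[j]] += 1
    return vectors


# Hypothetical toy corpus; real work of this kind starts at far larger scales.
toy_corpus = [
    "the bright student solved the problem",
    "the clever student solved the puzzle",
]
vectors = cooccurrence_vectors(toy_corpus)
print(vectors["student"].most_common())
```

Each word ends up with a profile of its neighbours, and words used in similar contexts end up with similar profiles. That profile is the kind of "map of relationships" the paragraph above describes.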
Creating a model that can accurately reproduce the intuitive human ability to distinguish word meanings requires a lot of text and a lot of analytical horsepower, researchers said.
The lower end for this kind of research is a text collection of 100 million words. Erk initially conducted her research on desktop computers, but then began using parallel computing systems.
Access to a special Hadoop-optimised subsystem allowed Erk and her collaborators to expand the scope of their research.
Hadoop is a software architecture well suited to text analysis and to mining unstructured data, and it can take advantage of large computer clusters, researchers said.
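The processing pattern that Hadoop spreads across a cluster is usually described as map, shuffle and reduce. The sketch below imitates that pattern in plain Python on a single machine. It does not use any Hadoop API, and the function names and sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit ((word, next_word), 1) for every adjacent word pair in a line."""
    tokens = line.lower().split()
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right), 1


def shuffle(pairs):
    """Group emitted values by key, as a cluster framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word pair."""
    return key, sum(values)


# Hypothetical input; on a cluster each line could be mapped on a different node.
lines = ["the student solved the problem", "the student read the book"]
emitted = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(counts[("the", "student")])
```

Because each line is mapped independently and each key is reduced independently, the same counting job can be split across many machines, which is why the pattern suits very large text collections.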
"We use a gigantic 10,000-dimensional space with all these different points for each word to predict paraphrases," Erk said.
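To give a feel for how points in such a space could be used to predict paraphrases, the sketch below ranks candidate words by cosine similarity to a target word. It is only a sketch under stated assumptions: the three-or-four-dimensional `toy_vectors` are invented numbers standing in for the 10,000-dimensional space Erk describes, and cosine similarity is a common choice for comparing word vectors, not necessarily the measure her team uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paraphrases(target, candidates, vectors):
    """Return (candidate, similarity) pairs, most similar to `target` first."""
    scored = [(c, cosine(vectors[target], vectors[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented context counts for illustration only.
toy_vectors = {
    "bright": {"student": 3, "light": 1, "idea": 2},
    "clever": {"student": 4, "idea": 3},
    "shiny": {"light": 5, "metal": 4},
}
ranking = rank_paraphrases("bright", ["shiny", "clever"], toy_vectors)
print(ranking)
```

With these made-up counts, "clever" ranks above "shiny" as a paraphrase of "bright" because its contexts overlap more with those of the target word.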