Normally, covering information technology is a bit of a strain for me, but two things about a recent study had strong personal appeal: I am hooked on playing the Civilization series, and I rarely bother to read the user manual. These do not necessarily sound like problems that computer science could solve, but some researchers decided to let a computer teach itself to play Freeciv and, in the process, teach itself to read the game manual. Simply by determining whether the moves it made were ultimately successful, the researchers' software not only got better at playing the game, but figured out much of the manual as well.
Civilization is not the first game to capture the attention of computer scientists. The authors of the new paper, who are based at MIT and University College London, cite past literature in which computers have been able to teach themselves Go, Poker, Scrabble, multi-player card games, and real-time strategy games. The method used in all of these is called Monte Carlo search.
At each possible move, the software runs a series of simulated games, which it uses to evaluate the potential usefulness of the different moves. It uses these results to update a utility function that estimates the value of a given move in a specific state of the game. After several iterations, the utility function should get better at identifying the best move, although the algorithm will occasionally insert a random move, just to keep trying new possibilities.
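The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy game (a position on a number line, where further right scores higher), the names `rollout` and `monte_carlo_move`, and all default parameter values are hypothetical. As a simplification, the sketch scores only the moves available in the current state and plays the rest of each simulated game at random.

```python
import random

ACTIONS = (-1, 0, 1)  # hypothetical toy moves: step left, stay put, step right


def rollout(state, first_move, depth, rng):
    """Play one simulated game: the candidate move, then random moves."""
    position = state + first_move
    for _ in range(depth - 1):
        position += rng.choice(ACTIONS)
    return position  # toy score: further right is better


def monte_carlo_move(state, n_rollouts=1000, depth=20, epsilon=0.1, seed=0):
    """Estimate each move's utility from simulated games; return the best."""
    rng = random.Random(seed)
    utility = {a: 0.0 for a in ACTIONS}  # estimated value of each move
    counts = {a: 0 for a in ACTIONS}     # simulated games seen per move
    for _ in range(n_rollouts):
        if rng.random() < epsilon:
            move = rng.choice(ACTIONS)            # occasional random move
        else:
            move = max(ACTIONS, key=utility.get)  # current best guess
        outcome = rollout(state, move, depth, rng)
        counts[move] += 1
        # Running mean: shift the estimate toward the observed outcome.
        utility[move] += (outcome - utility[move]) / counts[move]
    return max(ACTIONS, key=utility.get), utility


if __name__ == "__main__":
    best, estimates = monte_carlo_move(0, n_rollouts=20000, depth=5)
    print("best move:", best)
    print("estimated utilities:", estimates)
```

The `epsilon` parameter plays the role of the occasional random move: without it, the search could settle on the first move that happened to score well and never sample the others.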
All this sounds simple enough, but the computational challenges are considerable. The authors estimate that an average player will typically have 18 units in play, and each of these can take any one of 15 actions. This creates what is called an “action space” of about 10^21 possible moves. To evaluate the usefulness of any one of them, they played it 20 moves out and then checked the game score (or whether the game had been won or lost before that point). It took 200 hours to generate these performance numbers.
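The size of that action space can be checked with quick arithmetic. The sketch below assumes that every unit chooses its action independently of the others, which is one reading that reproduces the quoted order of magnitude; the function name `joint_action_count` is hypothetical.

```python
import math

UNITS = 18             # average units in play, per the authors' estimate
ACTIONS_PER_UNIT = 15  # actions available to each unit


def joint_action_count(units=UNITS, actions=ACTIONS_PER_UNIT):
    """Count combined moves if every unit picks its action independently."""
    return actions ** units


if __name__ == "__main__":
    total = joint_action_count()
    print(f"{total:.2e} joint moves, about 10^{math.log10(total):.1f}")
```

Under that assumption the count is 15 to the 18th power, roughly 1.5 × 10^21, which is why exhaustively evaluating every combined move is out of the question and sampling is needed.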
For the tests, the Monte Carlo search was set to play Freeciv's built-in AI in a one-on-one game on a map of 1,000 tiles. A single 100-move game took about 1.5 hours to complete on a Core i7, so all this simulation time is not trivial. But in general, the algorithm performed quite well, winning within this short time frame about 17 percent of the time (left to play a game to the end, the Monte Carlo search won a little less than half the time).
However, the authors wondered whether the algorithm might make better decisions more consistently if it had access to the owner's manual, which contains various bits of advice on the strengths and weaknesses of different units, as well as some general tips on how to build an empire (place cities near a river, for example). So they decided to make their program RTFM.
The “reading” was done using a neural network that takes the game state, a proposed move, and the owner's manual as input. One set of neurons in the network analyzes the manual to find state/action pairs. These pairs are things like “active unit” or “completed road” (the states) and “improve terrain” or “fortify unit” (the actions). A separate neural network then works out whether any of the items identified by the first apply to the current situation. These are then combined to find relevant advice in the manual, which is then incorporated into the utility function.
The key to this process is that the neural network does not even know whether it is correctly identifying state/action pairs when it starts. It does not know how to “read,” much less whether it has correctly interpreted the advice the pairs convey (should you build near a river, or should you never build near a river?). All it has to go on is the impact its interpretation has on the outcome of the game. In short, it must figure out how to read the owner's manual just by trying different interpretations and seeing whether they improve its play.
Despite the difficulties, it works. When full text analysis was included, the success of the authors' software shot up: it won over half its games within 100 moves, and beat the game's AI nearly 80 percent of the time when games were played to the end.
To test how well the software was reading, the authors fed it a mixture of sentences from the owner's manual and sentences taken from The Wall Street Journal. The software correctly used sentences from the manual more than 90 percent of the time early in the game. However, as the game progressed, the manual became a less useful guide, and the software's ability to pick out the manual's sentences fell to about 60 percent by the end game. At the same time, the software began to rely less on the manual and more on its own game experience.
This does not mean that the Journal was useless, however. Feeding the complete software package random text instead of the owner's manual also boosted the algorithm's winning percentage, raising it to 40 percent in 100-move games. That is not as good as the 54 percent obtained with the manual, but it is much better than the 17 percent win rate of the algorithm alone.
What is happening here? The paper does not say, but the key thing to note is that the neural network is only trying to identify rules that work (i.e., build near a river). It does not care how these rules are conveyed: it simply associates text with a random action and determines whether the results are good. If it is lucky, it can end up associating a useful rule with a random bit of text. It has a better chance of doing so with non-random text like the owner's manual, but it can still extract useful guidance no matter what it is given to work with.
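A toy sketch can make that point concrete. The code below is not the authors' neural network; it swaps in a simple linear utility whose features pair each word of a sentence with a proposed action, and it adjusts the weights using nothing but the outcome of each attempted move. The two actions, the hidden rule in `toy_outcome`, and every function name are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

ACTIONS = ("build_near_river", "build_in_desert")  # hypothetical toy moves


def features(sentence, action):
    """Pair every word of the sentence with the proposed action."""
    return [(word, action) for word in sentence.lower().split()]


def utility(weights, sentence, action):
    """Linear utility: sum of the weights of the active features."""
    return sum(weights[f] for f in features(sentence, action))


def toy_outcome(action):
    """Hidden game rule that the learner never sees directly."""
    return 1.0 if action == "build_near_river" else -1.0


def train(sentence, rounds=50, learning_rate=0.05):
    """Adjust weights using only the outcome of each attempted move."""
    weights = defaultdict(float)
    for _ in range(rounds):
        for action in ACTIONS:  # try both moves so each gets feedback
            error = toy_outcome(action) - utility(weights, sentence, action)
            for f in features(sentence, action):
                weights[f] += learning_rate * error
    return weights


if __name__ == "__main__":
    for text in ("Build your early cities near a river",
                 "Stocks closed higher on Tuesday"):
        w = train(text)
        best = max(ACTIONS, key=lambda a: utility(w, text, a))
        print(f"{text!r} -> {best}")
```

In this toy, both the manual-style sentence and the unrelated one end up favoring the river, because the weights track outcomes rather than meaning. It cannot show why the real manual helps more; it only illustrates how outcome-driven learning can attach a working rule to arbitrary words.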
The authors conclude that their software successfully learned to use the rich language of the game manual to perform better, learning to interpret the language as it went along. This is certainly true: the software did better when it was given the user manual than when it was fed random text, and the difference was statistically significant. But simply providing any text at all resulted in a larger relative boost. This suggests that it is better to have some rules to work with, no matter how they are derived, than to have no guidance at all.








