Saturday, January 29, 2011

Fleshing Out My Algorithm

Starting to write out some code helped ground my concepts a bit. The following is a more detailed description of the basic planning algorithm. It's starting to deviate from Spreading Activation because of the gates and because activation passes information as it goes. With these plans in place I'm going to go back to coding and make changes as I go.

Directed Graph – contains nodes and directed edges

Node – contains activations. Types of nodes are as follows:

  • Concept – a Node that contains a concept, such as “color” or “coffee”
  • Gate – a Node that takes in one or more activations and outputs activation
  • Action – a Node that contains an action to be performed, as well as a cost function for that action
      o Actions can be designated as goals

  • Instance – a Node that is a clone of a concept with attributes able to be filled in
  • Parameter slot – a Node to be filled in with the appropriate parameter from an activation
  • Root – a generic Node that serves as the parent for all Instances in the current environment (working memory)
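
To ground this a bit, here is a rough C# sketch of how these node types might be laid out. The class names and members are placeholders of mine, not a final design:

using System;
using System.Collections.Generic;
using System.Linq;

// A minimal sketch of the node hierarchy described above; names are placeholders.
public abstract class Node
{
    public string Name;
    public double Activation;                       // current activation level
    public List<Edge> Outgoing = new List<Edge>();  // forward edges
    public List<Edge> Incoming = new List<Edge>();  // backward (incident) edges
}

public class ConceptNode : Node { }                 // e.g. "color" or "coffee"

public class GateNode : Node                        // combines incoming activations
{
    // Default: OR-style max; specific gates override this.
    public virtual double Combine(IEnumerable<double> inputs) { return inputs.Max(); }
}

public class ActionNode : Node
{
    public bool IsGoal;                             // actions can be designated as goals
    public Func<double> Cost;                       // cost function for this action
    public Action Perform;                          // callback that carries out the action
}

public class InstanceNode : Node                    // clone of a concept with fillable attributes
{
    public ConceptNode Concept;
    public Dictionary<string, Node> Attributes = new Dictionary<string, Node>();
}

public class ParameterSlotNode : Node { }           // filled in by a parameter carried on an activation

public class RootNode : Node { }                    // parent of all Instances in working memory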

Edge – contains a weight, which constrains search through spreading activation decay, and a distance, which localizes inference about concepts. Types of edges are as follows:

  • Inhibitor/Activator – an Edge along which activation directly affects a Node’s activation or cost function. Under this algorithm, inhibition/activation is only supported if at least one of the connected nodes is a conceptual node, since conceptual nodes are the ones activated or inhibited by a reasoning process or by verbal input. For example, an inhibitor from one action node to other action nodes would make those actions less likely simply because the inhibiting action was considered as a possibility, which is not desired. To encode behavior in which one action makes other actions more difficult, we would instead have to store all possible paths and compare combinations of them to find the shortest path.
      o Example: Assume a café is closed on Sundays (I never said it was a successful café). On Sundays, the current state of the environment would activate “Sunday”, which would activate the café’s “Closed” state node. “Closed” would inhibit the action “Buy coffee”, which would usually be activated as a possible action under the goal of “Get coffee”.

  • Goal generator – activation along this edge makes the target action node a goal, and initiates the spreading algorithm
  • Conceptual Edges
      o Is-A – an Edge that represents an IS-A relationship between two concepts
      o Attribute – an Edge that represents that one concept is an attribute of another
      o Instance-Of – an Edge that represents that one concept is an instance of a more abstract concept. Provides the link between long term memory (conceptual) and working memory (instance based)
  • Parameter – the target node is a parameter
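
Continuing the same sketch, the edges might look something like this (again, placeholder names; the precondition edges referenced by the algorithm below are included as a type):

// Companion sketch of the edge types; the fields mirror the description above.
public enum EdgeType
{
    Precondition,
    InhibitorActivator,
    GoalGenerator,
    IsA,
    Attribute,
    InstanceOf,
    Parameter
}

public class Edge
{
    public Node Source;
    public Node Target;
    public EdgeType Type;
    public double Weight;     // constrains search through spreading activation decay
    public double Distance;   // localizes inference about concepts
    public double Sign = 1.0; // +1 for an activator, -1 for an inhibitor
}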

Parameters – parameters are passed through spreading activation to fulfill action or concept nodes that accept them. Parameters can be instances or concepts. Preconditions can depend on a particular parameter, so that a node is only activated by an activation carrying the requisite parameters. Parameters are passed from one node to the next until the activation’s spread ends.

Activation – consists of an activation value and a source. Activation spreads along the graph to activate connected nodes, with a falloff dictated by edge weights. Different activations have different impedances for various edge types. High impedance means the activation will not spread easily over that edge, whereas low impedance means it will easily spread. Activation impedance is also affected by the existing activation of a Node – activated nodes will pass on activations more readily.

  • Current State Activation – the activation with a source at the root state
      o Travels along forward edges
      o Low impedance edges:
          § Preconditions
          § Instances
      o High impedance edges:
          § Attributes
  • Goal Activation – the activation with a source at a goal state
      o Travels along backward edges
      o Low impedance edges:
          § Preconditions
          § Instances
      o High impedance edges:
          § Attributes
  • Knowledge Query Activation (asking about concepts)
      o Travels bi-directionally
      o Low impedance edges:
          § Is-A
          § Attributes
      o High impedance edges:
          § Preconditions
          § Instances
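
Continuing the sketch, an activation might carry its kind, source, value, and parameters, plus a per-edge-type impedance lookup. The actual impedance numbers below are only illustrative:

// Sketch of an activation and its impedance table; the 0.1/0.9 values are placeholders.
public enum ActivationKind { CurrentState, Goal, KnowledgeQuery }

public class Activation
{
    public ActivationKind Kind;
    public Node Source;                                   // root node, goal node, or queried concept
    public double Value;                                  // decays as the activation spreads
    public List<Node> Parameters = new List<Node>();      // instances/concepts carried along

    // Low impedance = spreads easily over that edge type; high impedance = spreads poorly.
    public double ImpedanceFor(EdgeType type)
    {
        switch (Kind)
        {
            case ActivationKind.CurrentState:
            case ActivationKind.Goal:
                if (type == EdgeType.Precondition || type == EdgeType.InstanceOf) return 0.1; // low
                if (type == EdgeType.Attribute) return 0.9;                                   // high
                break;
            case ActivationKind.KnowledgeQuery:
                if (type == EdgeType.IsA || type == EdgeType.Attribute) return 0.1;           // low
                if (type == EdgeType.Precondition || type == EdgeType.InstanceOf) return 0.9; // high
                break;
        }
        return 0.5; // default for other edge types
    }
}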

Bi-directional information spreading algorithm:

A goal state is given as a command or a desired action, and a root state represents the current state of the environment. The algorithm will find a low cost sequence of actions that will satisfy the goal condition without an exhaustive search of all possibilities.

  1. Goal activation starts at the given goal nodes. This spreads across incident (backward) precondition edges and bi-directionally across conceptual edges and defines a goal set – the set of goals and preconditions that need to be completed by the agent.
  2. Current state activation starts at the root node. This spreads across forward precondition and inhibitor/activator edges and bi-directionally across conceptual edges.
  3. The series of the most strongly activated precondition nodes forms the sequence of actions to be taken by the agent. A backward greedy search from the goal state forms a graph consisting of the nodes with the greatest activation (AND gates will traverse all incident paths). The actions are then performed according to a depth-first traversal starting from the root node; AND gates halt the traversal until all of their incident paths have reached the gate.
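
Here is a rough sketch of how the spreading part of those steps might look with the classes above. It glosses over gates, inhibitors, and parameter matching, and the way it folds impedance into the falloff is just one possible choice:

// Rough sketch of the bi-directional spreading; gates and parameter matching are omitted.
public class Planner
{
    const double Threshold = 0.01; // stop spreading once an activation decays below this

    public void Spread(Node node, double value, Activation activation, bool forward)
    {
        if (value < Threshold) return;                      // this branch has decayed away
        node.Activation = Math.Max(node.Activation, value);

        var edges = forward ? node.Outgoing : node.Incoming;
        foreach (var edge in edges)
        {
            // Falloff combines the edge weight with the activation's impedance for this edge type.
            // (A real implementation would also need to guard against non-decaying cycles.)
            double next = value * edge.Weight * (1.0 - activation.ImpedanceFor(edge.Type));
            Spread(forward ? edge.Target : edge.Source, next, activation, forward);
        }
    }

    public void Plan(RootNode root, ActionNode goal)
    {
        // Step 1: goal activation spreads backward and defines the goal set.
        Spread(goal, 1.0, new Activation { Kind = ActivationKind.Goal, Source = goal }, forward: false);
        // Step 2: current state activation spreads forward from the root.
        Spread(root, 1.0, new Activation { Kind = ActivationKind.CurrentState, Source = root }, forward: true);
        // Step 3: greedy backward walk from the goal over the most strongly activated
        // precondition nodes, then execute via a depth-first traversal from the root.
    }
}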

Additional thoughts:

Looping – looping is important for actions that need to be repeated a number of times, as well as standing orders, such as, “Tell me if you see a red car” (thanks for reminding me Norm :) ). This can be controlled using a Loop Gate, with an input for activation, one output to direct the activation back to the beginning of the “block” (in this case an action node), and one output that activates when the loop is finished. Another possibility is to reroute activation back to a Goal Generator node. To keep repetitions reasonable, I would probably need a “refractory period” implemented using a delay between activations.
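
One possible shape for a Loop Gate, using the sketch classes above and a simple time-based delay for the refractory period:

// Sketch of a Loop Gate with a refractory period; the timing scheme is just one option.
public class LoopGateNode : GateNode
{
    public Node Body;                                       // the "block" to repeat (e.g. an action node)
    public Node OnFinished;                                  // receives activation when the loop is done
    public Func<bool> Done;                                  // condition that ends the loop
    public TimeSpan Refractory = TimeSpan.FromSeconds(1);    // minimum delay between repetitions
    private DateTime lastFired = DateTime.MinValue;

    public void Receive(double activation)
    {
        if (DateTime.Now - lastFired < Refractory) return;   // still in the refractory period
        lastFired = DateTime.Now;

        if (Done())
            OnFinished.Activation += activation;             // loop finished, pass activation onward
        else
            Body.Activation += activation;                   // route activation back to the block
    }
}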

Language - I haven't mentioned much about language so far, but it's always in the back of my mind. To start, I'm going to use preplanned phrases to build the network and have the program ask about anything needed to complete a task. I hope later to use activation of concepts combined with semantic frames to improve the language aspect.

Thursday, January 27, 2011

Clarification of my Spreading Activation Algorithm

I haven't found much in the way of detailed spreading activation algorithms, aside from the "Task Planning Under Uncertainty Using a Spreading Activation Network" paper. I've also found that many details of the implementation are dependent on the application. In my model, I've realized the need for a few things that complicate the simple spreading mechanism.

First of all, the spreading activation model I'm using is bi-directional: one activation spreads from the source, which represents the current state of the environment and all possible actions from that state, and another spreads from the goal, which is the command that should be fulfilled. The source activation travels along outgoing edges, and the goal activation travels backward across incident edges. When they meet, I'll have a possible sequence of actions to go from the current state to the goal state. However, the first meeting of the two activations would only be optimal if all the edges were equally weighted (they aren't in this case). That's complication #1, so in my case the activation will continue until all of the activations have decayed past a certain threshold.

Edge decays range from 1.0, representing an effortless connection or action (like breathing), to 0, representing an impossible connection or action (like pulling my cat off a carpet). An activation is some empirically determined value, and represents the initiative the agent has in making the connection or achieving the goal. It can be thought of as a kind of radius that spreads outward from the goal and current state nodes.
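
As a toy example of how I picture the decay playing out (the multiplicative falloff and the specific numbers here are assumptions, not settled design):

// Toy example: activation falling off along a path of edges until it passes a threshold.
double activation = 1.0;                  // initial activation at the source
double threshold = 0.05;                  // stop once activation falls below this
double[] edgeDecays = { 0.9, 0.8, 0.6 };  // 1.0 = effortless, 0 = impossible

foreach (double decay in edgeDecays)
{
    activation *= decay;                  // activation weakens with each hop
    if (activation < threshold) break;    // this branch of the spread terminates
}
// After three hops: 1.0 * 0.9 * 0.8 * 0.6 = 0.432, still above the threshold,
// so the activation keeps spreading.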

The backwards propagation establishes the goal set - all the preconditions that are part of possible solutions to the problem. This can either simply be a flat structure of all possible preconditions, or it could be organized as a dependency graph to indicate possible sequential orderings of actions. The forward propagation then fills in the weights of the preconditions to find the best sequence of steps. I'm still working on the design of this part of the implementation, and it will be the main focus of this week's work.

The next complication is differentiating between AND and OR when it comes to preconditions and other relationships. If we assign a cost to a particular node in the graph corresponding to an action, say, "Make coffee", the cost will be some combination of all the requirements. So we need to get water, get coffee grounds (or get a grinder and beans for purists), and turn the coffee maker on. If any one of these tasks is particularly difficult for some reason, then the entire task will be difficult - this is an AND grouping of preconditions. Furthermore, if there are many tasks to be done, the difficulty increases.

To represent this, I'm going to say that the activation of an AND gate is the product of the incident activations. This creates the effect that many effortless actions together are still effortless, but even a small number of difficult actions makes the overall action difficult.

For OR, it's the opposite case. If "Get coffee" can be satisfied by either "Making coffee" or "Buying coffee", then it doesn't matter how difficult the harder task is; it only matters how easy the easier task is. We'll take the path of least resistance - therefore the activation of an OR gate is the highest of the incident activations.
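
In code, the two gate rules might look like this (a standalone sketch that assumes the incident activations have already been collected into a list):

using System.Collections.Generic;
using System.Linq;

public static class Gates
{
    // AND: the product of incident activations. Many effortless (near 1.0) inputs stay
    // effortless, but even one difficult (near 0) input drags the whole action down.
    public static double And(IEnumerable<double> inputs)
    {
        return inputs.Aggregate(1.0, (product, x) => product * x);
    }

    // OR: the path of least resistance, i.e. the largest incident activation.
    public static double Or(IEnumerable<double> inputs)
    {
        return inputs.Max();
    }
}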

These two gates are essential, but other gates can be added with their own properties. If you've taken any intro Neuroscience class, you'll notice that these gates simulate the connections between neurons. This analogue will prove useful for other features as the project progresses as well.

I've decided to switch to C# from C++ after talking with Ben. This will make programming faster, and it also means I can easily use Unity for my virtual environment.

Tonight I finished my graph data structure and have some debugging info displayed. It's only rudimentary, and doesn't incorporate any of the features I've mentioned here yet.

Friday, January 21, 2011

Project Proposal

Natural language is the most common form of communication for us, but because of its complexity, it is rarely used for human-computer interfaces. However, there are many advantages to natural language interfaces - unlike code, everyone can already communicate with it, and it improves the user experience by making it seem as if a person is interacting with the user rather than a computer.
Furthermore, there are advantages to using language not only to communicate, but also to store knowledge. Storing knowledge in natural language makes it natural to produce sentences in response to the user.
My project is an attempt to combine these two concepts in a virtually embodied agent. A virtually embodied agent is a program that can manipulate and respond to changes in a virtual environment, such as a game or simulation. A virtually embodied conversational agent is an agent that can respond to natural language input from a user.

Abstract:

Intelligent Virtual Agent cognitive models often use a series of abstractions to split different tasks into manageable and solvable problems. For example, language is translated from a sentence to a parse tree, and then to a semantic representation. The semantic representation is then used with a knowledge base to transform the semantics into a temporal logic, and then the logic is transformed into statements which can be evaluated. However, such a pipeline has limitations: each of the constituent parts could aid in evaluating the others for pronoun reference, disambiguation, prepositions, and pragmatics, yet they are kept separate in a pipeline model.
I propose a cognitive model that consists of a cross between a semantic spreading activation network and a finite state machine, which is embodied in a virtual world by means of callback functions expressed as nodes in the network. Each node in this network represents a concept that is mapped to other nodes with a relationship. This system allows the conceptual relationships found in a semantic network to coexist with and fill in the information needed for the functional callback nodes associated with particular actions. Gates are used to control shortest path and spreading activation calculations when nodes are queried. Learning can take place through the addition of connections, either from language input or through automatic learning (such as Long-Term Potentiation - adding connections between nodes that activate together). The FSM aspect is used to model sequences of actions while maintaining conceptual information at each step of the process.


FinalProjectProposal