Tutorial 1

From Olympus

MyBus: a simple bus schedule information system


Introduction

So this is it: you'll be writing your first Olympus dialogue system! To be honest, it's not a "spoken" dialogue system yet, since at this point we won't look into speech synthesis and recognition. Instead, we'll build the dialogue-system version of "Hello World!": a text-based system that provides (hypothetical) bus schedule information. At the end of this tutorial, you will have a system with which a typical dialogue looks like this:

 S:  Welcome to MyBus.
     Where are you leaving from?
 U:  DOWNTOWN
 S:  Where are you going?
 U:  THE AIRPORT
 S:  Let me check that for you.
     There is a 28X leaving DOWNTOWN at 4:20 p.m. It will arrive at THE AIRPORT at 4:56 p.m.
     You can say, when is the next bus, when is the previous bus, start a new query, or goodbye.
 U:  WHEN IS THE NEXT BUS
 S:  Okay.
     There is a 28X leaving DOWNTOWN at 7:03 p.m. It will arrive at THE AIRPORT at 7:37 p.m.
 U:  GOODBYE
 S:  Thank you for using MyBus. Goodbye!

That might seem simple, but you'll already learn a lot about the fundamentals of dialogue systems and of Olympus itself. The three main tasks that a dialogue system has to perform are:

  1. to understand what the user says/types (and in multimodal systems, the user's gaze, gestures, etc)
  2. to plan an appropriate response
  3. to generate natural language to express the response to the user (again, in multimodal systems, this might also include animating a character, displaying particular graphics, etc)

In Olympus, these three functions are performed by:

  1. Phoenix: a robust parser using context-free grammars
  2. RavenClaw: a framework to build dialogue managers
  3. Rosetta: a template-based natural language generation framework

In this tutorial we will learn how to write a Phoenix grammar, a RavenClaw task specification, and a Rosetta set of templates to build a system that conducts dialogues like the one above.

Building the System

Build the system by double-clicking SystemBuild.pl.

Writing the grammar

The role of the grammar (and the parser) is to extract structure from a string of words expressing natural language (it could be a sentence, but it does not need to be "grammatical" in the usual linguistic sense of the word) and convert it into a formal representation that the other components of the system can handle. Basically, the grammar abstracts away the ambiguity and redundancy of natural language. For example, in the dialogue above, instead of typing "WHEN IS THE NEXT BUS", a user could also type "THE NEXT BUS", "WHEN IS THE NEXT ONE", or "WHEN IS THE ONE AFTER THAT", all of which mean the same thing. In general, we do not want to base our answer on which of these the user typed (or said); all we want to know is that they expressed the wish to get the schedule of the next bus. This is what the parser does.

Hence, the grammar describes all the sentences that the system can understand, along with their meaning. One way to do so would be to list all these sentences. Indeed, this might be enough for the toy system we are considering here. However, for systems that allow slightly more flexibility from the user (such as the one we will build in our next tutorial), the number of possible sentences is so huge that it becomes impossible to list them all. Therefore, grammar formalisms such as the one used by Phoenix offer a way of representing thousands or millions of sentences in a compact, readable way. Let's look at the grammar for our system. It is found in the file "Resources/Grammar/MyBusTask.gra". The format of the grammar file is described in the Phoenix Grammar Reference. The first slot definition we see in the file is:

 [Place]
   (carnegie mellon university)
   (downtown)
   (robinson towne center)
   (the airport)
   (south hills junction)
   (mount oliver)
   (the south side)
   (oakland)
   (bloomfield)
   (polish hill)
   (the strip district)
   (the north side)
 ;

All this says is that the slot [Place] matches any of the twelve expressions listed. Nothing fancy here. More interesting is the next one:

 [NextBus]
   (*WHEN_IS *the next *BUS)
   (*WHEN_IS *the BUS after that *BUS)
 
 WHEN_IS
   (when is)
   (when's)
 
 BUS
   (bus)
   (one)	
 ;

According to our explanations above, this definition can match sentences like "when is the next bus", "when's the one after that", "next bus", or "next" (and many other variations). It uses two variables, WHEN_IS and BUS, which capture local variations in language, and has lots of optional words and variables. We'll let you look into the rest of the grammar file and figure out whether you understand what each slot matches. You can even add your own variations if you feel like it. As you can see from [Yes] and [No], slot definitions can get as complicated as you want them to be...

The second file that describes our grammar is the forms definition file: "Resources/Grammar/MyBusTask.forms". Basically, this file lists all the top-level slots and groups them by function. However, Olympus does not currently use the notion of functions, so it does not in fact matter which FUNCTION you put your top-level slots in. Note that this grammar is so simple that all slots are top-level (i.e. you never see a slot name inside a parenthesized expression). We'll see an example of a more complex and structured grammar in the next tutorial.

Okay... Now that we've played with the grammar, how do we know if it works and what it does? Of course, one answer is "you'll know once you have a full working system", but you don't want to rerun (or rewrite) a whole system each time you change your grammar, do you? So there's a small command-line tool that allows you to test your grammar. First, compile your grammar by running the "cmp.bat" script. Then run the "prs.bat" script. This opens a command line that lets you type any sentence you want and see the result of the parse. Without going into details, you'll see the names of the slots and functions that the parser extracts from your input thanks to the grammar. Type "quit" to exit the prs.bat script.

Writing the task specification

The Task Tree

Now that we have a grammar, we can understand what the user types into the system. But how do we know what to respond? This is the job of the dialogue manager. In Olympus, the dialogue manager is based on RavenClaw, a generic engine that "runs" a dialogue according to a task specification, which is a description of how dialogues for your particular task unfold. So for each new system (i.e. each new task), all you have to do is write a task specification, and RavenClaw will take care of running the dialogues according to it (and, of course, according to the user's responses too). How do you describe a task? As with the grammar, one solution would be to list all possible dialogues that could ever occur. Of course, as with the grammar, this is unreasonable, since even for simple systems there is a huge number of possible dialogues. So we need a framework to describe dialogues in a compact way, the same way we needed a grammar to describe language in a compact way. Task specification in RavenClaw uses the hierarchical structure of dialogue to allow this. Basically, all dialogues to achieve a particular task can be decomposed into subdialogues, and those subdialogues can in turn be decomposed. For example, in our bus schedule example, you might want to decompose the general task of providing bus schedule information into:

  1. Greeting the user
  2. Responding to the user's request
  3. Saying goodbye to the user

In addition, the middle task of "responding to the user's request" can be decomposed into:

  1. Getting the user's request
  2. Getting the information corresponding to the request
  3. Providing the information to the user

Each of these subtasks can then be further decomposed, and so on. This decomposition can be represented as a tree where the root is the whole task, each node represents a subtask, and the children of a particular node represent the subtasks into which it decomposes. We call this tree the task tree. Nodes of the tree are called "agents" because each is in charge of performing a particular subdialogue.

The MyBus task tree

The figure above shows the task tree for our MyBus system. As you can see from the key, there are five types of nodes (agents) in the tree: one type of internal node and four types of leaf nodes:

  • Agencies are the internal nodes of the tree. They represent (sub)tasks that are further decomposed.
  • Inform Agents represent the atomic behavior (i.e. an action that is not further decomposed) of telling something to the user
  • Request Agents represent the atomic behavior of asking a question to the user and understanding the answer
  • Execute Agents represent atomic actions that are not conversational in nature, such as looking up a database to retrieve results
  • Expect Agents do not represent any action but rather capture specific concepts that the user might say. They therefore perform the "understanding" part of the Request Agents but do not ask a question (we'll see later when and how that can be used)

As you can see, this tree reflects the decomposition that we described above: the root (MyBus) is decomposed into three subtasks: GiveIntroduction, PerformTask, and GreetGoodbye. PerformTask is itself decomposed into GetQuerySpecs, ProcessQuery, and GiveResults. Further, we see that GetQuerySpecs is decomposed into two Request Agents, RequestOriginPlace and RequestDestinationPlace, whose functions you can guess from their names.

Traversing the tree

As you might have figured out by now, the default way a dialogue unfolds given a task tree is by what is called "depth-first left-right traversal". This means that, when traversing the tree:

  1. Agencies are immediately traversed towards their children agents
  2. The children agents are traversed from left to right

This means that the first thing our system is going to do is GiveIntroduction:

 S: Welcome to MyBus.

Next, it will traverse PerformTask to GetQuerySpecs to RequestOriginPlace:

 S: Where are you leaving from?
 U: DOWNTOWN

Once it gets the answer from the user, the system will continue to RequestDestinationPlace:

 S: Where are you going?
 U: THE AIRPORT

Then, since all the children of GetQuerySpecs have been traversed, the system will go to ProcessQuery, which will lead to InformFirstProcessing:

 S: Let me check that for you.

The role of this message is to make the user wait while the system queries the database to retrieve the results. Then, if we went on like this, the system would go on to InformSubsequentProcessing (a lighter form of the previous message, like "Okay", intended to be used when the user asks for further information after getting a first result). Oh... There's a problem here, you say. And you're right. We do not want the system to execute InformFirstProcessing and InformSubsequentProcessing sequentially. Instead, for the first user query, we'd like the system to execute InformFirstProcessing, skip InformSubsequentProcessing, and execute ExecuteBackendCall (which retrieves the results from the database). For subsequent user queries, the system should skip InformFirstProcessing, execute InformSubsequentProcessing, and then execute ExecuteBackendCall. This therefore breaks the basic "depth-first left-right" traversal rule. Fortunately, this can be done (and is very common) in RavenClaw using what we call preconditions. Agent preconditions define when a given agent should be executed and when it should be skipped. We'll give more details on them when we talk about the way the task tree is described in RavenClaw. So, assuming we have completed the backend call, following our traversal rule, the next agent to be executed is InformSuccess:

 S: There is a 28X leaving DOWNTOWN at 4:20 p.m. It will arrive at THE AIRPORT at 4:56 p.m.

Left-right traversal tells us that the system now executes InformError. Here again, you can guess that you wouldn't want to execute both InformSuccess and InformError for a single database query. Either the system was able to retrieve results and we want to report them to the user (InformSuccess), or, for some reason, the system couldn't get an appropriate answer to the query and we want to inform the user of the problem (InformError). How do we do this? Yes, you're right! PRECONDITIONS! See, you're learning :)

Once either of the two Inform Agents has been executed, we continue our traversal to RequestNextQuery:

 S: You can say, when is the next bus, when is the previous bus, start a new query, or goodbye.
 U: WHEN IS THE NEXT BUS

The next agent to be executed is ExpectGoodbye but, as we've seen, Expect Agents do not yield any action, so for now we'll just ignore it. We'll get back to its role later. Then, according to the left-right rule, the system should execute InformStartingNewQuery and go on to the next terminal agent on the right, which is GreetGoodbye. However, this is not the behavior we're looking for. Instead, we want to go back, look in the database for the time of the next bus (since that's what the user asked for), and inform the user of the new results. We'll see in a minute how to do this in detail, but the key idea is that we want to reopen the ProcessQuery and GiveResults subtasks (i.e. execute them again). What happens then is that, because of the preconditions on InformFirstProcessing and InformSubsequentProcessing, we now skip InformFirstProcessing and execute InformSubsequentProcessing:

 S: Okay.

We then get the new result from the backend, and execute InformSuccess (assuming we did get a valid new result):

 S: There is a 28X leaving DOWNTOWN at 7:03 p.m. It will arrive at THE AIRPORT at 7:37 p.m.

We then skip InformError again because of preconditions and re-execute RequestNextQuery:

 S: You can say, when is the next bus, when is the previous bus, start a new query, or goodbye.
 U: GOODBYE

Now the problem is that, because we do not consider exiting to be a next query, "GOODBYE" is not understood by RequestNextQuery (note that we could have done things otherwise and made it a next query, but we would have missed an opportunity to introduce something new ;) Instead, "GOODBYE" is understood (i.e. taken into account) by the ExpectGoodbye agent attached right next to RequestNextQuery. Accordingly, the GiveResults agency completes its execution (the precondition for InformStartingNewQuery still being false), and so does PerformTask (typically, agencies complete their execution once all their children have completed and have not been reopened). We continue our left-right traversal to execute GreetGoodbye:

 S: Thank you for using MyBus. Goodbye!

And this time we're done with the task.

The RavenClaw Task Specification Language

Okay, you say, this is all a nice story, but how do I build my system? How do I describe the task tree and all those preconditions, reopenings, etc.? Ideally, we'd like to be able to literally draw the tree and have RavenClaw understand it. Unfortunately, we do not have support for such a GUI yet (feel free to contribute one!!)... Instead, we rely on RCTSL (RavenClaw Task Specification Language), a specific language designed to describe the agents in the tree and their relations. You can find the RCTSL description of the MyBus task in /Agents/MyBusDM/MyBusDialogTask.cpp.

WHAT, CPP? Yes, you've read it right, it's a C++ file. Actually, RCTSL is a set of C++ macros that, once compiled, generate an executable that will be your dialogue manager (DM). Though this might seem like a strange idea at first, it presents the big advantage of letting you write actual C or C++ code in parts of the task specification, giving you the full power of a complete programming language when you need it, which inevitably happens once you start building realistically complex systems. That said, you don't need to know much C or C++ to write a simple dialogue system, so if C/C++ are not your cup of tea, don't worry about it.

So back to our MyBus task specification file. First you can see the license, and a bunch of initializations that we won't talk about in this tutorial. There's also a "CONCEPT TYPE DEFINITIONS" section, which we'll get back to later. For now, let's move directly to the section called "AGENT SPECIFICATION". The first thing we see (besides comments) is:

 // /MyBus
 DEFINE_AGENCY( CMyBus,   
 
   IS_MAIN_TOPIC()
 
   DEFINE_SUBAGENTS(
     SUBAGENT(GiveIntroduction, CGiveIntroduction, "")
     SUBAGENT(PerformTask, CPerformTask, "")
     SUBAGENT(GreetGoodbye, CGreetGoodbye, "")
   )
 
 )

You can probably guess what this does... It defines the agency called MyBus and indicates who its children are. Note that, as a convention, there is a comment line (starting with "//") before each agent definition that indicates the "path" to the agent. You probably noticed that the first word after DEFINE_AGENCY is CMyBus instead of MyBus. All the agents have two names, which are identical except for an additional initial "C". The one with the "C" is the name of the class that is used internally by RavenClaw to manipulate that agent, whereas the one without is the name by which you refer to the agent within the task specification. Since MyBus is the root, its path is simply /MyBus. IS_MAIN_TOPIC indicates that this subtask is a main topic of conversation. Let's not worry about what that means for now. Next is a DEFINE_SUBAGENTS block, which lists the children of the agency. Each child is specified using a SUBAGENT directive whose first parameter is the name of the subagent, followed by the name of the corresponding class (so basically the same name with a C in front of it). The third parameter is always going to be an empty string in this first tutorial, so we won't talk about it now. And that is how you define an agency. Simple, isn't it?

Let's move on to the next agent, GiveIntroduction:

 // /MyBus/GiveIntroduction
 DEFINE_INFORM_AGENT( CGiveIntroduction,
   PROMPT("inform welcome")
 )

The PROMPT directive indicates what the system should say. More specifically, as was the case for understanding the user's input, the dialogue manager does not really care which words the system will actually display or speak. What's important here is the meaning that is being conveyed to the user: in this case, the fact that we are informing the user (rather than, say, asking them a question) that they are welcome to use the system. That is what the string within the PROMPT directive expresses. It is the natural language generation module's job to turn this semantic representation into natural language. We'll see in the next section how this is done, but for now let's just assume that this is what happens. See also RavenClaw Prompt Description Syntax for a more detailed explanation of the string within the PROMPT directive.

Let's move on to the definition of the second child of MyBus:

 // /MyBus/PerformTask
 DEFINE_AGENCY( CPerformTask,
 
   DEFINE_CONCEPTS(
     INT_USER_CONCEPT(query_type, "")
     STRING_USER_CONCEPT(origin_place, "")
     STRING_USER_CONCEPT(destination_place, "")
 
     CUSTOM_SYSTEM_CONCEPT(result, CResultConcept)
     CUSTOM_SYSTEM_CONCEPT(new_result, CResultConcept)
   )
   
   DEFINE_SUBAGENTS(
     SUBAGENT(GetQuerySpecs, CGetQuerySpecs, "")
     SUBAGENT(ProcessQuery, CProcessQuery, "")
     SUBAGENT(GiveResults, CGiveResults, "")
   )
 
 )

So this is another agency definition. This time, in addition to DEFINE_SUBAGENTS, there's another directive called DEFINE_CONCEPTS. This is the right time to introduce another key notion of RavenClaw: concepts. Basically, concepts are to RCTSL what variables are to standard programming languages like C++, Java, or Perl: they store values so that you can retrieve and manipulate them later. There are two main categories of concepts. System concepts are really just like variables and nothing more; they are typically used to store results retrieved from the database and other internal values. User concepts, on the other hand, capture entities provided by the user. For example, in our task, the origin and the destination of the bus trip are provided by the user, so they will be user concepts. We will see later what the differences between system and user concepts are. So we first define an INT_USER_CONCEPT called "query_type". As you probably guessed, this concept takes integer values that encode the type of question the user asked (e.g. "next bus", "previous bus"). There is a second argument to INT_USER_CONCEPT, which is set to an empty string here. This is related to the confirmation behavior that the system should have with this concept but, as in the SUBAGENT case, we'll keep this for a future tutorial. There are also two STRING_USER_CONCEPT definitions, for the origin and destination between which the user wants to take the bus. Here we use strings to store the explicit place name used by the user. The last two concepts defined are system concepts, and they are of a custom type that has been defined specifically for this system. Now is the time to look back at that "CONCEPT TYPE DEFINITIONS" section that we mentioned earlier. What we see there is:

 DEFINE_FRAME_CONCEPT_TYPE( CResultConcept,
   ITEMS(
     INT_ITEM(failed)
     STRING_ITEM(route)
     INT_ITEM(departure_time)
     INT_ITEM(arrival_time)
   )
 )

This defines a custom concept type called CResultConcept (again, the initial "C" indicates that this is in fact a C++ class, but that's just a naming convention). This is a frame concept that contains four atomic items: an integer called "failed", a string called "route", and two more integers called "departure_time" and "arrival_time". If you're familiar with any modern programming language, that shouldn't be anything new to you. Going back to our PerformTask agent, we define two concepts called "result" and "new_result", both of type CResultConcept. Now there is much more to concepts than we can touch upon in one tutorial... If you're curious about them, you can find a detailed and exhaustive reference on our Concepts in RavenClaw page.

Back to our code, next is a standard agency definition for GetQuerySpecs, which should not give you any trouble. The following definition, however, is our first Request Agent:

 // /MyBus/PerformTask/GetQuerySpecs/RequestOriginPlace
 DEFINE_REQUEST_AGENT( CRequestOriginPlace,
 
   PROMPT("request origin_place")
   REQUEST_CONCEPT(origin_place)
   GRAMMAR_MAPPING("![Place]")  
 
 )

There are three key directives for Request Agents:

  • PROMPT, as for Inform Agents, describes the content of the question asked by the system when the agent is executed
  • REQUEST_CONCEPT specifies the name of the user concept that the agent should acquire from the user
  • GRAMMAR_MAPPING indicates how to map (or bind) the parse of the user input to the concept

The single argument of the GRAMMAR_MAPPING directive is a string that describes which grammar slots should be used to fill in the Request Agent's concept. You should recognize the bracketed expression "[Place]" as a slot from the grammar (see above). This means that the string of words that matches the [Place] slot should be put in the origin_place string concept. Now what's with the exclamation mark, you say? It indicates the scope of the grammar mapping. More specifically, it says that the binding should only be done if RequestOriginPlace is the current topic (i.e. if the system just asked the "request origin_place" question). Using the exclamation mark on grammar mappings forces the user to respond to the current question only (a type of interaction called system initiative). In the next tutorial we'll see examples of more flexible bindings that allow the user to take the initiative in the dialogue and volunteer information even when not asked for it. Again, there's much more to grammar mappings than I could dream of explaining here (we'll actually see a little more later), but feel free to refer to our RavenClaw Grammar Mappings and Binding Filters page for more information.

The next agent is very similar to the one we just saw, except that it deals with the destination rather than the origin. Note that, although the concepts are different, the grammar mappings are the same for these two Request Agents ("![Place]"), since both concepts correspond to places. Without a scope marker, an input like "downtown" would be ambiguous between "origin_place=downtown" and "destination_place=downtown". With the exclamation mark, depending on when the input arrived (after RequestOriginPlace or after RequestDestinationPlace was executed), the system knows to bind it to one concept or the other.

So now we're done with the GetQuerySpecs agency (take a look back at the task tree if you're lost). Let's move on to ProcessQuery:

 // /MyBus/PerformTask/ProcessQuery
 DEFINE_AGENCY( CProcessQuery,
 	
   DEFINE_SUBAGENTS(
     SUBAGENT(InformFirstProcessing, CInformFirstProcessing, "")
     SUBAGENT(InformSubsequentProcessing, CInformSubsequentProcessing, "")
     SUBAGENT(ExecuteBackendCall, CExecuteBackendCall, "")
   )
 
   SUCCEEDS_WHEN(
     SUCCEEDED(ExecuteBackendCall)
   )
 )

Yet another agency definition; you should be used to them by now. What's special about this one is the SUCCEEDS_WHEN block. All agents in RavenClaw have a success criterion, something that tells whether they have achieved their subtask or not. For agencies, the default success criterion is that all of their children have succeeded. While this worked for the agencies we've seen so far, it won't here because, as you may recall, this agency should only execute one of its first two children, InformFirstProcessing and InformSubsequentProcessing. This means that one of them will not have succeeded when we should move on to the next step in the dialogue. To allow this, we define a success criterion with SUCCEEDS_WHEN, whose argument is a boolean expression. When the expression is true, the agency is considered to have succeeded (i.e. achieved its goal). In this case, the criterion is that ProcessQuery's third child, ExecuteBackendCall, has succeeded (regardless of the status of the first two Inform Agents).

The next two agents are standard Inform Agents.

 // /MyBus/PerformTask/ProcessQuery/InformFirstProcessing
 DEFINE_INFORM_AGENT( CInformFirstProcessing,
   PRECONDITION(!AVAILABLE(result))
   PROMPT("inform looking_up_database_first")
 )
 
 // /MyBus/PerformTask/ProcessQuery/InformSubsequentProcessing
 DEFINE_INFORM_AGENT( CInformSubsequentProcessing,
   PRECONDITION(AVAILABLE(result))
   PROMPT("inform looking_up_database_subsequent")
 )

The only thing to notice about them is the PRECONDITION directive, which takes a boolean expression. The DM executes an agent only when its precondition is true. So, if the "result" concept is empty (AVAILABLE returns true if there is something stored in the concept), the first Inform Agent will be executed but not the second one. If we've already got something from the database, the opposite happens.

Next comes our first Execute Agent:

 // /MyBus/PerformTask/ProcessQuery/ExecuteBackendCall
 DEFINE_EXECUTE_AGENT( CExecuteBackendCall,
   EXECUTE(
     if (!AVAILABLE(query_type)) {
       C("query_type") = NQ_BUS_AFTER_THAT;		
     }
 
     // call on the galaxy stub agent to execute that particular call
     pTrafficManager->Call(this, "gal_be.launch_query <query_type "
                                 "<origin_place <destination_place "
                                 "<result >new_result");
 
     C("result") = C("new_result");
   )
 )

In an Execute Agent, the EXECUTE directive can contain a block of C++ code that does whatever you want it to do... In this case, we first check whether query_type is defined and, if not, set it to requesting the next bus by default. Then we call the backend using the Call method of RavenClaw's Traffic Manager Agent, which is in charge of communicating with everything outside the DM. The second argument of Call indicates which concepts to send to the backend and where to store the result. For a detailed description of its syntax, see the equivalent RCTSL directive CALL. In our case, we send the query type, the origin and destination places, and any previously obtained result to the backend, and obtain a new result from it. Finally, we copy the new result into the result concept.

Next comes the definition of the GiveResults agency:

 // /MyBus/PerformTask/GiveResults
 DEFINE_AGENCY( CGiveResults,
 
   DEFINE_CONCEPTS(
     INT_USER_CONCEPT( next_query, "")
     BOOL_USER_CONCEPT( goodbye, "")
   )
 
   DEFINE_SUBAGENTS(
     SUBAGENT(InformSuccess, CInformSuccess, "")
     SUBAGENT(InformError, CInformError, "")
     SUBAGENT(RequestNextQuery, CRequestNextQuery, "")
     SUBAGENT(ExpectGoodbye, CExpectGoodbye, "")
     SUBAGENT(InformStartingNewQuery, CInformStartingNewQuery, "")
   )
 
   SUCCEEDS_WHEN(
     ((int)C("next_query") == NQ_BUS_AFTER_THAT) ||
     ((int)C("next_query") == NQ_BUS_BEFORE_THAT) ||
     SUCCEEDED(InformStartingNewQuery) ||
     IS_TRUE(goodbye)
   )
 
   ON_COMPLETION(
     if (((int)C("next_query") == NQ_BUS_AFTER_THAT) ||
         ((int)C("next_query") == NQ_BUS_BEFORE_THAT)) {
       A("..").ReOpenTopic();
       C("query_type") = (int)C("next_query");
       C("next_query").Clear();
     } else if ((int)C("next_query") == NQ_NEW_REQUEST) {
       A("/MyBus/PerformTask").Reset();
     }
   )
 )

Yeah, that's a big one... First, it defines two local user concepts: "next_query" (an integer) and "goodbye" (a boolean). Skipping over the subagents, we get another success criterion. Basically, it says that this agency succeeds when the user requested the next or previous bus, or asked to start a new query and was informed that the system is starting over, or said "goodbye". Then we have a new block called ON_COMPLETION. Everything in this block is C++ code (as in the EXECUTE block we saw for Execute Agents) that is executed once the agent has finished its task (i.e. when the success criterion is true). If the user asked for the next or previous bus, we reopen the PerformTask agent, which basically means that the system will re-execute the whole PerformTask subtask. We access the PerformTask agent using the A keyword, which takes a path to an agent (paths in RavenClaw follow the standard convention that "." means self and ".." means parent, so you can use relative paths as we do here) and returns a reference to the corresponding class. Next, we copy the value of next_query into query_type and erase the value of next_query. If the user instead asked to start from scratch, we call Reset on the PerformTask agent, which, in addition to reopening the topic as described above, clears the values of all concepts defined in and under PerformTask. "/MyBus/PerformTask" is the absolute path to PerformTask and is completely equivalent to ".." in this case (we used the two variants for illustration purposes only). We've now seen a few methods that can be called on agents and concepts. Again, there are many, many more... See the reference pages for commonly used agent and concept methods.

The next two agents are standard Inform Agents with preconditions. The only novelty here is that they use the "<concept_name" syntax within their PROMPT directives, which means the same thing as for the Call method: it sends the values of the concepts to the NLG module. Next is the definition of RequestNextQuery:

 // /MyBus/PerformTask/GiveResults/RequestNextQuery
 DEFINE_REQUEST_AGENT( CRequestNextQuery,
   REQUEST_CONCEPT(next_query)
 
   PROMPT("request next_query")
 
   GRAMMAR_MAPPING("![StartOver]>1, "
                   "![NextBus]>2, "
                   "![PreviousBus]>3")
 )

Here you can see that the grammar mapping is different from the ones we've seen so far. First, several slots are mapped, separated by commas. Second, each slot name is followed by ">" and a number. This means that instead of storing the string of words captured by the given grammar slot, the value after the ">" is put in the concept when the slot is found anywhere in the parse of the user input. Since, as you may recall, next_query is an integer concept, we store integers 1, 2, or 3 in it (which correspond to the macros NQ_NEW_REQUEST, NQ_BUS_AFTER_THAT, and NQ_BUS_BEFORE_THAT in MyBusConcepts.h).

Then comes our Expect Agent:

 // /MyBus/PerformTask/GiveResults/ExpectGoodbye
 DEFINE_EXPECT_AGENT( CExpectGoodbye,
   EXPECT_CONCEPT( goodbye)
   GRAMMAR_MAPPING("@(../RequestNextQuery)[Quit]>true")
 )

It looks pretty much like the definition of a Request Agent, except that REQUEST_CONCEPT is replaced by EXPECT_CONCEPT and there is no PROMPT directive. Also, the grammar mapping uses a different kind of scope. Instead of "!", we use "@(<agent_path>)", which indicates that the binding should only happen when the agent(s) specified after the "@" have the focus. Remember that Expect Agents are never executed and therefore never have the focus. Instead, what we want here is to "intercept" an input that comes after RequestNextQuery. Finally, since the "goodbye" concept is a boolean, we use the ">" syntax in the grammar mapping (in grammar mappings for boolean concepts, "true" and "false" are special strings that bind the corresponding boolean value to the concept).

The last two agents of our tree are standard Inform Agents. They are followed by the agent declaration block:

 DECLARE_AGENTS(
   DECLARE_AGENT(CMyBus)
     DECLARE_AGENT(CGiveIntroduction)
     DECLARE_AGENT(CPerformTask)
       DECLARE_AGENT(CGetQuerySpecs)
         DECLARE_AGENT(CRequestOriginPlace)
         DECLARE_AGENT(CRequestDestinationPlace)
       DECLARE_AGENT(CProcessQuery)
         DECLARE_AGENT(CInformFirstProcessing)
         DECLARE_AGENT(CInformSubsequentProcessing)
         DECLARE_AGENT(CExecuteBackendCall)
       DECLARE_AGENT(CGiveResults)
         DECLARE_AGENT(CInformSuccess)
         DECLARE_AGENT(CInformError)
         DECLARE_AGENT(CRequestNextQuery)
         DECLARE_AGENT(CExpectGoodbye)
         DECLARE_AGENT(CInformStartingNewQuery)
       DECLARE_AGENT(CGreetGoodbye)
 )

All agents used in the tree have to be "registered" with RavenClaw using a DECLARE_AGENT directive. And finally, we need to let RavenClaw know which agent is the root of the tree:

 DECLARE_DIALOG_TASK_ROOT(MyBus, CMyBus, "")

As you can see, the syntax for DECLARE_DIALOG_TASK_ROOT is identical to that of SUBAGENT.

Writing the NLG templates

Okay, I hope you survived the RCTSL crash course and are still reading this tutorial... Don't worry, we're almost done! The only thing that we need now is to tell the system how to speak (or, in our case, how to generate text to display). Remember that the PROMPT directives contained descriptions of the meaning to convey to the user? Something like "inform welcome", "request origin_place" or "inform result <query_type <origin_place <destination_place <result"? Well, these strings, along with the values of the associated concepts (as in the latter case), are sent to Rosetta, Olympus' natural language generation module. Rosetta is written in Perl, so we won't use Visual Studio for this one (although you can use it as a text editor if you want). The files where the natural language prompts are specified are in Agents/MyBusNLG/Rosetta/MyBus. There are Inform.pm and Request.pm, which correspond to the two types of prompts we have seen so far. Let's look at Inform.pm first. If you're familiar with Perl, you will see that there are only two things in this file: the definition of a hash called $Rosetta::MyBus::act{"inform"} (meaning that it's a hash within the "act" hash), and the definition of an auxiliary subroutine called convertTime. Each entry in the hash is defined in the following way:

 "<entry_name>" => <content>,

For example, let's look at the first entry:

 "welcome" =>  "Welcome to MyBus.",

Well, all that means is that "inform welcome" will be translated into "Welcome to MyBus." for the user. Pretty simple, isn't it? The same goes for the following four entries: "goodbye", "looking_up_database_first", "looking_up_database_subsequent", and "starting_new_query". Then come the more complicated prompts that provide the results of the query:

 "result" => sub {
 
               my %args = @_;
 
               my $dep_time = &convertTime($args{"result.departure_time"});
               my $arr_time = &convertTime($args{"result.arrival_time"});
               return "There is a <result.route> leaving <origin_place> at <result.departure_time $dep_time>. ".
                      "It will arrive at <destination_place> at <result.arrival_time $arr_time>.";
             },

Here, instead of a string on the right side of the "=>", we have a Perl subroutine. You can do whatever you want in this sub; the only requirement is that it returns a string, which will be the one displayed/said to the user. Within the sub, you can access the concepts that have been passed by the DM as arguments to the sub. That's what the "my %args = @_;" line does. Then you can access the concept values as we do when calling convertTime (remember that "result" is a frame concept that has "departure_time" and "arrival_time" as items). convertTime changes the 4-digit representation of time returned by the backend (e.g. "1315") into a readable form ("1:15 p.m."). Now comes the return statement with the actual string to display. Besides the standard English words, you can see things like "<result.route>". These statements also represent concepts passed by the DM, except that in this case, all we want to do is include their value as-is in the output string. So when Rosetta sees "<result.route>" in a prompt, it knows to replace it with the actual value of the result.route concept, as passed by the DM. Then we have "<result.departure_time $dep_time>". This indicates that, while we're providing the string representation of the concept ourselves ($dep_time, computed by convertTime above), we still mark that this string represents the concept "result.departure_time". It is important to mark the portions of the prompt that express concept values so that Rosetta and the DM are aware of which concepts were conveyed to the user. Although we don't make use of this knowledge in our system here, we might in the future, for example to know how much was conveyed to the user before they interrupted the prompt in a speech input/output system.

Now that we're done with prompts for Inform Agents, we can look into Request.pm. You can see that all prompts in there are of the simplest form "<entry_name> => <output_string>,".

A note on the system's back end

The last module needed to get any realistic (even toy) system running is the backend. It provides the actual knowledge to the system (and to the user), and is often (but not always) an interface to a database. This module is, by definition, highly dependent on the task your system is designed for. Therefore, we will not go into the details of MyBus' backend. It is written in Perl and can be found in Agents/MyBusBackend. The main program is TaskBE.pm. You can refer to our page on Plugging a backend module in Olympus.

Conclusion

Congratulations! You've gone through the first Olympus Tutorial! Now you know the basic elements of an Olympus dialogue system:

  • a grammar to understand what the user says
  • a dialogue task specification in the form of a task tree, that encodes the system's behavior
  • some natural language generation templates to produce natural output

Although it might seem a bit heavy, this should give you the basis to understand most of the code of existing systems, as well as the ability to modify them and create your own. Feel free to do just that with the MyBus system. For example, what if you wanted to first ask the user which bus route they want information for? Can you make the required changes to the grammar, the DM, and the NLG?