Introduction
CodePlex is full of fantastic ideas. One of them is Irony, a framework for creating new languages. The scanner, parser, and interpreter are written in C#; all one needs to do is define the grammar (also in C#) and implement the new keywords, functions, etc. There are several introductory articles about Irony on CodeProject, e.g.: Writing Your First Domain Specific Language, Part 1 of 2, JSBasic - A BASIC to JavaScript Compiler, Writing Your First Visual Studio Language Service, and Irony - .NET Compiler Construction Kit.
This article shows some of the possibilities of Irony. The goal is to automate interactions with a WWW server, i.e., to build a simple web robot. A small domain-specific language seems to be the best way to achieve that.
Grammar
The core grammar rules needed to perform basic WWW operations are defined as follows:
program ::= <stmt>*
stmt ::=
getStmt
| postStmt
| matchStmt
| switchStmt
| gotoStmt
| labelStmt
| assignmentStmt
| expr
getStmt ::= "get" <strArg> "into" <variable> <suite>
postStmt ::= "post" <strArg> "referer" "=" <strArg> <postDataStmt> "into" <variable> <suite>
postDataStmt ::= "postdata" <postDataItemStmt>* "end"
postDataItemStmt ::= <strArg> "=" <strArg>
matchStmt ::= "match" <variable> "using" <matchregex> <suite>
switchStmt ::= "switch" <strArg> ":" <caseStmt>+ [<defaultStmt>] "end"
caseStmt ::= "case" <matchregex> ":" <stmt>+ "end"
defaultStmt ::= "default" <stmt>+
gotoStmt ::= "goto" <identifier>
labelStmt ::= ":" <identifier>
assignmentStmt ::= <variable> "=" <expr>
expr ::= <term> | <unExpr> | <binExpr>
variable ::= "@" <identifier>
suite ::= : <stmt>+ [<suiteError>] "end"
suiteError ::= ":error" <stmt>+
Irony makes it easy to translate this BNF-like notation into C# code, e.g.:
getStmt.Rule = Symbol("get") + strArg + "into" + variable + suite;
stmt.Rule = assignmentStmt | expr | matchStmt | getStmt |
postStmt | switchStmt | labelStmt | gotoStmt;
program.Rule = MakeStarRule(program, stmt);
The most important grammar elements are the "get", "post", and "match" statements. "get" and "post" send a GET or POST request, respectively, to a web server and store the response in a variable, e.g.:
get "http://www.google.com/" into @variable
log(@variable)
end
log is one of the functions in the WWW DSL; as the name suggests, it writes information to some storage (a file or the console). Note that variables are not declared explicitly. The address can also come from a variable:
@addr = "http://www.google.com/"
get @addr into @variable
log(@variable)
end
Error handling is also possible:
@addr = "http://www.google.com/"
get @addr into @variable
log(@variable)
:error
log("Error appear!")
end
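To make the control flow of a "get" statement concrete, here is a minimal C# sketch of how such a statement could be evaluated. It is not the article's implementation: GetSketch, ExecuteGet, and the fetch delegate (standing in for the real HTTP request) are hypothetical names, and I assume that the ":error" branch covers only a failed request, not errors raised inside the body.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of: get <address> into <variable> <body> [:error <onError>] end
public static class GetSketch
{
    // 'fetch' is an assumed stand-in for the real HTTP GET.
    public static void ExecuteGet(
        string address,
        string intoVariable,
        IDictionary<string, string> variables,
        Func<string, string> fetch,
        Action body,
        Action onError)
    {
        string response;
        try
        {
            response = fetch(address);
        }
        catch (Exception)
        {
            // Request failed: run the ":error" branch (if any) and skip the body.
            onError?.Invoke();
            return;
        }

        // Variables are created on first assignment; no declaration is needed.
        variables[intoVariable] = response;
        body();
    }
}
```

Injecting the request as a delegate keeps the sketch testable without network access; a real implementation would plug an HTTP client in at that point.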
Similarly, you can send a POST request to a server (e.g., to log in to Gmail):
@addr = "https://www.google.com/accounts/ClientLogin"
post @addr
referer = "https://www.google.com/accounts/ClientLogin"
postdata
"accountType"="GOOGLE"
"Email"="account@gmail.com"
"Passwd"="putpasswordhere"
"service"="mail"
"source"="Pol-WWWDSL-1.0"
end
into @variable
log(@variable)
:error
log("Error appear!")
end
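The "postdata" block is a list of name/value pairs. A common way to send such pairs is as an application/x-www-form-urlencoded request body; the sketch below assumes that encoding (the article does not say how the DSL encodes the data), and PostDataSketch.Encode is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PostDataSketch
{
    // Joins "name"="value" pairs into name=value&name=value,
    // percent-encoding every key and value (assumed form encoding).
    public static string Encode(IEnumerable<KeyValuePair<string, string>> items)
    {
        return string.Join("&", items.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
    }
}
```

For the first two pairs of the example, this yields accountType=GOOGLE&Email=account%40gmail.com. Note that Uri.EscapeDataString writes a space as %20 rather than "+".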
Once a response has been received from the server, it can be processed. The "match" statement executes its body when the response matches the supplied regular expression; otherwise its ":error" branch runs, e.g.:
@addr = "https://www.google.com/accounts/ClientLogin"
post @addr referer = "https://www.google.com/accounts/ClientLogin"
postdata
"accountType"="GOOGLE"
"Email"="account@gmail.com"
"Passwd"="putpasswordhere"
"service"="mail"
"source"="Pol-WWWDSL-1.0"
end
into @variable
log(@variable)
match @variable using
>>>
SID=(?<sid>[^\\s]+)\\s+LSID=(?<lsid>[^\\s]+)\\s+Auth=(?<auth>[^\\s]+)
>>>
log(@sid)
log(@lsid)
log(@auth)
:error
log("Match failed :(")
end
:error
log("Error appear!")
end
A regular expression is delimited by a triple ">" (">>>") at its start and at its end. Named groups in the expression declare script variables: for example, (?<sid>...) makes the matched text available as @sid. One small inconvenience with Irony is the backslash: inside a regex it must be written as "\\".
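The following C# sketch shows one way named groups can become script variables: each named group of a successful match is stored under "@" plus the group name. MatchSketch.TryMatch is a hypothetical helper, not part of Irony or of the attached project. In a C# verbatim string the pattern takes single backslashes; the doubling applies only inside the DSL script.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchSketch
{
    // Hypothetical: returns false (the ":error" case) when there is no match;
    // otherwise copies every named group into 'variables' as @name.
    public static bool TryMatch(string input, string pattern,
                                IDictionary<string, string> variables)
    {
        var regex = new Regex(pattern);
        Match m = regex.Match(input);
        if (!m.Success)
            return false;

        foreach (string name in regex.GetGroupNames())
        {
            // Skip numbered groups such as "0"; keep only named ones.
            if (int.TryParse(name, out _))
                continue;
            variables["@" + name] = m.Groups[name].Value;
        }
        return true;
    }
}
```

Applied to the ClientLogin response, this produces @sid, @lsid, and @auth, the variables read by the log calls in the example above.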
When there are several alternatives, a "switch" statement is more convenient than "match", e.g.:
get "http://www.google.pl/search?q=Irony" into @variable
switch @variable
case
>>>
Wikipedia, the free encyclopedia
>>>
log("Wikipedia result")
end
case
>>>
definition | Dictionary.com
>>>
log("Dictionary result, no wikipedia result")
end
default
log("Other results, no wikipedia nor dictionary result")
end
end
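A "switch" can be read as an ordered list of regular-expression tests. The sketch below assumes that cases are tried from top to bottom and that the first matching case wins, which is consistent with the log messages in the example; SwitchSketch.SelectCase is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwitchSketch
{
    // Returns the index of the first case whose pattern matches the input,
    // or -1 when none matches (the "default" branch).
    public static int SelectCase(string input, IReadOnlyList<string> casePatterns)
    {
        for (int i = 0; i < casePatterns.Count; i++)
        {
            if (Regex.IsMatch(input, casePatterns[i]))
                return i;
        }
        return -1;
    }
}
```

Keep in mind that case bodies are regular expressions: in "definition | Dictionary.com" the "|" is an alternation and "." matches any character, so that case matches either "definition " or " Dictionary" followed by any character and "com". Escape metacharacters (with doubled backslashes in the DSL) when a literal match is intended.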
The last syntax element is the "goto" jump. Although such constructs are often considered ugly, they are very convenient in simple scripts like these. "goto" transfers execution to the point in the script marked with a label (":" followed by the label name), e.g.:
:restart
get "http://www.google.pl/search?q=Irony" into @variable
switch @variable
case
>>>
Wikipedia, the free encyclopedia
>>>
log("Wikipedia result")
goto restart
end
default
log("Other results, no wikipedia result")
end
end
The version of Irony used in this project does not implement "goto", so a simple workaround was prepared to provide this functionality.
It works as follows:
The DoEvaluate method of the goto statement node throws a GotoJumpException:
protected override void DoEvaluate(EvaluationContext context)
{
throw new GotoJumpException(labelNode_);
}
The exception unwinds the evaluation call stack up to the top-level loop, which then restarts execution from the point that corresponds to the target label:
while (nodes != null)
{
try
{
foreach (var node in nodes)
{
node.Evaluate(evalContext);
}
nodes = null;
}
catch (GotoJumpException e)
{
nodes = m_gotoNodes[e.LabelId.ValueString];
}
}
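These entry points are computed before the script runs. For each label, the statements from the label to the end of its enclosing statement list are collected, followed by the remaining statements of every enclosing "stmt+" or "program" list further up the syntax tree. The resulting sequences are stored in the m_gotoNodes dictionary, keyed by label name: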
foreach (var labelStmt in labels)
{
var nodes = labelStmt.Parent.ChildNodes.Skip(
labelStmt.Parent.ChildNodes.IndexOf(labelStmt));
var parent = labelStmt.Parent;
while (parent.Parent != null)
{
var upper = parent.Parent;
if ((upper.Term.Name == "stmt+") || (upper.Term.Name == "program"))
{
var upperNextNodes =
upper.ChildNodes.Skip(upper.ChildNodes.IndexOf(parent) + 1);
nodes = nodes.Concat(upperNextNodes);
}
parent = upper;
}
m_gotoNodes.Add(((Token)labelStmt.ChildNodes[0]).ValueString, nodes);
}
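To see the whole goto mechanism in one place, here is a self-contained C# simulation. The Node class and the simplified GotoJumpException are hypothetical stand-ins for Irony's syntax-tree types; BuildEntryPoints follows the label-trimming loop above, and Run follows the restart loop. To keep the demo finite, each goto fires only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an Irony syntax-tree node.
public class Node
{
    public string Name;   // "program", "stmt+", "label", "goto" or "log"
    public string Text;   // label name, goto target or log message
    public Node Parent;
    public List<Node> ChildNodes = new List<Node>();

    public Node(string name, string text = null, params Node[] children)
    {
        Name = name;
        Text = text;
        foreach (var child in children) { child.Parent = this; ChildNodes.Add(child); }
    }
}

// Simplified version of the article's exception: carries only the label name.
public class GotoJumpException : Exception
{
    public string Label { get; }
    public GotoJumpException(string label) { Label = label; }
}

public static class GotoSketch
{
    // For each label: statements from the label to the end of its list,
    // then the remaining statements of each enclosing list.
    public static Dictionary<string, IEnumerable<Node>> BuildEntryPoints(IEnumerable<Node> labels)
    {
        var entryPoints = new Dictionary<string, IEnumerable<Node>>();
        foreach (var label in labels)
        {
            var siblings = label.Parent.ChildNodes;
            IEnumerable<Node> nodes = siblings.Skip(siblings.IndexOf(label));
            var parent = label.Parent;
            while (parent.Parent != null)
            {
                var upper = parent.Parent;
                if (upper.Name == "stmt+" || upper.Name == "program")
                    nodes = nodes.Concat(upper.ChildNodes.Skip(upper.ChildNodes.IndexOf(parent) + 1));
                parent = upper;
            }
            entryPoints.Add(label.Text, nodes);
        }
        return entryPoints;
    }

    // Top-level loop: a goto unwinds to here and execution resumes at the label.
    public static List<string> Run(Node program, Dictionary<string, IEnumerable<Node>> entryPoints)
    {
        var output = new List<string>();
        var firedGotos = new HashSet<Node>(); // demo only: each goto jumps once
        IEnumerable<Node> nodes = program.ChildNodes;
        while (nodes != null)
        {
            try
            {
                foreach (var node in nodes) Evaluate(node, output, firedGotos);
                nodes = null;
            }
            catch (GotoJumpException e)
            {
                nodes = entryPoints[e.Label];
            }
        }
        return output;
    }

    static void Evaluate(Node node, List<string> output, HashSet<Node> firedGotos)
    {
        switch (node.Name)
        {
            case "log":
                output.Add(node.Text);
                break;
            case "goto":
                if (firedGotos.Add(node)) throw new GotoJumpException(node.Text);
                break;
            case "stmt+":
                foreach (var child in node.ChildNodes) Evaluate(child, output, firedGotos);
                break;
            default: // "label" nodes do nothing at run time
                break;
        }
    }
}
```

For a program log A, { :L, log B }, log C, goto L, log D, the output is A, B, C, B, C, D: the jump re-enters the nested block at the label and then continues with the rest of the program.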
Functions
The "standard library" of the WWW DSL consists of four functions: wait, int, log, and download.
The "wait" function pauses execution for a specified number of seconds, e.g.:
get @addr into @variable
match @variable using
>>>
you have to wait (?<minutes>\\d+) minutes
>>>
wait(int(@minutes)*60)
end
end
The example above also uses the "int" function, which converts a string to an integer. Basic arithmetic (such as the multiplication above) is provided by Irony itself.
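Built-in functions such as "int" and "wait" can be modeled as a table that maps names to delegates. The C# sketch below is an assumption about how such a table might look, not the project's or Irony's API; BuiltinsSketch and Call are hypothetical names, and log simply writes to the console here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;

// Hypothetical registry of WWW DSL built-ins (not Irony's API).
public static class BuiltinsSketch
{
    public static readonly Dictionary<string, Func<object[], object>> Functions =
        new Dictionary<string, Func<object[], object>>
        {
            // int("42") -> 42
            ["int"] = args => int.Parse((string)args[0], CultureInfo.InvariantCulture),
            // wait(n) blocks the script for n seconds
            ["wait"] = args => { Thread.Sleep(TimeSpan.FromSeconds(Convert.ToDouble(args[0]))); return null; },
            // log(x) writes to the console in this sketch
            ["log"] = args => { Console.WriteLine(args[0]); return null; },
        };

    public static object Call(string name, params object[] args) => Functions[name](args);
}
```

With this table, wait(int(@minutes)*60) for @minutes = "5" becomes Call("wait", (int)Call("int", "5") * 60), i.e., a 300-second pause.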
The remaining two functions are "log", used in the previous examples, and "download", which downloads a file from a WWW server. The following script is the simplest possible file downloader written in the WWW DSL:
download(@arg1, @arg2)
Here, the two variables @arg1 and @arg2 play the role of "args" in a program's "main" function: @arg1 is the first and @arg2 the second argument passed to the WWW DSL script.
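The argument binding can be sketched in a few lines. I assume here that the first command-line argument is the script file (as with testwwwdsl.exe) and that the remaining ones become @arg1, @arg2, and so on; ArgumentsSketch is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class ArgumentsSketch
{
    // Hypothetical: commandLine[0] is the script file; the rest become @arg1, @arg2, ...
    public static Dictionary<string, string> BindScriptArguments(string[] commandLine)
    {
        var variables = new Dictionary<string, string>();
        for (int i = 1; i < commandLine.Length; i++)
            variables["@arg" + i] = commandLine[i];
        return variables;
    }
}
```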
Irony Grammar Explorer
Once a script has been written, it should be checked for correctness. Irony has a very handy application for checking grammars and scripts, the Irony Grammar Explorer.

Fig.1. Irony Grammar Explorer with sample WWW DSL script
You can load a grammar from a DLL assembly and paste in a script to test it for correctness. A sample is shown in figure 1.
Example
The project attached to this article is a simple download application. It automates the steps needed to download files from file-sharing sites. The script rs.wwwdsl is a recipe for the best-known (at least to me) file-sharing site.

Fig.2. RS.WWWDSL in action
To start downloading, prepare a file with the list of links and pass it to testwwwdsl.exe as the second argument; the first argument is the name of the processing script. Figure 2 shows the script in action.
Remark
This project uses, and requires, the Irony alpha release of Nov. 5, 2008.
TODO
In a follow-up article, I will present a WinForms application that automates downloads from various web sources.