Introduction
CodePlex is full of fantastic ideas. One of them is Irony, a framework for creating new languages. The scanner, parser, and interpreter are written in C#; all you need to do is define the grammar (also in C#) and implement the new keywords, functions, etc. There are several introductory articles about Irony on CodeProject, e.g.: Writing Your First Domain Specific Language, Part 1 of 2; JSBasic - A BASIC to JavaScript Compiler; Writing Your First Visual Studio Language Service; and Irony - .NET Compiler Construction Kit.
This article shows some of what Irony can do. The goal is to automate interactions with a web server (that is, to build a kind of web robot), and a simple domain-specific language seems the best way to achieve that.
Grammar
The main points of the grammar necessary to realize basic WWW operations are defined as follows:
program ::= <stmt>*
stmt ::=
getStmt
| postStmt
| matchStmt
| switchStmt
| gotoStmt
| labelStmt
| assignmentStmt
| expr
getStmt ::= "get" <strArg> "into" <variable> <suite>
postStmt ::= "post" <strArg> "referer" "=" <strArg> <postDataStmt> "into" <variable> <suite>
postDataStmt ::= "postdata" <postDataItemStmt>* "end"
postDataItemStmt ::= <strArg> "=" <strArg>
matchStmt ::= "match" <variable> "using" <matchregex> <suite>
switchStmt ::= "switch" <strArg> ":" <caseStmt>+ [<defaultStmt>] "end"
caseStmt ::= "case" <matchregex> ":" <stmt>+ "end"
defaultStmt ::= "default" <stmt>+
gotoStmt ::= "goto" <identifier>
labelStmt ::= ":" <identifier>
assignmentStmt ::= <variable> "=" <expr>
expr ::= <term> | <unExpr> | <binExpr>
variable ::= "@" <identifier>
suite ::= ":" <stmt>+ [<suiteError>] "end"
suiteError ::= ":error" <stmt>+
Irony allows for easy transformation from this BNF-like form into C# code, e.g.:
getStmt.Rule = Symbol("get") + strArg + "into" + variable + suite;
stmt.Rule = assignmentStmt | expr | matchStmt | getStmt |
    postStmt | switchStmt | labelStmt | gotoStmt;
program.Rule = MakeStarRule(program, stmt);
The most important grammar elements are the "get", "post", and "match" statements. "get" and "post" send a request (GET or POST) to a web server and store the response in a variable, e.g.:
get "http://www.google.com/" into @variable
log(@variable)
end
"log" is one of the functions in the WWW DSL; as the name suggests, it writes information to some storage (a file or the console). Note that variables are not explicitly declared. The address can also come from a variable:
@addr = "http://www.google.com/"
get @addr into @variable
log(@variable)
end
Error handling is also possible:
@addr = "http://www.google.com/"
get @addr into @variable
log(@variable)
:error
log("Error appear!")
end
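To make the control flow of the two branches concrete, here is a minimal C# sketch of how a "get" suite with an ":error" branch might behave. It is not taken from the project: GetSketch, Get, and the fetch delegate are hypothetical names, and the sketch assumes that the ":error" branch runs only when the request itself fails.

```csharp
using System;

public static class GetSketch
{
    // Hypothetical model of: get <url> into @variable ... :error ... end
    // fetch performs the HTTP request and is assumed to throw on failure.
    public static void Get(string url, Func<string, string> fetch,
                           Action<string> body, Action onError)
    {
        string response;
        try
        {
            response = fetch(url);
        }
        catch (Exception)
        {
            // Assumption: only a failed request triggers the :error branch.
            if (onError != null) onError();
            return;
        }
        // The request succeeded: run the main statements with the response.
        body(response);
    }
}
```

On the .NET Framework, fetch could be something like url => new System.Net.WebClient().DownloadString(url); passing it in as a delegate keeps the sketch independent of any particular HTTP client.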
Similarly, you can post a request to a server (e.g., to log in to Gmail):
@addr = "https://www.google.com/accounts/ClientLogin"
post @addr
referer = "https://www.google.com/accounts/ClientLogin"
postdata
"accountType"="GOOGLE"
"Email"="account@gmail.com"
"Passwd"="putpasswordhere"
"service"="mail"
"source"="Pol-WWWDSL-1.0"
end
into @variable
log(@variable)
:error
log("Error appear!")
end
Once a response is received from the server, it has to be processed. The "match" statement performs some operations when the response matches the given regular expression, e.g.:
@addr = "https://www.google.com/accounts/ClientLogin"
post @addr referer = "https://www.google.com/accounts/ClientLogin"
postdata
"accountType"="GOOGLE"
"Email"="account@gmail.com"
"Passwd"="putpasswordhere"
"service"="mail"
"source"="Pol-WWWDSL-1.0"
end
into @variable
log(@variable)
match @variable using
>>>
SID=(?<sid>[^\\s]+)\\s+LSID=(?<lsid>[^\\s]+)\\s+Auth=(?<auth>[^\\s]+)
>>>
log(@sid)
log(@lsid)
log(@auth)
:error
log("Match failed :(")
end
:error
log("Error appear!")
end
A regular expression definition starts and ends with three ">" characters (">>>"). Variables can be declared inside the expression using regex named groups; in the example above, the groups sid, lsid, and auth become the variables @sid, @lsid, and @auth. One small problem with Irony is the use of "\" in a regex: it must be written as "\\".
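The way named groups turn into variables can be illustrated with plain .NET regular expressions. The following is only a sketch of the idea, not the project's code: MatchSketch and BindNamedGroups are hypothetical names, and the dictionary stands in for the script's variable storage.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchSketch
{
    // Hypothetical helper: returns null when the pattern does not match,
    // otherwise a map from each named group to its captured text.
    public static Dictionary<string, string> BindNamedGroups(string input, string pattern)
    {
        var regex = new Regex(pattern);
        Match match = regex.Match(input);
        if (!match.Success)
            return null;

        var variables = new Dictionary<string, string>();
        foreach (string name in regex.GetGroupNames())
        {
            // Skip numbered groups such as "0" (the whole match).
            int ignored;
            if (int.TryParse(name, out ignored))
                continue;
            variables[name] = match.Groups[name].Value;
        }
        return variables;
    }
}
```

A null result corresponds to the ":error" branch of "match", and each dictionary entry corresponds to one script variable. Because the pattern is passed here as an ordinary C# string, the backslash doubling required inside a WWW DSL script does not apply.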
When there are several alternatives to choose from, a "switch" statement is a better fit than "match", e.g.:
get "http://www.google.pl/search?q=Irony" into @variable
switch @variable
case
>>>
Wikipedia, the free encyclopedia
>>>
log("Wikipedia result")
end
case
>>>
definition | Dictionary.com
>>>
log("Dictionary result, no wikipedia result")
end
default
log("Other results, no wikipedia nor dictionary result")
end
end
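The log messages in the example suggest that the cases are tried in order and that the first matching one wins. Under that assumption (it is not confirmed by the project's code), the dispatch could be sketched in C# as follows. SwitchSketch and Dispatch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SwitchSketch
{
    // Runs the first case whose regex matches the input and returns its index.
    // Returns -1 when no case matched; the default branch, if any, runs then.
    public static int Dispatch(string input,
                               IList<KeyValuePair<string, Action>> cases,
                               Action defaultBranch)
    {
        for (int i = 0; i < cases.Count; i++)
        {
            if (Regex.IsMatch(input, cases[i].Key))
            {
                cases[i].Value();
                return i; // later cases are not examined
            }
        }
        if (defaultBranch != null) defaultBranch();
        return -1;
    }
}
```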
The last syntax element is the "goto" jump. Such "ugly" constructs are very convenient in simple scripts like the ones presented here. "goto" transfers execution to the point in the code marked with the given label, e.g.:
:restart
get "http://www.google.pl/search?q=Irony" into @variable
switch @variable
case
>>>
Wikipedia, the free encyclopedia
>>>
log("Wikipedia result")
goto restart
end
default
log("Other results, no wikipedia result")
end
end
The version of Irony used in this project lacks a "goto" implementation. Therefore, a simple workaround was prepared to provide this functionality.
It works as follows:
The DoEvaluate method throws a GotoJumpException.
protected override void DoEvaluate(EvaluationContext context)
{
    throw new GotoJumpException(labelNode_);
}
This unwinds the call stack. Execution is then restarted from the point that corresponds to the jump destination label.
while (nodes != null)
{
    try
    {
        foreach (var node in nodes)
        {
            node.Evaluate(evalContext);
        }
        nodes = null;
    }
    catch (GotoJumpException e)
    {
        nodes = m_gotoNodes[e.LabelId.ValueString];
    }
}
These execution points are prepared before the script is executed. The syntax tree is trimmed at the label points, and the resulting branches are stored in the m_gotoNodes dictionary, keyed by label name.
foreach (var labelStmt in labels)
{
    // Start with the label itself and the siblings that follow it.
    var nodes = labelStmt.Parent.ChildNodes.Skip(
        labelStmt.Parent.ChildNodes.IndexOf(labelStmt));
    var parent = labelStmt.Parent;
    // Walk up the tree; at every statement list, append the nodes
    // that come after the branch we are climbing out of.
    while (parent.Parent != null)
    {
        var upper = parent.Parent;
        if ((upper.Term.Name == "stmt+") || (upper.Term.Name == "program"))
        {
            var upperNextNodes =
                upper.ChildNodes.Skip(upper.ChildNodes.IndexOf(parent) + 1);
            nodes = nodes.Concat(upperNextNodes);
        }
        parent = upper;
    }
    // Register the continuation under the label's name.
    m_gotoNodes.Add(((Token)labelStmt.ChildNodes[0]).ValueString, nodes);
}
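The whole mechanism (precomputed continuations plus an exception that unwinds to the top-level loop) can be tried out in isolation. The sketch below is a simplified model and not the project's code: it uses a flat list of statements instead of Irony's syntax tree, and GotoJump, Step, and GotoSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the exception thrown by the goto node.
public class GotoJump : Exception
{
    public string Label { get; private set; }
    public GotoJump(string label) { Label = label; }
}

// A flat script statement: either a label marker or an action to run.
public class Step
{
    public string Label; // non-null for label statements
    public Action Run;   // non-null for executable statements
}

public static class GotoSketch
{
    public static void Execute(IList<Step> program)
    {
        // Precompute, for every label, the statements that follow it.
        var continuations = new Dictionary<string, List<Step>>();
        for (int i = 0; i < program.Count; i++)
        {
            if (program[i].Label != null)
                continuations[program[i].Label] = program.Skip(i + 1).ToList();
        }

        IEnumerable<Step> current = program;
        while (current != null)
        {
            try
            {
                foreach (var step in current)
                {
                    if (step.Run != null) step.Run();
                }
                current = null; // reached the end without a jump
            }
            catch (GotoJump jump)
            {
                // Restart from the statements recorded for the target label.
                current = continuations[jump.Label];
            }
        }
    }
}
```

In this flat model a continuation is simply the tail of the list. The tree-based code above has to do more work, because a label nested inside a block must also pick up the statements that follow each enclosing block.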
Functions
The "Standard library" of WWW DSL consists of four functions.
The "wait" function pauses the script for a specified number of seconds, e.g.:
get @addr into @variable
match @variable using
>>>
you have to wait (?<minutes>\\d+) minutes
>>>
wait(int(@minutes)*60)
end
end
The above example uses an additional function, "int", which converts a string to an integer. Basic arithmetic (like the multiplication above) is provided by Irony itself.
The last two functions are "log", which is shown in the previous examples, and "download", which downloads a file from a web server. The following example is the simplest file downloader that can be written in the WWW DSL:
download(@arg1, @arg2)
In this example, the two variables (@arg1 and @arg2) are the equivalent of "args" in the "main" function of a program: @arg1 is the first and @arg2 is the second argument passed to the WWW DSL script.
Irony Grammar Explorer
Once a script is written, it should be checked for correctness. Irony comes with a very handy application for checking grammars and scripts: the Irony Grammar Explorer.
Fig.1. Irony Grammar Explorer with sample WWW DSL script
You can load a grammar from a DLL assembly and paste in a script to test it for correctness. A sample is shown in figure 1.
Example
The project attached to this article is a simple download application. It automates the operations necessary to download files from file sharing sites. The script in rs.wwwdsl is a recipe for the best-known (at least to me) file sharing site.
Fig.2. RS.WWWDSL in action
To start a download run, prepare a file containing the list of links and pass it to testwwwdsl.exe as the second argument; the first argument is the name of the processing script. Figure 2 shows the script in action.
Remark
This project requires the version of Irony it was built against: the alpha release from Nov. 5, 2008.
TODO
In a follow-up article, I will present a WinForms application that automates downloads from different web sources.