I couldn't help but bite on this one. It is a very challenging problem. Here
is your solution:
(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
Let me break it down a bit. First, I used (?i) to make the match
case-insensitive.
Next, I had the problem of identifying *both* function names and parameters
in the same Regular Expression.
The function name Regular Expression is:
(?:(?<function>Write|Read)\s*\(\s*)
"function" is the name of the capturing group, which captures only the
function name. The rest of the match is to identify it as a function.
It will match only if the function name is "Read" or "Write" and is followed
by an opening parenthesis. I assumed that any token may have any number of
white-space characters before and after it. This was not too tricky.
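To try this piece on its own, here is a minimal sketch in Python rather than .NET (my choice purely because it is easy to run; the helper name find_function is hypothetical). Note that Python spells a named group (?P<name>...) where .NET uses (?<name>...), so the pattern is adapted accordingly.

```python
import re

# Same idea as the .NET pattern above, with Python's (?P<name>...) group syntax.
FUNCTION_RE = re.compile(r"(?i)(?P<function>Write|Read)\s*\(\s*")


def find_function(line):
    """Return the captured function name, or None if no Read(/Write( is found."""
    match = FUNCTION_RE.search(line)
    return match.group("function") if match else None


print(find_function("Write( 0x123, 0x12, 25, 100 )"))  # Write
print(find_function("varName2 = read ( varName1 )"))   # read
print(find_function("Write three bytes"))              # None
```

The last call returns None because "Write" is not followed by an opening parenthesis, which is exactly what the non-captured remainder of the pattern is there to enforce.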
The second one is a bit trickier:
(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
The trick here is to identify a parameter from inside a set of function
parameters.
The rules break down as:
1. The first parameter is always preceded by a function name followed by an
open parenthesis, as in:
Write (
2. Any later parameter is preceded by another parameter followed by a comma.
Write(param1,
- or -
Write(.......param3,
3. A parameter is always followed by either a comma or an end-parenthesis.
param1,
- or -
param2 )
So, starting with the third rule, we get:
(?<parameter>[\d\w]+)(?=,\s*|\s*\))
"parameter" is the name of the capturing group, which according to these
rules is an alphanumeric token. The rest is a positive look-ahead: the token
*must* be followed by either a comma or an end parenthesis, but that trailing
text is not included in the match.
However, the problem here is that *any* word in the string that is not a
function and is followed by a comma or an end parenthesis will match this,
as in:
Read( 0x55, 5 ) <- Write one byte, to (address 0x55)
In this line, "byte" and the "0x55" inside "(address 0x55)" will match, even
though neither one is a parameter.
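The false positives are easy to reproduce. The sketch below uses Python instead of .NET (an assumption made only for convenience; the look-ahead behaves the same way in both, and the only change is Python's (?P<name>...) group syntax).

```python
import re

# Look-ahead only: an alphanumeric token followed by a comma or a ")".
LOOKAHEAD_ONLY = re.compile(r"(?P<parameter>[\d\w]+)(?=,\s*|\s*\))")

line = "Read( 0x55, 5 ) <- Write one byte, to (address 0x55)"

# findall returns the text captured by the single group for each match.
print(LOOKAHEAD_ONLY.findall(line))  # ['0x55', '5', 'byte', '0x55']
```

The first two results are the real parameters; the last two come from the trailing comment.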
So, how do we eliminate non-parameters? Well, obviously, a parameter is
defined as being inside the parentheses of a function call. So, first, use a
positive look-behind to see if it is preceded by a function call. We need to
identify the function, using the same syntax as before:
(?:(?:Write|Read)\s*\(\s*)
However, it may have a parameter before it, instead of the function call. So
we use an OR "|" operator to indicate that it may be preceded by:
(?:(?:[\d\w]+\s*,\s*))
Note that we have changed the rule slightly. Any parameter which precedes
another parameter will *not* be followed by an end-parenthesis. It will
*always* be followed by a comma.
So, we use the Positive Lookbehind syntax (?<=) coupled with an OR operator
("|"), and get:
(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
Translated: Match any alphanumeric token which is followed by either a comma
or an end parenthesis, and is preceded either by a function call or by
another parameter.
Now to put them together, we use the OR operator:
(?i)(?:(?<function>Write|Read)\s*\()\s*|(?<=(?:(?:Write|Read)\s*\(\s*)|(?:(?:[\d\w]+\s*,\s*)))(?<parameter>[\d\w]+)(?=,\s*|\s*\))
The function name will be captured into the "function" group, and all of the
parameters will be captured into the "parameter" group. This could be stated
as:
Match any token that is either "Read" or "Write" followed by an open
parenthesis, and call it "function," OR match any alphanumeric token which is
followed by either a comma or an end parenthesis, and is preceded either by a
function call or by another parameter, and call it "parameter."
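One caveat if you ever port this: the look-behind above is variable-length, which .NET allows but some other engines do not (Python's built-in re module, for one, rejects it). As a rough sketch of a different, two-pass technique that gives the same function/parameter split without any look-behind, here is a Python version. The names CALL_RE and parse_call are hypothetical, and unlike the pattern above it does not check that each parameter is alphanumeric.

```python
import re

# Pass 1: find the call and grab everything between its parentheses.
CALL_RE = re.compile(r"(?i)(?P<function>Write|Read)\s*\((?P<args>[^)]*)\)")


def parse_call(line):
    """Return (function, [parameters]) for the first Read/Write call, or None."""
    match = CALL_RE.search(line)
    if match is None:
        return None
    # Pass 2: split the argument text on commas and trim the whitespace.
    params = [p.strip() for p in match.group("args").split(",") if p.strip()]
    return match.group("function"), params


print(parse_call("Write( 0x123, 0x12, 25, 100 ) <- Write three bytes"))
# ('Write', ['0x123', '0x12', '25', '100'])
print(parse_call("varName2 = Read( varName1 )"))
# ('Read', ['varName1'])
print(parse_call("Read( 0x55, 5 ) <- Write one byte, to (address 0x55)"))
# ('Read', ['0x55', '5'])
```

Because the parameters are only taken from inside the matched parentheses, the words in the trailing comment are never considered.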
You sure picked a doozy to start out with!
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Numbskull
Hard work is a medication for which
there is no placebo.
<LordHog@hotmail.com> wrote in message
news:1144962018.113580.94720@u72g2000cwu.googlegroups.com...
> Hello all,
>
> I am attempting to create a small scripting application to be used
> during testing. After I extract the commands from the script file, I was going
> to tokenize each line, as one of the requirements is that there is one
> command per line. I have always wanted to learn Regular Expressions, so
> I was hoping I might do this using Regular Expressions. A fair
> number of the commands will have syntax like
>
> Write( 0x123, 0x12, 25, 100 ) <- Write three bytes to address 0x123
> Write(varName1, 0x12) <- Write one bytes to address expressed by the value of varName1
> Read( 0x55, 5 ) <- Write one bytes to address 0x55
> Read(0x3456, 0x12) <- Read eighteen bytes to address 0x3456
> varName2 = Read( varName1 ) <- Read one byte from address expressed by the value of varName1 and store that read value to varName2
>
>
> I know that the regular expression (^[a-zA-Z]*) will find the
> initial keywords or variable names, on which I can perform an initial check
> to make sure they are valid or that the variable has been declared already,
> but the hard part is creating a regular expression to match the various
> forms of the syntax. How would I create a regular expression for the first
> and last script commands? I think with those I can attempt to determine
> the others. The spaces between the arguments are optional and may be
> omitted if the user so desires.
>
> For the first script command I was attempting to craft one that looks
> like..
>
> (^[a-zA-Z]*)('\(')(['0x',0-9][a-zA-Z]*)(',')(['0x',0-9][a-zA-Z]*)
>
> but this obviously doesn't work. Any help is greatly appreciated.
>
> Mark
>