c# : problem with multiline regular expression



Grzegorz Danowski
4/28/2007 7:25:31 PM
Hi,

I'd like to read all lines of caption text from a string:
....
Name ="Paragraph"
Caption ="The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog. "
"The quick brown fox jumps over the lazy dog. The
quick brown fox jumps over the lazy dog. The qu"
"ick brown fox jumps over the lazy dog. The quick
brown fox jumps over the lazy dog."
FontName ="Arial CE"
....

First I used a simple expression:
(?<=Caption\s=)".*?"
It did not work as intended: it returned only the first line of the caption
string. I thought I should add a check that a prefix is not present (a
negative lookbehind), so I modified my regex to:
(?<=Caption\s=)".*?"(?<!\r\n\s*)

But this expression gives the same result as the previous one: only the first
line of the caption string.
Meanwhile, I tested another expression:
dog.\s"\r\n\s*".*"
It worked as I expected (it returned the last word of the first caption line
and then the whole second line). What is the proper way to solve my problem?
--
Regards,
Grzegorz

P.S. I tested all my expressions using Expresso 3.0 with these options checked:
"Ignore Case", "Ignore White", "Multiline".
Grzegorz Danowski
4/29/2007 3:19:38 PM
A better-formatted example:
Name ="Paragraph"
Caption ="The quick brown fox jumps over the lazy dog. "
"The quick brown fox jumps over the lazy dog. The qui"
"ck brown fox jumps over the lazy dog."
FontName ="Arial CE"

And the question: what regex should I use to extract all of the Caption text
(in the example, the three sentences "The quick brown fox...")?
I have tried this: (?<=Caption\s=)".*?"(?<!\r\n\s*)

Regards,
Grzegorz
Kevin Spencer
4/30/2007 1:06:31 PM
(?i)(?<=caption\s*=\s*)"[^"]+

First, I made the regular expression case-insensitive. Then I allowed zero or
more whitespace characters both before and after the equals sign. Then comes
the quotation mark, followed by a negated character class matching 1 or more
characters that are NOT a quotation mark. This matches the opening quotation
mark and everything up to (but not including) the next quotation mark
following "caption=", with or without spaces around the equals sign.

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
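
For reference, here is a minimal C# sketch of how the expression above behaves on the better-formatted sample from the thread. The class and method names (FirstSegmentDemo, FirstSegment) are made up for illustration, and the sample is assumed to use \n line endings. It suggests that the match ends just before the first closing quotation mark, so only the first caption line comes back.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstSegmentDemo
{
    // Hypothetical helper: returns whatever the posted expression matches first.
    public static string FirstSegment(string input)
    {
        // In a verbatim string, "" stands for a single quotation mark.
        return Regex.Match(input, @"(?i)(?<=caption\s*=\s*)""[^""]+").Value;
    }

    static void Main()
    {
        // Sample from the thread; \n line endings are an assumption.
        string input =
            "Name =\"Paragraph\"\n" +
            "Caption =\"The quick brown fox jumps over the lazy dog. \"\n" +
            "\"The quick brown fox jumps over the lazy dog. The qui\"\n" +
            "\"ck brown fox jumps over the lazy dog.\"\n" +
            "FontName =\"Arial CE\"\n";

        // Prints: ["The quick brown fox jumps over the lazy dog. ]
        // (opening quote included, closing quote and later lines not matched)
        Console.WriteLine("[" + FirstSegment(input) + "]");
    }
}
```

The continuation lines are never reached because [^"]+ cannot move past the closing quotation mark of the first line.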

Grzegorz Danowski
4/30/2007 10:38:01 PM
Thanks for your help, but it only parses the first line of the caption string...
--
Regards,
Grzegorz

Grzegorz Danowski
5/1/2007 2:35:21 PM
I have written another expression:

(?<=Caption\s=)(?:\s*(?:".*"))+

And I believe that it works :-)
--
Regards,
Grzegorz
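
A possible way to use the expression above from C# is sketched below. It is a variation rather than the posted expression itself: the named group "part" and the names CaptionReader and ReadCaption are additions made for illustration, and the sample is assumed to use \n line endings. The sketch relies on the fact that .NET keeps every capture made by a repeated group, so the quoted segments can be joined without the quotation marks and line breaks.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CaptionReader
{
    // The posted expression with one addition: the named group "part"
    // captures the text between each pair of quotation marks.
    const string Pattern = @"(?<=Caption\s=)(?:\s*""(?<part>.*)"")+";

    // Hypothetical helper: joins all caption segments, or returns an
    // empty string when no Caption entry is found.
    public static string ReadCaption(string input)
    {
        Match m = Regex.Match(input, Pattern);
        if (!m.Success) return string.Empty;

        // Each iteration of the repeated group leaves one Capture, in order.
        var sb = new StringBuilder();
        foreach (Capture c in m.Groups["part"].Captures)
            sb.Append(c.Value);
        return sb.ToString();
    }

    static void Main()
    {
        // Sample from the thread; \n line endings are an assumption.
        string input =
            "Name =\"Paragraph\"\n" +
            "Caption =\"The quick brown fox jumps over the lazy dog. \"\n" +
            "\"The quick brown fox jumps over the lazy dog. The qui\"\n" +
            "\"ck brown fox jumps over the lazy dog.\"\n" +
            "FontName =\"Arial CE\"\n";

        // Prints the three sentences as one line of text.
        Console.WriteLine(ReadCaption(input));
    }
}
```

One caveat: because .* is greedy, a line holding two separate quoted strings would be captured as a single segment, from its first quotation mark to its last.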
Jesse Houwing
5/2/2007 10:58:45 PM
In reply to Grzegorz Danowski's post of 28-4-2007 19:25:

There's an option you can pass to the Regex object called
RegexOptions.Singleline. It makes . match any character, including newline.
By default, . matches any character except newline (\n).
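
A small C# sketch of the difference follows. The names SinglelineDemo, DotCrossesNewline and FirstAttempt are hypothetical, and the sample is assumed to use \n line endings. It also checks the first expression from the thread, which appears to return the same match with or without Singleline, because the lazy .*? still stops at the first closing quotation mark.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglelineDemo
{
    // "a.b" only matches "a\nb" if '.' is allowed to match the newline.
    public static bool DotCrossesNewline(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // The first expression from the thread, run with the given options.
    public static string FirstAttempt(string input, RegexOptions options)
    {
        return Regex.Match(input, @"(?<=Caption\s=)"".*?""", options).Value;
    }

    static void Main()
    {
        Console.WriteLine(DotCrossesNewline(RegexOptions.None));       // False
        // Multiline only changes where ^ and $ match, not what '.' matches.
        Console.WriteLine(DotCrossesNewline(RegexOptions.Multiline));  // False
        Console.WriteLine(DotCrossesNewline(RegexOptions.Singleline)); // True

        // Sample from the thread; \n line endings are an assumption.
        string input =
            "Caption =\"The quick brown fox jumps over the lazy dog. \"\n" +
            "\"The quick brown fox jumps over the lazy dog. The qui\"\n" +
            "\"ck brown fox jumps over the lazy dog.\"\n";

        // The lazy quantifier ends at the first closing quote either way.
        string plain = FirstAttempt(input, RegexOptions.None);
        string single = FirstAttempt(input, RegexOptions.Singleline);
        Console.WriteLine(plain == single); // True
    }
}
```

So Singleline matters for patterns that need . to span line breaks, but on its own it does not appear to make the lazy first expression pick up the continuation lines.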