Extracting multiple regular expression matches from a web page

Hi,

I am currently using C# to implement a project (previously C++) and have a query about the response.ExtractRegExp() function.

I am having trouble extracting a list of regular expression matches. I only get one extracted value back, even though there are multiple possible matches on the web page.

For example:

string regEx = "PS-[0-9]{8}-[0-9]{5}";
ExtractionCursor cursor = new ExtractionCursor();

RegExpMatchList matchRegEx = response25.ExtractRegExp(cursor, regEx);


When I iterate through the matchRegEx list, it only seems to contain one value. The code sample in the C# API help suggests that this approach should populate the list with all extracted regular expression matches. Am I doing something wrong?

I could work around this by dumping the web page contents to a string and using the .NET regex functionality, but I wondered whether I could do it with ExtractRegExp instead.

Many Thanks,
Paul.

Comments

  • Tom Miseur (ForumAdmin)
    Hi Paul,

    I've had a look and am able to populate multiple matches, so it looks like your RegEx may only be returning a single match. Are you able to share the page source so that I can reproduce this locally? Feel free to email it to me if you're worried about sharing it on the forum (my username with a dot between the two names, followed by @testplant.com).

    My test was to navigate to http://forums.testplant.com/phpBB2/ and insert the following into the generated code:
                RegExpMatchList matches = response4.ExtractRegExp(new ExtractionCursor(), @"url\(.*\)");
    
                WriteMessage("Number of regex matches: " + matches.Count);
                
                foreach (RegExpMatch match in matches)
                {
                    WriteMessage("Match value: " + match.Match);
                }
    

    The expression used is probably a bit more straightforward than yours, but it simply extracts a bunch of URLs in this case:
    00:00:04:524	Message		Number of regex matches: 5
    00:00:04:525	Message		Match value: url(templates/greenhouse/images/cellpic2.jpg)
    00:00:04:525	Message		Match value: url(templates/greenhouse/images/)
    00:00:04:525	Message		Match value: url(templates/greenhouse/images/cellpic1.gif)
    00:00:04:525	Message		Match value: url("templates/greenhouse/formIE.css")
    00:00:04:525	Message		Match value: url('/phpBB2/images/bg_forum.png')
    

    When you manually put it through the .NET RegEx engine, are you using the Response object's Content property?
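
For that comparison, here is a minimal sketch of the plain .NET check, assuming the page body has already been read into a string. The RegexCheck class, the FindAll helper and the sample page text are made up for illustration; in a script the text would presumably come from the Response object's Content property. It only uses the standard System.Text.RegularExpressions API, not ExtractRegExp.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // Hypothetical helper: returns every non-overlapping match of
    // the pattern in pageText, in the order they appear.
    public static string[] FindAll(string pageText, string pattern)
    {
        MatchCollection found = Regex.Matches(pageText, pattern);
        string[] values = new string[found.Count];
        for (int i = 0; i < found.Count; i++)
        {
            values[i] = found[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        // Made-up page content, for illustration only.
        string pageText = "<td>PS-20100101-00001</td><td>PS-20100102-00002</td>";

        string[] values = FindAll(pageText, "PS-[0-9]{8}-[0-9]{5}");

        Console.WriteLine("Number of regex matches: " + values.Length);
        foreach (string value in values)
        {
            Console.WriteLine("Match value: " + value);
        }
    }
}
```

If this finds several matches in the same text where ExtractRegExp returns only one, the pattern itself is probably fine and the difference lies in how the extraction is being called.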