
email grabber in c#

code:

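using System.Collections.Generic;
using System.Text.RegularExpressions;
using NUnit.Framework;

[TestFixture]
public class EmailGrabberTests
{
    [Test]
    public void Grab_ReturnsEveryAddressInOrder()
    {
        // The addresses are only test data; they need not be deliverable.
        var text = @"sadf sdf at piet.c@gmail.cmo ens
and sno_w@wer.com";
        var emails = EmailGrabber.Grab(text);
        Assert.AreEqual(2, emails.Length);
        Assert.AreEqual("piet.c@gmail.cmo", emails[0]);
        Assert.AreEqual("sno_w@wer.com", emails[1]);
    }
}

public static class EmailGrabber
{
    // Top-level domains are limited to 2-4 letters, so longer ones are not matched.
    private const string EmailPattern =
        @"(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)";

    /// <summary>
    /// Grabs all email addresses from the given text, in order of appearance.
    /// </summary>
    /// <param name="text">The text to search.</param>
    /// <returns>The matched addresses; an empty array if none are found.</returns>
    public static string[] Grab(string text)
    {
        var regex = new Regex(EmailPattern, RegexOptions.IgnoreCase);
        var emails = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            emails.Add(m.Groups["email"].Value);
        }
        return emails.ToArray();
    }
}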

Regex.Replace with multiple tokens in C#

This is a quick tutorial on using Regex.Replace to replace multiple tokens in C#.

Example of what we want to accomplish:

Transform some text, e.g. “{exec:DoThis:param=32}”, into “DoThis(32)” using regular expressions.

public void RegexReplaceUsingTokens()
{
    string input = "{exec:DoThis:param=32}";
    // $1 and $2 refer to the groups p1 and p2, numbered left to right.
    var output_txt = Regex.Replace(input,
         @"\{exec:(?<p1>\S+):param=(?<p2>\d+)\}", "$1($2)");
}

output_txt will be: “DoThis(32)”

p1 and p2 could have been named anything, as long as the names are different (i.e. not both “p1”).
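
As a small sketch of the point about group names, the same replacement can refer to the groups by name with ${p1} and ${p2} instead of by number. The class and method names (TokenReplaceDemo, ToCall) are made up for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class TokenReplaceDemo
{
    // Rewrites "{exec:Name:param=N}" as "Name(N)", referring to the groups by name.
    // TokenReplaceDemo and ToCall are hypothetical names used only for this sketch.
    public static string ToCall(string input)
    {
        return Regex.Replace(
            input,
            @"\{exec:(?<p1>\S+):param=(?<p2>\d+)\}",
            "${p1}(${p2})");
    }
}
```

Calling TokenReplaceDemo.ToCall("{exec:DoThis:param=32}") should give the same “DoThis(32)” as the numbered version; the named form tends to be easier to read once a pattern has more than a couple of groups.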

The token $1 holds whatever the first group in the pattern captures (when a pattern has no unnamed groups, named groups are numbered left to right starting at 1). The syntax for a named group is:
(?<my_group_name> ……….. )                     this will result in $1  =  ……………
and the value can also be accessed by name, as follows:
string functionName = matches[i].Groups["my_group_name"].Value;

So e.g.    (?<anything>Hi)  will result in $1 being “Hi”

So if you e.g. want to grab the “ell” in Hello as $1, you can use the following regular expression:

“H(?<param>ell)o”
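
The “Hello” pattern can be tried out with a short sketch like the one below. The names NamedGroupDemo, GrabParam and Bracket are invented for the example; only Regex.Match, Regex.Replace and the Groups indexer come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by the group named "param", or "" if nothing matches.
    public static string GrabParam(string input)
    {
        Match m = Regex.Match(input, "H(?<param>ell)o");
        return m.Success ? m.Groups["param"].Value : string.Empty;
    }

    // Replaces each "Hello" with its first capture wrapped in square brackets.
    public static string Bracket(string input)
    {
        return Regex.Replace(input, "H(?<param>ell)o", "[$1]");
    }
}
```

Here GrabParam("Hello") reads the capture by name, while Bracket("Hello world") uses the same capture through $1 in the replacement string.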

 

 

C# Regular Expression links

Here are some links for c# regular expressions:
http://oreilly.com/windows/archive/csharp-regular-expressions.html

http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet

and a good c# regular expression tester:

http://www.regexr.com/

simple c# regular expression example:

// "result" is assumed to hold the HTML text that is being searched.
const string LocalLinkRegex =
    @"href=""/livechess/game\.html\?id=(?<GameId>[0-9]+)""";
var R2 = new Regex(LocalLinkRegex, RegexOptions.ExplicitCapture);
var Matches2 = R2.Matches(result);

// Guard against an empty result before indexing into the collection.
string FirstGameIdValue = Matches2.Count > 0
    ? Matches2[0].Groups["GameId"].Value
    : null;

/*
this will find:  href="/livechess/game.html?id=123456789"
and return: 123456789  (which is token: GameId)
*/

or simply:

string title = Regex.Match(source,
    @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
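
To try the one-liner in isolation, it can be wrapped in a small helper. This is only a sketch: TitleDemo and GetTitle are hypothetical names, and the unnecessary backslashes before < and > have been dropped from the pattern.

```csharp
using System.Text.RegularExpressions;

public static class TitleDemo
{
    // Returns the text between <title> and </title>, ignoring the case of the tag.
    // The lazy [\s\S]*? stops at the first closing tag and also spans line breaks.
    public static string GetTitle(string source)
    {
        return Regex.Match(source,
            @"<title\b[^>]*>\s*(?<Title>[\s\S]*?)</title>",
            RegexOptions.IgnoreCase).Groups["Title"].Value;
    }
}
```

For example, GetTitle("<html><head><TITLE>My page</TITLE></head></html>") picks out “My page”. Regular expressions are fine for quick scraping like this, but they are not a substitute for a real HTML parser.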


Email regex (use it with RegexOptions.IgnoreCase, since the pattern only lists upper-case letters):

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b

as per

http://www.regular-expressions.info/email.html
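
One thing worth knowing about this pattern is that {2,4} restricts the top-level domain to two to four letters. A rough sketch of what that means in practice follows; EmailRegexDemo and ContainsEmail are names made up for the illustration.

```csharp
using System.Text.RegularExpressions;

public static class EmailRegexDemo
{
    // The pattern from above; IgnoreCase is needed because it lists only A-Z.
    private static readonly Regex Email = new Regex(
        @"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b",
        RegexOptions.IgnoreCase);

    // True if the text contains at least one address the pattern accepts.
    public static bool ContainsEmail(string text)
    {
        return Email.IsMatch(text);
    }
}
```

ContainsEmail("write to sno_w@wer.com") is true, while an address such as curator@example.museum is not matched because its top-level domain is longer than four letters. Widening the upper bound of {2,4} is one possible adjustment if longer domains matter.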

 

 

regular expressions in c#

Characters

identifier    definition
\a            Alert, x07.
\b            Backspace, x08 (inside a character class; elsewhere in a pattern \b is a word boundary).
\e            ESC, x1B.
\n            Newline, x0A.
\r            Carriage return, x0D.
\f            Form feed, x0C.
\t            Tab, x09.
\v            Vertical tab, x0B.
\0octal       Two-digit octal character code.
\xhex         Two-digit hexadecimal character code.
\uhex         Four-digit hexadecimal character code.
\cchar        Named control character.
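
A couple of these escapes in use, as a minimal sketch (EscapeDemo and its methods are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // \t matches a tab character: split a tab-separated line into its fields.
    public static string[] SplitTabs(string line)
    {
        return Regex.Split(line, @"\t");
    }

    // \x41 (two hex digits) and \u0041 (four hex digits) both denote 'A'.
    public static bool IsDoubleA(string s)
    {
        return Regex.IsMatch(s, @"^\x41\u0041$");
    }
}
```

SplitTabs("a\tb\tc") yields the three fields a, b and c, and IsDoubleA("AA") is true.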

Classes

identifier    definition
[...]         Single character from the listed range.
[^...]        Single character not from the listed range.
.             Any single character except for a line terminator (unless in single-line mode, s).
\w            Word character such as [a-zA-Z_0-9]
\W            Non-word character, basically [^a-zA-Z_0-9]
\d            Digit [0-9]
\D            Non-digit [^0-9]
\s            Whitespace character such as [ \f\n\r\t\v]
\S            Non-whitespace character
\p{prop}      Character that is contained in the specified Unicode block / property.
\P{prop}      Character that is not contained in the specified Unicode block / property.
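
The classes are easiest to remember by trying them. The following sketch uses made-up helper names (ClassDemo and its methods) around standard Regex calls:

```csharp
using System.Text.RegularExpressions;

public static class ClassDemo
{
    // \d+ : the first run of digits in the text.
    public static string FirstNumber(string s) => Regex.Match(s, @"\d+").Value;

    // \W : remove everything that is not a word character.
    public static string StripNonWord(string s) => Regex.Replace(s, @"\W", "");

    // \p{Lu} : count characters in the Unicode "uppercase letter" category.
    public static int CountUppercase(string s) => Regex.Matches(s, @"\p{Lu}").Count;

    // [aeiou] : count characters from an explicit list.
    public static int CountVowels(string s) => Regex.Matches(s, "[aeiou]").Count;
}
```

For instance, FirstNumber("order 66 shipped") gives "66", StripNonWord("a-b c!") gives "abc", and CountUppercase("Hello World") gives 2.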

Tests etc

identifier    definition
^             Start of a string, or following a newline in MULTILINE mode.
\A            Beginning of string, in all match modes.
$             End of string, or before any newline if in MULTILINE mode.
\Z            End of string but before any final line terminator, in all match modes.
\z            End of string in all match modes.
\b            Boundary between a \w character and a \W character.
\B            Not-word-boundary.
\G            End of the previous match.
(?=...)       Positive lookahead.
(?!...)       Negative lookahead.
(?<=...)      Positive lookbehind.
(?<!...)      Negative lookbehind.
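
A few of these tests in action, sketched with hypothetical helper names (AnchorDemo and its methods). Note that none of them consume characters; they only check a position.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // \b...\b : match "cat" only as a whole word.
    public static bool HasWordCat(string s) => Regex.IsMatch(s, @"\bcat\b");

    // (?=px) : the digits must be followed by "px", which is not part of the match.
    public static string PixelValue(string s) => Regex.Match(s, @"\d+(?=px)").Value;

    // (?<=\$) : the digits must be preceded by a dollar sign, which is not part of the match.
    public static string DollarAmount(string s) => Regex.Match(s, @"(?<=\$)\d+").Value;

    // With Multiline, ^ matches at the start of every line, not just the start of the string.
    public static int CountCommentLines(string text) =>
        Regex.Matches(text, "^#", RegexOptions.Multiline).Count;
}
```

HasWordCat("a cat sat") is true but HasWordCat("concatenate") is false; PixelValue("width: 120px") gives "120" and DollarAmount("price: $45") gives "45".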

mode modifiers

mode                       identifier    definition
Singleline                 s             Dot (.) matches any character, including a line terminator.
Multiline                  m             ^ and $ match next to embedded line terminators.
IgnorePatternWhitespace    x             Ignore whitespace and allow embedded comments starting with #.
IgnoreCase                 i             Case-insensitive match based on characters in the current culture.
CultureInvariant                         Culture-insensitive match (no inline identifier; set it through RegexOptions).
ExplicitCapture            n             Allow named capture groups, but treat unnamed parentheses as non-capturing groups.
Compiled                                 Compile regular expression.
RightToLeft                              Search from right to left, starting to the left of the start position.
ECMAScript                               Enables ECMAScript-compliant behavior; can be combined only with IgnoreCase, Multiline and Compiled.
(?imnsx-imnsx)                           Turn match flags on or off for rest of pattern.
(?imnsx-imnsx:...)                       Turn match flags on or off for the rest of the subexpression.
(?#...)                                  Treat substring as a comment.
#...                                     Treat rest of line as a comment in x mode.
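
Modes can be set either inline in the pattern or through RegexOptions. The sketch below shows both; ModeDemo and its method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class ModeDemo
{
    // (?i) switches on IgnoreCase for the rest of the pattern.
    public static bool IsHello(string s)
    {
        return Regex.IsMatch(s, "(?i)^hello$");
    }

    // Without Singleline the dot does not match "\n"; with it, the dot matches "\n" too.
    public static bool DotSpansLines(RegexOptions options)
    {
        return Regex.IsMatch("a\nb", "a.b", options);
    }

    // IgnorePatternWhitespace lets the pattern be laid out over several lines with comments.
    public static string Month(string date)
    {
        const string pattern = @"
            (?<year>\d{4})    # four-digit year
            -
            (?<month>\d{2})   # two-digit month
        ";
        return Regex.Match(date, pattern, RegexOptions.IgnorePatternWhitespace)
                    .Groups["month"].Value;
    }
}
```

DotSpansLines(RegexOptions.None) is false and DotSpansLines(RegexOptions.Singleline) is true, which is the whole difference the s flag makes. Month("2014-07-15") returns "07".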

groups and repetitions

identifier     definition
(...)          Grouping. Submatches fill \1,\2,… and $1, $2,….
\n             In a regular expression, match what was matched by the nth earlier submatch.
$n             In a replacement string, contains the nth earlier submatch.
(?<name>...)   Captures matched substring into the group called name.
(?:...)        Grouping-only parentheses, no capturing.
(?>...)        Disallow backtracking for subpattern.
...|...        Alternation; match one or the other.
*              Repeated 0 or more times.
+              Repeated 1 or more times.
?              Repeated 1 or 0 times.
{n}            Repeated exactly n times.
{n,}           Repeated at least n times.
{x,y}          Repeated at least x times, but no more than y times.
*?             Repeated 0 or more times, but as few times as possible.
+?             Repeated 1 or more times, but as few times as possible.
??             Repeated 0 or 1 times, but as few times as possible.
{n,}?          Repeated at least n times, but as few times as possible.
{x,y}?         Repeated at least x times, no more than y times, but as few times as possible.
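
The difference between greedy and lazy repetition, and the use of a backreference, is probably clearest from an example. QuantifierDemo and its method names are assumptions made for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // .* is greedy: it runs to the LAST </b> in the input.
    public static string Greedy(string s) => Regex.Match(s, "<b>.*</b>").Value;

    // .*? is lazy: it stops at the FIRST </b>.
    public static string Lazy(string s) => Regex.Match(s, "<b>.*?</b>").Value;

    // \1 repeats whatever group 1 matched: finds a doubled word such as "is is".
    public static string DoubledWord(string s) =>
        Regex.Match(s, @"\b(\w+)\s+\1\b").Value;

    // {4} : exactly four digits between the start and end of the string.
    public static bool IsFourDigits(string s) => Regex.IsMatch(s, @"^\d{4}$");
}
```

With the input "<b>one</b><b>two</b>", Greedy returns the whole string while Lazy returns only "<b>one</b>". DoubledWord("this is is a test") returns "is is".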

Replacements

identifier     definition
$1, $2, ...    The captured submatches.
${name}        The matched text for a named capture group.
$`             Text before match.
$&             Text of match.
$'             Text after match.
$+             Last parenthesized match.
$_             Original input string.
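
To see what the less common substitutions ($&, $`, $' and $_) expand to, the sketch below replaces the single “b” in “abc” with a given replacement string. ReplacementDemo and Apply are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class ReplacementDemo
{
    // Replaces the "b" in "abc" using the given replacement pattern, so the
    // effect of each substitution token can be read off the result.
    public static string Apply(string replacement)
    {
        return Regex.Replace("abc", "b", replacement);
    }
}
```

Here the match is "b", the text before it is "a" and the text after it is "c", so Apply("[$&]") gives "a[b]c", Apply("[$`]") gives "a[a]c", Apply("[$']") gives "a[c]c" and Apply("[$_]") gives "a[abc]c".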