The Most Complete Regular Expression Tutorial: Free Books, Tools, and Illustrated Walkthroughs


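Regular Expressions (Regex)
Personally, I think regular expressions are the real high-level language of computing: they go beyond what the eye can match and come close to the matching ability of the human brain.
We deal with regular expressions all the time. Mostly it is basic matching, such as simple JavaScript validation of phone numbers or email addresses, but whenever a complex requirement comes up it turns into a hard fight.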

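I recently polished a small tool, a little crawler. The articles it fetches often contain all kinds of ads and share-widget code that are tedious to strip out. Matching them with an HTML parser is sometimes both laborious and slow. Regex is razor-sharp for this job, but hard to write, so my advice is to use a search engine first: find similar examples with the right keywords and adapt them. Then read this article at your leisure to deepen your understanding.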

Free books:

Regular Expressions Google Analytics.pdf
Regular Expressions Cookbook.pdf
regular expression pocket reference second edition.pdf
Regular Expressions the complete tutorial.pdf

For the code from these books, join group 329586454 and download it from the group files.

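Free tools:

Online tools:

Professional tool: https://regex101.com/
Simple webmaster tools: http://tool.chinaz.com/regex/ http://tool.oschina.net/regex

Free software: a portable (no-install) regular expression tool
Paid software: http://www.regular-expressions.info/tutorial.html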

Illustrated walkthrough:

1. Matching a Username
Pattern:
/^[a-z0-9_-]{3,16}$/
Description:
We begin by telling the parser to find the beginning of the string (^), followed by any lowercase letter (a-z), number (0-9), an underscore, or a hyphen. Next, {3,16} makes sure that there are at least 3 of those characters, but no more than 16. Finally, we want the end of the string ($).
Further thinking: how would you write this to allow Chinese usernames?
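As a minimal sketch, the username pattern can be tried out in Python. The function names and the second pattern are illustrative assumptions, not part of the original tutorial: the variant treats the CJK Unified Ideographs range (U+4E00 to U+9FFF) as "Chinese characters", which is only one possible answer to the question above.

```python
import re

# Pattern from the text, written as a Python raw string instead of /.../.
USERNAME = re.compile(r"^[a-z0-9_-]{3,16}$")

# Assumption: "Chinese characters" means the CJK Unified Ideographs
# block U+4E00..U+9FFF. The trailing hyphen in the class stays literal.
USERNAME_ZH = re.compile(r"^[a-z0-9_\u4e00-\u9fff-]{3,16}$")


def is_username(s):
    """True if s is 3 to 16 lowercase letters, digits, '_' or '-'."""
    return USERNAME.match(s) is not None


def is_username_zh(s):
    """Same rule, additionally allowing CJK ideographs (assumed range)."""
    return USERNAME_ZH.match(s) is not None


if __name__ == "__main__":
    for candidate in ["my-us3r_n4m3", "ab", "Bob", "张三丰"]:
        print(candidate, is_username(candidate), is_username_zh(candidate))
```

Note that the original pattern rejects uppercase letters, so "Bob" fails both checks unless the class is widened or a case-insensitive flag is used.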

2. Matching a Password
Pattern:/^[a-z0-9_-]{6,18}$/
Description:
Matching a password is very similar to matching a username. The only difference is that instead of 3 to 16 letters, numbers, underscores, or hyphens, we want 6 to 18 of them ({6,18}).
Further thinking: how would you check password strength (whether it combines uppercase letters, special characters, and digits)?
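One possible approach to the strength question is to stack lookaheads, each asserting that one kind of character appears somewhere in the string. The sketch below is an assumption for illustration: the `STRONG` rule (at least 8 characters, one uppercase letter, one digit, one non-alphanumeric character) and the helper names are not from the original text.

```python
import re

# Pattern from the text: 6 to 18 lowercase letters, digits, '_' or '-'.
PASSWORD = re.compile(r"^[a-z0-9_-]{6,18}$")

# Hypothetical strength rule. Each (?=...) lookahead checks one
# requirement without consuming input; .{8,} then enforces the length.
STRONG = re.compile(r"^(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$")


def is_valid_password(s):
    return PASSWORD.match(s) is not None


def is_strong_password(s):
    return STRONG.match(s) is not None


if __name__ == "__main__":
    for candidate in ["myp4ssw0rd", "mypa$$w0rd", "Str0ng!pass", "weakpassword"]:
        print(candidate, is_valid_password(candidate), is_strong_password(candidate))
```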

3. Matching a Hex Value
Pattern:/^#?([a-f0-9]{6}|[a-f0-9]{3})$/
Description:
We begin by telling the parser to find the beginning of the string (^). Next, a number sign is optional because it is followed by a question mark. The question mark tells the parser that the preceding character (in this case a number sign) is optional, but to be “greedy” and capture it if it’s there. Next, inside the first group (the first set of parentheses), we can have two different situations. The first is six characters, each a lowercase letter between a and f or a digit. The vertical bar tells us that we can have three such characters instead. Finally, we want the end of the string ($).

I put the six-character alternative first so that the parser tries the full form, such as #ffffff, before the short one. Strictly speaking, the order only changes the result when the pattern is not anchored at the end: without the $, putting the three-character alternative first would make the parser pick up only #fff and not the other three f’s. With the $ in place, the parser backtracks and both orders match the same strings.
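The effect of the alternative order can be checked directly. In this sketch (the variable names are illustrative), the same input is run against the pattern from the text, a variant with the alternatives swapped, and a swapped variant without anchors. With the $ anchor, backtracking makes both orders capture all six characters; only the unanchored variant stops after three.

```python
import re

# Pattern from the text: six-character alternative first.
HEX = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")

# Alternatives swapped, anchors kept.
HEX_SHORT_FIRST = re.compile(r"^#?([a-f0-9]{3}|[a-f0-9]{6})$")

# Alternatives swapped and anchors removed.
HEX_UNANCHORED = re.compile(r"#?([a-f0-9]{3}|[a-f0-9]{6})")

if __name__ == "__main__":
    value = "#ffffff"
    print(HEX.match(value).group(1))
    print(HEX_SHORT_FIRST.match(value).group(1))
    print(HEX_UNANCHORED.match(value).group(1))
```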

4. Matching a Slug
Pattern:/^[a-z0-9-]+$/
Description:
You will be using this regex if you ever have to work with mod_rewrite and pretty URLs. We begin by telling the parser to find the beginning of the string (^), followed by one or more (the plus sign) lowercase letters, numbers, or hyphens. Finally, we want the end of the string ($).
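A short sketch of the slug check in Python follows. The `slugify` helper is a hypothetical addition, not part of the original: it shows one simple way to turn a title into a string that the pattern accepts.

```python
import re

# Pattern from the text: one or more lowercase letters, digits or hyphens.
SLUG = re.compile(r"^[a-z0-9-]+$")


def is_slug(s):
    return SLUG.match(s) is not None


def slugify(title):
    """Hypothetical helper: lowercase the title, collapse every run of
    other characters into one hyphen, and trim hyphens at the ends."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


if __name__ == "__main__":
    print(is_slug("my-title-here"))
    print(is_slug("my_title_here"))
    print(slugify("Hello, World! 2024"))
```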

5. Matching an Email
Pattern:/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Description:
We begin by telling the parser to find the beginning of the string (^). Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens. I have escaped the dot because a non-escaped dot means any character. Directly after that, there must be an at sign. Next is the domain name, which must be one or more lowercase letters, numbers, dots, or hyphens (underscores are not allowed here). Then comes another (escaped) dot, with the extension being two to six letters or dots. I have 2 to 6 because of the country-specific TLDs (.ny.us or .co.uk). Finally, we want the end of the string ($).
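The three capture groups can be inspected in Python. This is only a sketch of the pattern above, which is far looser and also stricter than real-world address rules (for example, it rejects uppercase letters and extensions longer than six characters); the `split_email` name is an assumption.

```python
import re

# Pattern from the text. Inside a character class the dot needs no
# escape, so [a-z0-9_.-] is equivalent to the original [a-z0-9_\.-].
EMAIL = re.compile(r"^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$")


def split_email(s):
    """Return (local part, domain, extension), or None if s does not match."""
    m = EMAIL.match(s)
    return m.groups() if m else None


if __name__ == "__main__":
    for candidate in ["john@doe.com", "user@mail.co.uk", "john@doe.something"]:
        print(candidate, split_email(candidate))
```

Because the domain group is greedy, an address such as user@mail.co.uk is split as domain "mail.co" and extension "uk", not as "mail" and "co.uk".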

6. Matching a URL
Pattern:/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Description:
This regex is almost like taking the ending part of the above regex, slapping it between “http://” and some file structure at the end. It sounds a lot simpler than it really is. To start off, we search for the beginning of the line with the caret.

The first capturing group is entirely optional. It allows the URL to begin with “http://”, “https://”, or neither of them. I have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, I just added a question mark to the end of it.

Next is the domain name: one or more numbers, letters, dots, or hyphens followed by another dot and then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. This pretty much allows multiple directories to be matched along with a file at the end. I have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark were used there, only one file/directory could be matched.

Then a trailing slash is matched, but it can be optional. Finally we end with the end of the line.
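The URL pattern can be exercised as follows; the `parse_url` helper is illustrative. One caution that is not in the original text: the path part is a starred group whose body is itself starred, and in a backtracking engine such as Python's `re` this can become very slow on long inputs that almost match, so the sketch keeps its inputs short.

```python
import re

# Pattern from the text; "/" needs no escaping in a Python raw string.
URL = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)*/?$")


def parse_url(s):
    """Return (scheme, domain, extension) or None if s does not match.
    The scheme is None when the URL has no http:// or https:// prefix."""
    m = URL.match(s)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


if __name__ == "__main__":
    for candidate in ["https://example.com/about/", "example.com", "http://example.com/bad!"]:
        print(candidate, parse_url(candidate))
```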

7. Matching an IP Address
Pattern:/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
Description:
Now, I’m not going to lie, I didn’t write this regex; I got it from elsewhere. That doesn’t mean that I can’t rip it apart character by character.

The first group really isn’t a capturing group, because ?: was placed right after its opening parenthesis, which tells the parser not to capture this group (more on this in the last regex). We also want this non-capturing group to be repeated three times: that is the {3} at the end of the group. This group contains another group, a subgroup, and a literal dot. The parser looks for a match in the subgroup and then a dot before moving on.

The subgroup is also a non-capturing group. It’s just a bunch of character sets (things inside brackets): the string “25” followed by a digit between 0 and 5; or the string “2” followed by a digit between 0 and 4 and then any digit; or an optional zero or one followed by one or two digits.

After we match three of those, it’s on to the next non-capturing group. This one wants the same thing without the trailing dot: the string “25” followed by a digit between 0 and 5; or the string “2” followed by a digit between 0 and 4 and another digit; or an optional zero or one followed by one or two digits.

We end this confusing regex with the end of the string.
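To see what the three alternatives accept and reject, the pattern can be run against a few addresses. This is a minimal sketch with an illustrative function name; note that leading zeros such as 01.02.03.004 pass, because the last alternative allows an optional leading 0 or 1.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (with optional leading 0 or 1).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Same pattern as in the text, assembled from the octet fragment:
# three octets each followed by a dot, then a final octet.
IPV4 = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")


def is_ipv4(s):
    return IPV4.match(s) is not None


if __name__ == "__main__":
    for candidate in ["192.168.1.1", "73.60.124.136", "256.1.1.1", "1.2.3", "01.02.03.004"]:
        print(candidate, is_ipv4(candidate))
```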

8. Matching an HTML Tag

Pattern:/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
Description:
One of the more useful regexes on the list. It matches any HTML tag with the content inside. As usual, we begin with the start of the line.

First comes the tag’s name. It must be one or more letters long. This is the first capture group; it comes in handy when we have to grab the closing tag. Next come the tag’s attributes. This is any character but a less-than sign (<). Since this part is optional, but I want to match more than one character, the star is used. The plus sign makes up the attribute and value, and the star says as many attributes as you want.

Next comes the third group, which is non-capturing. Inside, it will contain either a greater-than sign, some content, and a closing tag; or some spaces, a forward slash, and a greater-than sign. The first option looks for a greater-than sign followed by any number of characters, and the closing tag. \1 is used, which represents the content that was captured in the first capturing group. In this case it was the tag’s name. Now, if that couldn’t be matched, we want to look for a self-closing tag (like an img, br, or hr tag). This needs to have one or more spaces followed by “/>”.

The regex is ended with the end of the line.

Further thinking: how would you match HTML tags that carry attributes, and how would you match HTML tags nested several levels deep?
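The backreference and the self-closing branch can be tried on a few short tags. This is a sketch only, with an illustrative helper name. Two caveats that are assumptions on my part rather than statements from the original: the starred group around [^<]+ can backtrack very heavily when the text before the first "<" is long, so the inputs here are kept short, and a single regex like this does not validate arbitrarily nested markup, for which a real HTML parser is the safer tool.

```python
import re

# Pattern from the text; \1 refers back to the captured tag name.
HTML_TAG = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)</\1>|\s+/>)$")


def parse_tag(s):
    """Return (tag name, inner content) or None if s does not match.
    The inner content is None for a self-closing tag."""
    m = HTML_TAG.match(s)
    if m is None:
        return None
    return m.group(1), m.group(3)


if __name__ == "__main__":
    for candidate in ['<a href="x">go</a>', "<br />", "<p>hi</b>"]:
        print(candidate, parse_tag(candidate))
```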

Quick start:

Learn Regular Expressions in 30 Minutes (tutorial in Chinese) http://www.jb51.net/tools/zhengze.html
Essential Guide To Regular Expressions: Tools and Tutorials http://www.smashingmagazine.com/2009/06/01/essential-guide-to-regular-expressions-tools-tutorials-and-resources/

Regular expressions in different languages

1. JavaScript
RegExp Reference http://www.w3schools.com/jsref/jsref_obj_regexp.asp

2. MSDN
Regular Expression Language – Quick Reference http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
Best Practices for Regular Expressions in the .NET Framework http://msdn.microsoft.com/en-us/library/gg578045(v=vs.110).aspx

3. Python
Regular Expression HOWTO https://docs.python.org/2/howto/regex.html
A collection of useful regular expressions http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/useful_regex.ipynb

4. Java
http://www.mkyong.com/regular-expressions/how-to-extract-html-links-with-regular-expression/
http://www.mkyong.com/regular-expressions/10-java-regular-expression-examples-you-should-know/

Useful regular expressions

10+ Useful JavaScript Regular Expression Functions to improve your web applications efficiency
http://ntt.cc/2008/05/10/over-10-useful-javascript-regular-expression-functions-to-improve-your-web-applications-efficiency.html
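Includes credit card validation: http://www.virtuosimedia.com/dev/php/37-tested-php-perl-and-javascript-regular-expressions

Common JavaScript regular expressions for form validation

Images and text in this article are sourced from code.tutsplus.com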
