flexget.plugins.input.regexp_parse module
- class flexget.plugins.input.regexp_parse.RegexpParse
Bases: object
Designed to take input from a web resource or a file.
It then parses the text via regexps supplied in the config file.
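Because the source can be either a local file or a URL, optionally carrying username:password credentials, loading it might look roughly like the sketch below. This is not FlexGet's implementation: `split_credentials` and `load_source` are hypothetical names, and sending embedded credentials as an HTTP Basic auth header through plain `urllib` is an assumption made for illustration.

```python
import base64
import os
import urllib.parse
import urllib.request


def split_credentials(source):
    """Split 'scheme://user:pass@host/path' into (url_without_credentials, user, password).

    user and password are None when the URL carries no credentials.
    """
    parts = urllib.parse.urlsplit(source)
    if parts.username is None:
        return source, None, None
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets that .hostname strips
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urllib.parse.urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
    return clean, parts.username, parts.password


def load_source(source, encoding="utf-8"):
    """Return the text of a local file, or of a URL fetched with optional Basic auth."""
    if os.path.isfile(source):
        with open(source, encoding=encoding) as fh:
            return fh.read()
    url, user, password = split_credentials(source)
    request = urllib.request.Request(url)
    if user is not None:
        # Assumption: embedded credentials are sent as an HTTP Basic auth header.
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return response.read().decode(encoding)
```

A path that exists on disk is read directly; anything else is treated as a URL, with the credentials stripped from the URL itself and moved into a header.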
source: a file or URL to read the data from. Credentials can be embedded in the URL as username:password, e.g. http://username:password@ezrss.it/feed/.
sections: a list of dicts containing regexps that split the data into sections. These regexps are applied with findall, so every matching string in the data becomes a separate section.
keys: holds the keys that will be set on the entries.
- key:
regexps: a list of dicts holding regexps. The key is set to the first string that matches any of the listed regexps. The regexps are tried in the order they are supplied, so once one of them matches, the remaining regexps in the list are not used.
required: a boolean; when true, only entries that contain this key are passed on to the next stage. url and title are always required regardless of this setting (FlexGet requires them on every entry).
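The section and key behaviour described above can be sketched as follows. It illustrates the documented rules and is not FlexGet's code: `extract_entries` is a hypothetical name, the patterns are assumed to be compiled already, `required` is assumed to default to false, and treating the whole document as a single section when no section regexps are given is also an assumption.

```python
import re


def extract_entries(data, section_patterns, keys):
    """Split data into sections, then fill each entry with the first match per key.

    section_patterns: list of compiled regexps used to cut the data into sections.
    keys: {name: {"regexps": [compiled, ...], "required": bool}}.
    """
    if section_patterns:
        sections = []
        for pattern in section_patterns:
            # Every match is its own section (same as findall for group-free patterns).
            sections.extend(m.group(0) for m in pattern.finditer(data))
    else:
        sections = [data]  # assumption: no section regexps -> one section

    # title and url are always required; other keys only when "required" is true.
    required = {"title", "url"} | {
        name for name, spec in keys.items() if spec.get("required", False)
    }

    entries = []
    for text in sections:
        entry = {}
        for name, spec in keys.items():
            # Regexps are tried in order; the first one that matches wins.
            for pattern in spec["regexps"]:
                match = pattern.search(text)
                if match:
                    entry[name] = match.group(0)
                    break
        # Drop entries that lack any required key.
        if required.issubset(entry):
            entries.append(entry)
    return entries
```

With a `size` key listing `size=\d+` before `\d+ ?MB`, a section containing only "700 MB" falls through to the second regexp, while a section with no magnet link is dropped because url is always required.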
#TODO: consider adding a set field that will allow you to set the field if no regexps match
#TODO: consider a mode field that allows a growing list for a field instead of just setting it to the first match
Example config:

    regexp_parse:
      source: http://username:password@ezrss.it/feed/
      encoding: "utf-8"
      sections:
        - {regexp: "(?<=<item>).*?(?=</item>)", flags: "DOTALL,IGNORECASE"}
      keys:
        title:
          regexps:
            - {regexp: '(?<=<title><!\[CDATA\[).*?(?=\]\]></title>)'}  #comment
        url:
          regexps:
            - {regexp: "magnet:.*?(?=])"}
        custom_field:
          regexps:
            - {regexp: "custom regexps", flags: "comma separated list of flags (see python regex docs)"}
          required: False
        custom_field2:
          regexps:
            - {regexp: 'first custom regexps'}
            - {regexp: "can't find first regexp so try this one"}
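To see what the example's regexps actually capture, the sketch below applies the section, title and url patterns from the example config to a small RSS snippet. The feed content is made up for illustration and is not real ezrss.it output; `extract` is a hypothetical helper. DOTALL lets a section span several lines, and the title lookbehind relies on the title being wrapped in CDATA.

```python
import re

# Regexps copied from the example config; the section uses DOTALL,IGNORECASE.
SECTION = re.compile(r"(?<=<item>).*?(?=</item>)", re.DOTALL | re.IGNORECASE)
TITLE = re.compile(r"(?<=<title><!\[CDATA\[).*?(?=\]\]></title>)")
URL = re.compile(r"magnet:.*?(?=])")

# Hypothetical feed snippet with CDATA-wrapped titles and magnet links.
FEED = """<rss><channel>
<item>
  <title><![CDATA[Some.Show.S01E01]]></title>
  <magnetURI><![CDATA[magnet:?xt=urn:btih:abc]]></magnetURI>
</item>
<item>
  <title><![CDATA[Some.Show.S01E02]]></title>
  <magnetURI><![CDATA[magnet:?xt=urn:btih:def]]></magnetURI>
</item>
</channel></rss>"""


def extract(feed):
    """Return (title, url) pairs for every section where both regexps match."""
    results = []
    for section in SECTION.findall(feed):
        title = TITLE.search(section)
        url = URL.search(section)
        if title and url:
            results.append((title.group(0), url.group(0)))
    return results


print(extract(FEED))
# [('Some.Show.S01E01', 'magnet:?xt=urn:btih:abc'), ('Some.Show.S01E02', 'magnet:?xt=urn:btih:def')]
```

The url regexp stops at the first `]`, which here is the start of the closing `]]>` of the CDATA block.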
- compile_regexp_dict_list(re_list)
Turn a list of dicts containing regexp information into a list of compiled regexps.
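A stand-alone approximation of what compile_regexp_dict_list does, assuming each dict has a required `regexp` and an optional comma-separated `flags` string as the schema describes. It mirrors the documented behaviour but is not the FlexGet source. LOCALE and DEBUG are left out of this sketch's flag table: in Python 3, `re.LOCALE` can only be used with bytes patterns, and `re.DEBUG` prints debug output when a pattern is compiled.

```python
import re

# Subset of FLAG_VALUES; LOCALE and DEBUG are deliberately omitted here.
FLAG_VALUES = {
    "I": re.IGNORECASE, "IGNORECASE": re.IGNORECASE,
    "M": re.MULTILINE, "MULTILINE": re.MULTILINE,
    "S": re.DOTALL, "DOTALL": re.DOTALL,
    "U": re.UNICODE, "UNICODE": re.UNICODE,
    "X": re.VERBOSE, "VERBOSE": re.VERBOSE,
}


def compile_regexp_dict_list(re_list):
    """Compile [{'regexp': ..., 'flags': 'A,B'}, ...] into a list of re.Pattern objects."""
    compiled = []
    for item in re_list:
        flags = 0
        # flags is optional; names are comma separated and may carry spaces.
        for name in item.get("flags", "").split(","):
            name = name.strip()
            if name:
                flags |= FLAG_VALUES[name]
        compiled.append(re.compile(item["regexp"], flags))
    return compiled
```

Skipping empty names also tolerates a trailing comma such as "DOTALL,", which the flags pattern accepts.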
- on_task_input(**kwargs)
- FLAG_REGEX = '^(\\s?(DEBUG|I|IGNORECASE|L|LOCALE|M|MULTILINE|S|DOTALL|U|UNICODE|X|VERBOSE)\\s?(,|$))+$'
- FLAG_VALUES = {'DEBUG': re.DEBUG, 'DOTALL': re.DOTALL, 'I': re.IGNORECASE, 'IGNORECASE': re.IGNORECASE, 'L': re.LOCALE, 'LOCALE': re.LOCALE, 'M': re.MULTILINE, 'MULTILINE': re.MULTILINE, 'S': re.DOTALL, 'U': re.UNICODE, 'UNICODE': re.UNICODE, 'VERBOSE': re.VERBOSE, 'X': re.VERBOSE}
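FLAG_REGEX is the pattern a flags string must match, and FLAG_VALUES maps each accepted name to an `re` constant. The sketch below copies the pattern (as a raw string) to show which strings it accepts; `is_valid_flags` is a hypothetical helper. Names are case-sensitive, at most one whitespace character is allowed on either side of each name, and a trailing comma is accepted.

```python
import re

# Pattern copied from FLAG_REGEX, written as a raw string.
FLAG_REGEX = r"^(\s?(DEBUG|I|IGNORECASE|L|LOCALE|M|MULTILINE|S|DOTALL|U|UNICODE|X|VERBOSE)\s?(,|$))+$"


def is_valid_flags(value):
    """Return True if value is a flags string that FLAG_REGEX accepts."""
    return re.match(FLAG_REGEX, value) is not None


candidates = ["DOTALL,IGNORECASE", "DOTALL, IGNORECASE", "DOTALL,", "dotall", "DOTALL IGNORECASE", ""]
for candidate in candidates:
    print(repr(candidate), is_valid_flags(candidate))
```

Of these candidates, the first three are accepted; lowercase names, space-separated names, and the empty string are rejected.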
- schema = {'$defs': {'regex_list': {'items': {'additionalProperties': False, 'properties': {'flags': {'error_pattern': 'Must be a comma separated list of flags. See python regex docs.', 'pattern': '^(\\s?(DEBUG|I|IGNORECASE|L|LOCALE|M|MULTILINE|S|DOTALL|U|UNICODE|X|VERBOSE)\\s?(,|$))+$', 'type': 'string'}, 'regexp': {'format': 'regex', 'type': 'string'}}, 'required': ['regexp'], 'type': 'object'}, 'type': 'array'}}, 'additionalProperties': False, 'properties': {'encoding': {'type': 'string'}, 'keys': {'additionalProperties': {'additionalProperties': False, 'properties': {'regexps': {'$ref': '#/$defs/regex_list'}, 'required': {'type': 'boolean'}}, 'required': ['regexps'], 'type': 'object'}, 'required': ['title', 'url'], 'type': 'object'}, 'sections': {'$ref': '#/$defs/regex_list'}, 'source': {'anyOf': [{'format': 'url', 'type': 'string'}, {'format': 'file', 'type': 'string'}]}}, 'required': ['source', 'keys'], 'type': 'object'}
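The schema encodes a handful of rules: only source, encoding, sections and keys are allowed at the top level; source and keys are required; keys must contain title and url; each key needs a regexps list; and each regexp item may only carry regexp and flags, with flags matching FLAG_REGEX. The stdlib-only sketch below checks those rules by hand to make them concrete. It is not how FlexGet validates configs (FlexGet checks them against the schema itself), it skips the url/file format checks on source, and `check_config` / `check_regex_list` are hypothetical names.

```python
import re

FLAG_REGEX = r"^(\s?(DEBUG|I|IGNORECASE|L|LOCALE|M|MULTILINE|S|DOTALL|U|UNICODE|X|VERBOSE)\s?(,|$))+$"
TOP_LEVEL_FIELDS = {"source", "encoding", "sections", "keys"}


def check_regex_list(where, items, errors):
    """Check one regex_list: each item needs a compilable 'regexp' and valid optional 'flags'."""
    for i, item in enumerate(items):
        label = f"{where}[{i}]"
        extra = set(item) - {"regexp", "flags"}
        if extra:
            errors.append(f"{label}: unexpected fields {sorted(extra)}")
        if "regexp" not in item:
            errors.append(f"{label}: missing regexp")
        else:
            try:
                re.compile(item["regexp"])  # mirrors 'format': 'regex'
            except re.error as exc:
                errors.append(f"{label}: invalid regexp ({exc})")
        if "flags" in item and not re.match(FLAG_REGEX, item["flags"]):
            errors.append(f"{label}: flags must be a comma separated list of flags")


def check_config(config):
    """Return a list of problems; an empty list means these hand-written checks passed."""
    errors = []
    for field in sorted(set(config) - TOP_LEVEL_FIELDS):
        errors.append(f"unexpected field: {field}")
    for field in ("source", "keys"):
        if field not in config:
            errors.append(f"missing required field: {field}")
    check_regex_list("sections", config.get("sections", []), errors)
    keys = config.get("keys", {})
    for field in ("title", "url"):
        if field not in keys:
            errors.append(f"keys: missing required key {field}")
    for name, spec in keys.items():
        if "regexps" not in spec:
            errors.append(f"keys.{name}: missing regexps")
        else:
            check_regex_list(f"keys.{name}.regexps", spec["regexps"], errors)
        if "required" in spec and not isinstance(spec["required"], bool):
            errors.append(f"keys.{name}.required must be a boolean")
    return errors
```

A config without source, without a url key, and with an unbalanced regexp and lowercase flags reports each of those problems separately.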