regular expressions (re module) to fix broken scripts
I will state up front that I am no expert in regular expressions, but after a lot of pounding and testing, I've gotten some really useful results for painlessly fixing broken scripts (skip the tutorials and go to the recipes...).
For a primer on regular expressions go here.
A super frustrating problem has to do with phantom layers showing up in your scripts, like "redguard1". We've had a lot of other mystery layers show up, and they end up messing up scripts big time. Gizmos often end up as the Typhoid Marys of junk layers, secretly infecting entire shows. You can use these tools to clean up any text, so running a pass over all your gizmos can find and fix a lot of mysteries.
Other issues that kill scripts are broken clones, and mystery values that are populated with {} and need to be cleaned up and out of a script.
Regular expressions can find all of these problems and fix them without a lot of extra work.
Obviously, the best place to start is the python documentation (http://docs.python.org/library/re.html). Since regex is very similar across a lot of languages, it's also useful to just search for regex or 'regular expressions'.
This is also a pretty fantastic site to test your regex: http://pythonregex.com/ I wish I had known about it before I started all this!
There are 3 topics I'll cover (bad layers, bad values and broken clones), which between them have fixed almost every broken script we've had.
Starting with the basics, create your imports and load a script with junk in it.
import re
fIn = '/path/to/nukescript.nk'
nukefile = open(fIn, 'r')
nukeTxt = ''.join(nukefile)
nukefile.close()
results = '' # save this for later so you can easily print what you do
Bad Layers:
There is a lot of complaining about redguard1, but we've seen a lot of others. So let's start by defining a list of all the bad layers you've ever encountered (or will encounter) that we want to remove. If the layer in the BADLAYERS list is not in your script, nothing happens, so there are no worries about creating another problem.
BADLAYERS = [ """add_layer {depth depth.cc depth.ZNorm depth.Zselect}""",
              """add_layer {rgba rgba.water redguard1.glow}"""]
We want the regular expressions to find all cases of these layers, strip them, and clean up any shuffle nodes that might be stuck with the bad layers.
for badLayer in BADLAYERS:
    badLayerRe = re.compile(badLayer, re.M)
    s = badLayerRe.search(nukeTxt)
I prefer to re.compile(expression) since it makes the code that follows easier to read and lets you re-use the expression later if needed.
From the python docs:
re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods, described below.
The expression’s behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).
The sequence
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
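The "re.M" flag (re.MULTILINE) in the compile function makes '^' and '$' match at the start and end of every line instead of only at the start and end of the whole text. search() scans the entire text, line returns included, with or without it, so the flag is not strictly needed for these patterns, but it does no harm and comes in handy as soon as you anchor an expression to a line.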
If badLayerRe.search(nukeTxt) doesn't find any bad layers, you are done. If it does, 's' will not be None and you have some work to do.
It's important to note that the re.search will find the compiled expression anywhere in the string, versus re.match that only finds it at the beginning of a string.
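To make the difference concrete, here is a small sketch on a made-up two-line string (the text and variable names are illustrative only, not from a real script). It shows that search() scans the whole string without any flag, that match() only succeeds at the very start, and that re.M only changes where '^' is allowed to match.

```python
import re

# Hypothetical two-line snippet standing in for a Nuke script.
sample_txt = 'version 6.3\nadd_layer {rgba rgba.water redguard1.glow}\n'

layer_re = re.compile(r'add_layer \{rgba rgba\.water redguard1\.glow\}')

# search() scans the whole string, newlines included, with no flag needed.
found_by_search = layer_re.search(sample_txt) is not None

# match() only tries position 0, so it misses a layer on the second line.
found_by_match = layer_re.match(sample_txt) is not None

# re.M only changes the anchors: '^' now matches at the start of every line.
anchored_plain = re.search(r'^add_layer', sample_txt) is not None
anchored_multiline = re.search(r'^add_layer', sample_txt, re.M) is not None

print(found_by_search, found_by_match, anchored_plain, anchored_multiline)
# prints: True False False True
```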
    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
So, assuming you found a bad layer, get rid of it. The above line will replace any instance of the bad layer you defined with ''. Poof. A lot easier than scanning a text file!
But that's not all you need to do. If any of those channels are in any nodes and we've now removed the layer, your script is still buggered. We need to get the channels too.
    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
        elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
Do a little juju to the badLayer text. From this
add_layer {depth depth.cc depth.ZNorm depth.Zselect}
we want this:
depth depth.cc depth.ZNorm depth.Zselect
To get that, use a regular expression:
'(?<={).*(?=})'
We don't want the { } , so we need to have an expression that finds everything between them, but doesn't include them. Sure, you could do something like this:
'{.*}'
but it wouldn't be as much fun and you'd have to strip the braces off later. Breaking the expression apart:
(?<={)
This is a positive lookbehind assertion. It tells regex that the match must start right after the stuff following '?<=', here a {. The { itself has to be there, but it is not included in the match.
.*
'.' matches any single character and, in combination with the 're.S' flag, that includes newlines (actually, not so important here since we're just searching a known string). '*' is a quantifier: it repeats the thing before it zero or more times and is greedy, so it grabs as many characters as it can. Together, '.*' asks for any character, and to keep giving us characters forever. But this means we need a way to stop the expression.
(?=})
This is a lookahead assertion. It matches only if the stuff following '?=' comes right after the text being matched. So in our case, it looks for } and ends the match there, not including the }. Because '.*' is greedy, the match runs up to the last } in the string, which is what we want here. Put them all together and you get just the stuff between the { }.
elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
re.search( findThis, fromThisText).group() will return the match. If there was no match, re.search returns None and calling '.group()' on it raises an AttributeError, since '.group()' requires a match object. In our case, we know it will work since we are feeding it text that will always match. This text: "add_layer {depth depth.cc depth.ZNorm depth.Zselect}" finds "depth depth.cc depth.ZNorm depth.Zselect" so we just need to turn that into a proper python list.
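As a quick check of the lookaround version against the plain '{.*}' idea, here is a small sketch. The capture-group variant and the None guard are my additions for comparison, not part of the recipe above.

```python
import re

bad_layer = 'add_layer {depth depth.cc depth.ZNorm depth.Zselect}'

# Lookbehind/lookahead: the braces must be there but are not part of the match.
lookaround = re.search(r'(?<=\{).*(?=\})', bad_layer, re.S)

# Alternative for comparison: match the braces, capture what is inside them.
grouped = re.search(r'\{(.*)\}', bad_layer, re.S)

# A failed search returns None, so guard before calling .group().
elems_a = lookaround.group() if lookaround else ''
elems_b = grouped.group(1) if grouped else ''

no_match = re.search(r'(?<=\{).*(?=\})', 'no braces here')

print(elems_a)                              # depth depth.cc depth.ZNorm depth.Zselect
print(elems_a == elems_b, no_match is None) # True True
```

Both routes give the same text; the lookaround just saves the extra group(1) step.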
    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
        elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
        elems = elems.split()
        elems = [elem for elem in elems if '.' in elem]
Often, the first element in that list is the layer name. We don't need it, so rebuild elems as a proper list that holds only channels (they are written layername.channel, so look for the '.' in each elem).
        for elem in elems:
            removeMe = re.compile('-?'+elem)
            found = removeMe.findall(nukeTxt)
            if found:
                results += 'Removing: %s\n' % removeMe.findall(nukeTxt)
                nukeTxt = removeMe.sub('', nukeTxt)
Use another regular expression to find all the instances of each channel messing up shuffle nodes, and again replace with ''.
findall() does just that: it returns every piece of the text that matches the regular expression. Take this expression:
'-?'+elem
Here the '?' is a quantifier like '*', but it looks for 0 or 1 instance of '-', followed by the channel (elem). In some cases the channel name is prefixed with '-', so let's capture every instance of both '-elem' and 'elem'. So it finds both:
-redguard1.glow
and
redguard1.glow
Since we are using re.findall, you can instantly check and see how many times it found the offending channels. All the same, cleanup with re.sub.
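One gotcha worth a sketch: the '.' in a channel name is itself a regex wildcard, so 'redguard1.glow' used as a pattern would also match 'redguard1Xglow'. Wrapping the channel in re.escape() (my suggestion, not part of the recipe above) keeps the dot literal. The node text below is made up for illustration.

```python
import re

# Hypothetical fragment of a node that still references the bad channel.
node_txt = 'Shuffle {\n in redguard1.glow\n alpha -redguard1.glow\n name Shuffle1\n}\n'

elem = 'redguard1.glow'

# re.escape() makes the '.' literal; unescaped it would match any character.
remove_me = re.compile('-?' + re.escape(elem))

found = remove_me.findall(node_txt)   # every hit, with or without the '-'
cleaned = remove_me.sub('', node_txt)

print(found)       # ['redguard1.glow', '-redguard1.glow']
print(len(found))  # 2
```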
Putting it all together:
import re
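Assembled from the fragments above, the bad-layer pass might look like the sketch below. The function name remove_bad_layers, the returned (text, report) pair and the use of re.escape() are my assumptions; the logic follows the steps already described.

```python
import re

BADLAYERS = [
    'add_layer {depth depth.cc depth.ZNorm depth.Zselect}',
    'add_layer {rgba rgba.water redguard1.glow}',
]


def remove_bad_layers(nuke_txt, bad_layers=BADLAYERS):
    """Return (cleaned_text, report) with bad layers and their channels removed."""
    results = ''
    for bad_layer in bad_layers:
        # Escape the layer text so '{', '}' and '.' are matched literally.
        bad_layer_re = re.compile(re.escape(bad_layer))
        if not bad_layer_re.search(nuke_txt):
            continue
        nuke_txt = bad_layer_re.sub('', nuke_txt)
        results += 'Removed layer: %s\n' % bad_layer

        # Everything between the braces, keeping only layer.channel names.
        elems = re.search(r'(?<=\{).*(?=\})', bad_layer, re.S).group().split()
        elems = [elem for elem in elems if '.' in elem]

        for elem in elems:
            remove_me = re.compile('-?' + re.escape(elem))
            found = remove_me.findall(nuke_txt)
            if found:
                results += 'Removing: %s\n' % (found,)
                nuke_txt = remove_me.sub('', nuke_txt)
    return nuke_txt, results


# Made-up script text, just to exercise the function.
sample = 'add_layer {rgba rgba.water redguard1.glow}\nShuffle {\n in redguard1.glow\n}\n'
cleaned, report = remove_bad_layers(sample)
print(report)
```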
That's all it takes!
Bad Values:
Same deal: use findall to confirm that what you are doing actually did something, then use re.sub to replace the bad braces with an empty string, "", or essentially nothing.
Since you don't want to willy-nilly remove all empty braces (they might be there for a purpose!), find empty braces with a word and a space in front of them.
'[\w]* \{\}'
Using square brackets tells regex to find a certain class of characters. In this case, \w indicates any alphanumeric character or the underscore. '*' is again greedy, so it matches as many characters as it can find, then stops when it hits a character not in that group, like a space. Then we find a ' ', then {}. I've escaped the braces so regex sees them as non-special characters. This expression will find anything like this:
any_Text1 {}
but not this, because the stuff inside the braces stops the regex:
any_Text1 {2.2}
The sub removes the knob name and the empty braces, which is everything on that line except the leading whitespace. I'm assuming that removing the knob allows Nuke to replace the value with whatever its default is. This is safe because in the cases where an empty brace should NOT be empty, Nuke will simply reset the knob to its default.
# Cleanup bad knob expression
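A minimal sketch of that cleanup, using the expression described above on a made-up node (the Grade text and the BAD_VALUE name are illustrative only):

```python
import re

# Hypothetical node with one empty-brace knob and one valid knob.
node_txt = 'Grade {\n white {}\n gamma {2.2}\n name Grade1\n}\n'

# A run of word characters, one space, then a literal '{}'.
BAD_VALUE = re.compile(r'[\w]* \{\}')

found = BAD_VALUE.findall(node_txt)   # report what will be removed
cleaned = BAD_VALUE.sub('', node_txt)

print(found)  # ['white {}']
```

The 'gamma {2.2}' knob and the opening 'Grade {' are left alone because neither contains an empty pair of braces.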
Broken Clones:
This is what a clone will look like in a nuke script:
clone $C242733e0
Use regex to find all the clones in your script:
CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
You are trying to find everything that looks like "clone $C", then any character from 0-9, A-Z, or a-z. The '$' is escaped because an unescaped '$' means "end of the line" to regex. Like '*' and '?', the '+' tells the regex to look for a certain number of characters that match. The '+' means find at least 1 character and keep going. In fact, the '+' could also be replaced with {8} to match exactly 8 characters and then stop; {8,8} means the same thing (at least 8 and at most 8). The clone ID above has 8 characters after the 'C', so all of those work here.
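Here is a small sketch of how '+' and {8} differ. The second string uses a made-up, longer ID purely for contrast; I'm not claiming Nuke ever writes one like it.

```python
import re

txt = 'clone $C242733e0 {'
long_txt = 'clone $C242733e0ff {'   # hypothetical longer ID, for contrast only

plus = re.compile(r'clone \$C[0-9A-Za-z]+')
exactly_eight = re.compile(r'clone \$C[0-9A-Za-z]{8}')

# On an 8-character ID both expressions find the same thing.
same_on_real_id = plus.findall(txt) == exactly_eight.findall(txt)

# '+' keeps consuming characters; '{8}' stops after exactly eight.
print(same_on_real_id)                  # True
print(plus.findall(long_txt))           # ['clone $C242733e0ff']
print(exactly_eight.findall(long_txt))  # ['clone $C242733e0']
```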
Notice the parentheses: these define the group. In the case of the above, if we do
CLONE.findall('blabla\n clone $C242733e0 {\nbla bla')
we will get just the part in the parentheses:
['C242733e0']
If this doesn't have a match existing in your list of sets, you have a broken (orphaned) clone. Replace the broken clone(s) and the first brace with a NoOp and a brace. Name the NoOp something you can find when you are able to open your script again. Replacing the clone this way makes the knobs appear as user knobs in the NoOp.
CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
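A sketch of the orphan check and NoOp swap described above. It assumes the original node is registered with a line of the form 'set C242733e0 [stack 0]'; the CLONE_SET expression, the function name and the BROKEN_CLONE_ prefix are all my own placeholders, so adjust them to whatever your scripts actually contain.

```python
import re

CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
# Assumption: clone sources appear as 'set C242733e0 [stack 0]' on their own line.
CLONE_SET = re.compile(r'^set (C[0-9A-Za-z]+) ', re.M)


def fix_broken_clones(nuke_txt):
    """Replace clones that have no matching 'set' with a named NoOp."""
    known_sets = set(CLONE_SET.findall(nuke_txt))
    results = ''
    for clone_id in sorted(set(CLONE.findall(nuke_txt))):
        if clone_id in known_sets:
            continue
        # Swap 'clone $Cxxxx {' for 'NoOp {' plus a name knob you can search for.
        broken = re.compile(r'clone \$%s \{' % re.escape(clone_id))
        # 'BROKEN_CLONE_' is a hypothetical prefix, not a Nuke convention.
        replacement = 'NoOp {\n name BROKEN_CLONE_%s' % clone_id
        nuke_txt = broken.sub(replacement, nuke_txt)
        results += 'Replaced orphaned clone: %s\n' % clone_id
    return nuke_txt, results


# Made-up script: one healthy clone and one orphan.
sample = ('Blur {\n name Blur1\n}\nset C11111111 [stack 0]\n'
          'clone $C11111111 {\n xpos 10\n}\n'
          'clone $C242733e0 {\n xpos 20\n}\n')
fixed, report = fix_broken_clones(sample)
print(report)
```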
Finally:
To make these three different cleanups a little more general purpose, let's put it all together and wrap them into defs:
#!/usr/bin/env python
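One way to wire cleanups together is sketched below, assuming every cleanup is a def that takes the script text and returns (new_text, report). The names clean_script and fix_bad_values are hypothetical, and only the empty-brace cleanup is included to keep the example short; the layer and clone passes would plug into the same list.

```python
import re

BAD_VALUE = re.compile(r'[\w]* \{\}')


def fix_bad_values(nuke_txt):
    """Strip knobs whose value is an empty pair of braces."""
    found = BAD_VALUE.findall(nuke_txt)
    report = 'Removed empty values: %s\n' % (found,) if found else ''
    return BAD_VALUE.sub('', nuke_txt), report


def clean_script(path_in, path_out, cleanups):
    """Run each cleanup over the script text and write the result to path_out."""
    with open(path_in, 'r') as handle:
        nuke_txt = handle.read()
    results = ''
    for cleanup in cleanups:
        nuke_txt, report = cleanup(nuke_txt)
        results += report
    # Write to a new file so the broken original stays around for comparison.
    with open(path_out, 'w') as handle:
        handle.write(nuke_txt)
    return results
```

Calling clean_script('/path/to/nukescript.nk', '/path/to/nukescript_fixed.nk', [fix_bad_values]) returns the report text, so you can print what was changed.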
This is pure python, not nuke, so it can be run via any type of python script, or even within nuke.
Here is a broken script to play with. Save it as a file and launch it with nuke. It has a couple of unwanted layers, some broken clones and an empty brace. It will stop loading nodes after the first clone breaks.
Run it through the script above to see it come back alive.
broken script...
define_window_layout_xml {
Enjoy, and happy fixes and cleanups.
JRAB
Comments
Here is a list of layers that are reported to propagate and cause problems:
rgba.beta
rgba.water
rgba.UVdistort
rgba.warper
alpha.G_matte
horizon.matte
redguard1.glow
rga.alpha
The one place I've noticed that this doesn't work is when a node has one of said channels specified in the alpha slot but the alpha slot isn't turned on for that node. This instance shows up in the .nk script as "-rgb.red". If you remove the "add_layer .." from the .nk script and don't remove all references to that channel, Nuke will not load anything beyond that line in the .nk script and you'll have an incomplete script in the DAG.
http://www.nukepedia.com/python/regular-expressions-re-module-to-fix-broken-scripts/#findChannels covers using the blacklisted layers and finding the channels inside them. Since we use '-?' (find a match 0 or 1 times) as a prefix to the regex expression for the channel we're looking for (based on the bad layer's channels), we find both 'rgb.red' and '-rgb.red'.
A whitelist of channels to keep is a good idea, but so far hasn't been necessary. I guess in our case, when a bad channel has been introduced, it hasn't been carrying a good channel with it in the add_layers function.