regular expressions (re module) to fix broken scripts

Written by jrab on .

I'll state up front that I'm no expert in regular expressions, but with a lot of pounding and testing I've gotten some really useful results for painlessly fixing broken scripts (if you want to skip the tutorial, go straight to the recipes...).

A super frustrating problem has to do with phantom layers showing up in your scripts, like "redguard1". We've had a lot of other mystery layers show up, and they end up messing up scripts big time. Gizmos often end up as the Typhoid Marys of junk layers, secretly infecting entire shows. You can use these tools to clean up any text, so running a pass over all your gizmos can find and fix a lot of mysteries.

Other issues that kill scripts are broken clones, and mystery values that are populated with {} and need to be cleaned up and out of a script.

Regular expressions can find all of these problems and fix them without a lot of extra work.

Obviously, the best place to start is the python documentation (http://docs.python.org/library/re.html). Since regex is very similar across a lot of languages, it's also useful to just search for regex or 'regular expressions'.

This is also a pretty fantastic site to test your regex: http://pythonregex.com/ I wish I had known about it before I started all this!

There are 3 topics I'll cover, which between them have fixed almost every broken script we've had:

  1. bad layers
  2. bad values ( "{}" )
  3. broken clones

Starting with the basics, create your imports and load a script with junk in it.

import re
 
fIn = '/path/to/nukescript.nk'
nukefile = open(fIn, 'r')
nukeTxt = ''.join(nukefile)
nukefile.close()
results = '' # save this for later so you can easily print what you do

Bad Layers:

There is a lot of complaining about redguard1, but we've seen a lot of others. So let's start by defining a list of all the bad layers you've ever encountered (or will encounter) that we want to remove. If the layer in the BADLAYERS list is not in your script, nothing happens, so there are no worries about creating another problem.

BADLAYERS =   [ """add_layer {depth depth.cc depth.ZNorm depth.Zselect}""",
"""add_layer {rgba rgba.water redguard1.glow}"""]

We want the regular expressions to find all cases of these layers, strip them, and clean up any shuffle nodes that might be stuck with the bad layers.

for badLayer in BADLAYERS:
    badLayerRe = re.compile(badLayer, re.M)
    s = badLayerRe.search(nukeTxt)

I prefer to re.compile(expression) since it makes the code following it easier to read and re-use the expression later if needed.
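
One caveat worth checking: the add_layer lines are compiled as they are, so characters like '.' keep their regex meaning (any character). The sketch below is an addition, not part of the original script. It uses re.escape to treat the whole line as literal text; the variable names and the look-alike line are made up for illustration.

```python
import re

bad_layer = 'add_layer {depth depth.cc depth.ZNorm depth.Zselect}'
# Hypothetical look-alike line: the dots are replaced by other characters
look_alike = 'add_layer {depth depthXcc depthXZNorm depthXZselect}'

loose = re.compile(bad_layer)              # '.' still means "any character"
strict = re.compile(re.escape(bad_layer))  # every character is taken literally

loose_hits_look_alike = loose.search(look_alike) is not None
strict_hits_look_alike = strict.search(look_alike) is not None
strict_hits_original = strict.search(bad_layer) is not None

print(loose_hits_look_alike, strict_hits_look_alike, strict_hits_original)
```

In practice the unescaped version is unlikely to hit anything it shouldn't in a nuke script, so treat the escaping as optional extra safety.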

From the python docs:

re.compile(pattern, flags=0)

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods, described below.

The expression’s behaviour can be modified by specifying a flags value. Values can be any of the following variables, combined using bitwise OR (the | operator).

The sequence

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

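The "re.M" (MULTILINE) flag in the compile function changes how the anchors '^' and '$' behave: with it they match at the start and end of every line, not just at the start and end of the whole text. search() scans the entire text across line returns either way, and since these layer patterns contain no '^' or '$', the flag is harmless here but not strictly required.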
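
A small sketch to check what re.M does in practice. The sample text is made up; only the re calls matter.

```python
import re

text = 'Root {\nadd_layer {rgba rgba.water redguard1.glow}\nBlur {\n'

# search() walks the whole string, with or without re.M
plain = re.search('add_layer', text)

# '^' only matches at the very start of the string unless re.M is set
anchored_no_flag = re.search('^add_layer', text)
anchored_multiline = re.search('^add_layer', text, re.M)

print(plain is not None, anchored_no_flag is not None, anchored_multiline is not None)
```

So the flag only matters once a pattern uses '^' or '$'.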

If badLayerRe.search(nukeTxt) doesn't find any bad layers, you are done. If it does, 's' will not be None and you have some work to do.

It's important to note that the re.search will find the compiled expression anywhere in the string, versus re.match that only finds it at the beginning of a string.
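
To see the difference between search and match on a script-like string (the sample text is invented for this sketch):

```python
import re

text = 'Root {\n}\nadd_layer {rgba rgba.water redguard1.glow}\n'
pattern = re.compile('add_layer')

at_start = pattern.match(text)    # None: the text starts with 'Root'
anywhere = pattern.search(text)   # a match object: found further down
position = anywhere.start()       # index where the match begins

print(at_start, text[position:position + 9])
```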

    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer

So, assuming you found a bad layer, get rid of it. The above line will replace any instance of the bad layer you defined with ''. Poof. A lot easier than scanning a text file!
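
If you also want to know how many lines were removed, re has a subn() variant that returns the new text together with a count. This is a side sketch under made-up sample text, not what the script in this article does:

```python
import re

text = ('add_layer {rgba rgba.water redguard1.glow}\n'
        'Blur {\n size 10\n}\n'
        'add_layer {rgba rgba.water redguard1.glow}\n')

bad = re.compile('add_layer {rgba rgba.water redguard1.glow}')

# subn() behaves like sub() but also reports the number of replacements
cleaned, count = bad.subn('', text)

print(count)
print(repr(cleaned))
```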

But that's not all you need to do. If any of those channels are in any nodes and we've now removed the layer, your script is still buggered. We need to get the channels too.

    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
        elems = re.search('(?<={).*(?=})', badLayer, re.S).group()

Do a little juju to the badLayer text. From this

add_layer {depth depth.cc depth.ZNorm depth.Zselect}

we want this:

depth depth.cc depth.ZNorm depth.Zselect

To get that, use a regular expression:

'(?<={).*(?=})'

We don't want the { } , so we need to have an expression that finds everything between them, but doesn't include them. Sure, you could do something like this:

'{.*}'

but it wouldn't be as much fun and you'd have to strip the braces off later. Breaking the expression apart:

(?<={)   

This is a positive lookbehind assertion. It tells regex that the match must start immediately after a {. The { itself has to be there, but it is not included in what gets returned.

.*

'.' matches any single character and '*' is a quantifier that means "zero or more of the preceding thing", so '.*' asks for any character, as many times as possible. Normally '.' stops at a newline; the 're.S' flag lets it match newlines too (actually, not so important here since we're just searching a known one-line string). Because '.*' will keep grabbing characters forever, we need a way to stop the expression.

(?=})

This is a lookahead assertion. It matches only if the stuff following '?=' comes right after the text matched so far. So in our case it looks for } and stops the match there, not including the }. Put them all together and you get just the stuff between the { }.

elems = re.search('(?<={).*(?=})', badLayer, re.S).group()

re.search( findThis, fromThisText).group() will return the match. If there was no match, this will throw an exception, since '.group()' requires a match object. In our case, we know it will work since we are feeding it text that will always match. This text:  "add_layer {depth depth.cc depth.ZNorm depth.Zselect}" finds "depth depth.cc depth.ZNorm depth.Zselect" so we just need to turn that into a proper python list.
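
The lookbehind/lookahead pair can be tried on its own like this. The guard for the no-match case is an addition for illustration; the article's script skips it because the input always matches.

```python
import re

bad_layer = 'add_layer {depth depth.cc depth.ZNorm depth.Zselect}'

# With lookarounds: the braces must be there but are not returned
inside = re.search('(?<={).*(?=})', bad_layer, re.S).group()

# Without lookarounds: the braces come back as part of the match
with_braces = re.search('{.*}', bad_layer, re.S).group()

# Guarding against a miss instead of letting .group() raise
m = re.search('(?<={).*(?=})', 'no braces here', re.S)
fallback = m.group() if m else ''

print(inside)
print(with_braces)
```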

    if s:
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
        elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
        elems = elems.split()
        elems = [elem for elem in elems if '.' in elem]

Often, the first element in that list is the layer name. We don't need it, so rebuild elems as a list containing just the channels (they are layername.channel, so look for the '.' in the elem).
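
A quick look at what the split and filter produce for the two layers used in this article (the helper name channels_only is made up for this sketch):

```python
def channels_only(inside_braces):
    # Split on whitespace, then keep only the layername.channel entries
    return [elem for elem in inside_braces.split() if '.' in elem]

depth_channels = channels_only('depth depth.cc depth.ZNorm depth.Zselect')
rgba_channels = channels_only('rgba rgba.water redguard1.glow')

print(depth_channels)
print(rgba_channels)
```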

        for elem in elems:
            removeMe = re.compile('-?'+elem)
            found = removeMe.findall(nukeTxt)
            if found:
                results += 'Removing: %s\n' % removeMe.findall(nukeTxt)
                nukeTxt = removeMe.sub('', nukeTxt)

Use another regular expression to find all the instances of each channel messing up shuffle nodes, and again replace with ''.

re.findall(txt) will do just that: find all instances of the text found with the regular expression. This expression:

'-?'+elem

In this expression, the '?' is a quantifier like '*', but it looks for 0 or 1 instances of '-', followed by the channel (elem). In some cases the channel name is prefixed with '-', so let's capture every instance of both '-elem' and 'elem'. So it finds both:

-redguard1.glow

and

redguard1.glow

Since we are using re.findall, you can instantly check and see how many times it found the offending channels. All the same, cleanup with re.sub.
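
Here is the '-?' prefix at work on a small node. The Shuffle text and its knob values are invented for this sketch; only the channel name comes from the article.

```python
import re

node_text = ('Shuffle {\n'
             ' in redguard1.glow\n'
             ' alpha -redguard1.glow\n'
             ' name Shuffle1\n'
             '}\n')

elem = 'redguard1.glow'
remove_me = re.compile('-?' + elem)

found = remove_me.findall(node_text)    # both spellings of the channel
cleaned = remove_me.sub('', node_text)  # channel references stripped out

print(found)
print(cleaned)
```

The length of found tells you straight away how many references were cleaned up.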

Putting it all together:

import re

fIn = '/path/to/nukescript.nk'
nukefile = open(fIn, 'r')
nukeTxt = ''.join(nukefile)
nukefile.close()
results = '' # save this for later so you can easily print what you do

BADLAYERS = [ """add_layer {depth depth.cc depth.ZNorm depth.Zselect}""",
              """add_layer {rgba rgba.water redguard1.glow}"""]

for badLayer in BADLAYERS:
    badLayerRe = re.compile(badLayer, re.M)
    s = badLayerRe.search(nukeTxt)
    elems = []
    if s:
        # remove the add_layer line itself
        nukeTxt = badLayerRe.sub('', nukeTxt)
        results += 'Removed layer: %s\n' % badLayer
        # pull the channel names out of the braces
        elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
        elems = elems.split()
        elems = [elem for elem in elems if '.' in elem]

    # remove every reference to those channels, with or without a leading '-'
    for elem in elems:
        removeMe = re.compile('-?'+elem)
        found = removeMe.findall(nukeTxt)
        if found:
            results += 'Removing: %s\n' % found
            nukeTxt = removeMe.sub('', nukeTxt)

if results:
    print("%s" % results)
 

That's all it takes!

Bad Values:

Same deal: use findall to confirm that what you are doing actually did something, and use re.sub to replace the bad braces with an empty string, "", or essentially nothing.

Since you don't want to willy-nilly remove all empty braces (they might be there for a purpose!), find only empty braces that have a word and a space in front of them.

'[\w]* \{\}'

Using square brackets tells regex to find a certain class of characters. In this case, \w indicates any alphanumeric character or the underscore. '*' is again greedy, so it matches as many of those characters as it can find, then stops when it hits a character not in that class, like a space. Then we find a ' ', then {}. I've escaped the braces so regex sees them as non-special characters. This expression will find anything like this:

 any_Text1 {}

but not this, because the stuff inside the braces stops the regex:

 any_Text1 {2.2}

The sub removes the knob name together with its empty braces, leaving only whitespace on that line. I'm assuming that removing the knob allows nuke to replace the knob value with whatever its default is. This should be safe because in the cases where the braces should NOT be empty, nuke will simply reset the knob to its default.
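
A quick check of the pattern against a Transform node like the one in the broken script at the end of this article (the leading spaces are added here to mimic how nuke normally writes knobs):

```python
import re

braces = re.compile(r'[\w]* \{\}')

sample = ('Transform {\n'
          ' translate {-65 -59}\n'
          ' center {}\n'
          ' name Transform1\n'
          '}\n')

found = braces.findall(sample)    # only the knob with empty braces
cleaned = braces.sub('', sample)  # knob name and braces are gone

print(found)
print(repr(cleaned))
```

Note that the knob with real values, translate {-65 -59}, is left alone.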

# Cleanup bad knob expression
braces = re.compile(r'[\w]* \{\}')
foundbraces = braces.findall(nukeTxt)
if foundbraces:
    results += 'Removing: %s\n' % foundbraces
    nukeTxt = braces.sub('', nukeTxt)

if results:
    print("%s\n" % results)

 

Broken Clones:

Finally, broken clones will seriously kill a script. Clones are defined with a 'clone' and a 'set'. If the clone is not in the set, you will get a broken script that is really painful to fix manually. Using regex makes it easy.

This is what a clone will look like in a nuke script:

 clone $C242733e0

Use regex to find all the clones in your script:

CLONE   = re.compile(r'clone \$(C[0-9A-Za-z]+)')

You are trying to find everything that looks like "clone $C" followed by characters from 0-9, A-Z, or a-z. Like '*' and '?', the '+' is a quantifier that tells regex how many matching characters to look for: '+' means find at least 1 character and keep going. The '+' could also be replaced with {8} to match exactly 8 characters (which is the same as {8,8}), or with {1,8} to match at least 1 and at most 8. The clone above has 8 characters after the C, but the IDs are not always that long (the ones in the broken script at the end of this article have 7), so '+' is the safest choice.
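
To compare the quantifiers, here is a small sketch run against two clone IDs that appear in this article. The trailing '$' is added only so that each pattern has to cover the whole ID.

```python
import re

ids = ['C242733e0', 'C789e880']  # 8 and 7 characters after the 'C'

plus = re.compile(r'C[0-9A-Za-z]+$')         # one or more
exactly8 = re.compile(r'C[0-9A-Za-z]{8}$')   # exactly eight
up_to8 = re.compile(r'C[0-9A-Za-z]{1,8}$')   # between one and eight

outcome = {}
for clone_id in ids:
    outcome[clone_id] = (bool(plus.match(clone_id)),
                         bool(exactly8.match(clone_id)),
                         bool(up_to8.match(clone_id)))

print(outcome)
```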

Notice the parentheses: these define the group. In the case of the above, if we do

CLONE.findall('blabla\n clone $C242733e0 {\nbla bla')

we will get just the part in the parentheses:

['C242733e0']

If this doesn't have a match existing in your list of sets, you have a broken (orphaned) clone. Replace the broken clone(s) and the first brace with a NoOp and a brace. Name the NoOp something you can find when you are able to open your script again. Replacing the clone this way makes the knobs appear as user knobs in the NoOp.

CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
SET = re.compile(r'set (C[0-9A-Za-z]+)')
clones = CLONE.findall(nukeTxt)
sets = SET.findall(nukeTxt)
fixed = 0
for c in clones:
    if c not in sets:
        fixed += 1
        # then the clone is lonesome and needs to be fixed.
        nukeTxt = nukeTxt.replace('clone $%s {\n' % c, 'NoOp {\n name FIXME%01d\n tile_color 0xff0000ff\n' % fixed)
        results += 'Fixed lonesome Clone. This node needs to be replaced. Renamed: FIXME%01d\n' % fixed

if results:
    print("%s\n" % results)
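
To see what the clone repair does to the text, here is a small sketch run on a fragment modelled on the broken script at the end of this article. The fragment is shortened and the variable names are made up.

```python
import re

CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
SET = re.compile(r'set (C[0-9A-Za-z]+)')

text = ('set C789e880 [stack 0]\n'
        'clone $C787e880 {\n xpos 101\n}\n'
        'clone $C789e880 {\n xpos -187\n}\n')

known_sets = SET.findall(text)
orphans = [c for c in CLONE.findall(text) if c not in known_sets]

patched = text
for number, clone_id in enumerate(orphans, 1):
    # Swap the orphaned clone header for a red NoOp that is easy to spot
    patched = patched.replace(
        'clone $%s {\n' % clone_id,
        'NoOp {\n name FIXME%01d\n tile_color 0xff0000ff\n' % number)

print(orphans)
print(patched)
```

The clone that still has its matching set is left untouched; only the orphan becomes a NoOp.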
 

 

Finally:

To make these three different cleanups a little more general purpose, let's put it all together and wrap them into defs:

#!/usr/bin/env python

import re

BADLAYERS = [ """add_layer {depth depth.cc depth.ZNorm depth.Zselect}""",
              """add_layer {rgba rgba.water redguard1.glow}"""]

def removeBadLayers(nukeTxt, results=''):
    global BADLAYERS
    for badLayer in BADLAYERS:
        badLayerRe = re.compile(badLayer, re.M)
        s = badLayerRe.search(nukeTxt)
        elems = []
        if s:
            # remove the add_layer line itself
            nukeTxt = badLayerRe.sub('', nukeTxt)
            results += 'Removed layer: %s\n' % badLayer
            # pull the channel names out of the braces
            elems = re.search('(?<={).*(?=})', badLayer, re.S).group()
            elems = elems.split()
            elems = [elem for elem in elems if '.' in elem]

        # remove every reference to those channels, with or without a leading '-'
        for elem in elems:
            removeMe = re.compile('-?'+elem)
            found = removeMe.findall(nukeTxt)
            if found:
                results += 'Removing: %s\n' % found
                nukeTxt = removeMe.sub('', nukeTxt)

    if results:
        print("%s" % results)
    return nukeTxt


def removeEmptyBraces(nukeTxt, results=''):
    # Cleanup bad knob expression
    braces = re.compile(r'[\w]* \{\}')
    foundbraces = braces.findall(nukeTxt)
    if foundbraces:
        results += 'Removing: %s\n' % foundbraces
        nukeTxt = braces.sub('', nukeTxt)

    if results:
        print("%s" % results)

    return nukeTxt


def findLonesomeClones(nukeTxt, results=''):
    CLONE = re.compile(r'clone \$(C[0-9A-Za-z]+)')
    SET = re.compile(r'set (C[0-9A-Za-z]+)')
    clones = CLONE.findall(nukeTxt)
    sets = SET.findall(nukeTxt)
    fixed = 0
    for c in clones:
        if c not in sets:
            fixed += 1
            # then the clone is lonesome and needs to be fixed.
            nukeTxt = nukeTxt.replace('clone $%s {\n' % c, 'NoOp {\n name FIXME%01d\n tile_color 0xff0000ff\n' % fixed)
            results += 'Fixed lonesome Clone. This node needs to be replaced. Renamed: FIXME%01d\n' % fixed

    if results:
        print("%s" % results)

    return nukeTxt

def main():
    import shutil, sys

    # Do something with it
    if len(sys.argv) != 2:
        sys.exit('You must specify a nuke file to cleanup')
    fIn = sys.argv[1]
    shutil.copy2(fIn, '%s.bakup' % fIn) # make a copy of original file to be safe and compare later
    nukefile = open(fIn, 'r')
    nukeTxt = ''.join(nukefile)
    nukefile.close()

    nukeTxt = removeBadLayers(nukeTxt)
    nukeTxt = removeEmptyBraces(nukeTxt)
    nukeTxt = findLonesomeClones(nukeTxt)

    nukefile = open(fIn, 'w')
    nukefile.write(nukeTxt)
    nukefile.close()

if __name__ == '__main__':
    main()
 

 

This is pure python, not nuke, so it can be run via any type of python script, or even within nuke. 
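
Since the script keeps a .bakup copy so you can compare later, one way to do that comparison is with the standard difflib module. This is only a sketch: the helper name describe_changes and the sample strings are made up, and in real use you would read the two files from disk instead.

```python
import difflib

def describe_changes(before, after, name='script.nk'):
    # Line-by-line unified diff between the backup and the cleaned text
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(),
                                fromfile=name + '.bakup', tofile=name,
                                lineterm='')
    return list(diff)

before = 'Blur {\n size 10\n}\nadd_layer {rgba rgba.water redguard1.glow}\n'
after = 'Blur {\n size 10\n}\n\n'

changes = describe_changes(before, after)
print('\n'.join(changes))
```

Lines starting with '-' are what the cleanup removed, which makes it easy to confirm nothing unexpected was touched.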

Here is a broken script to play with. Save it as a file and launch it with nuke. A couple unwanted layers, some broken clones and an empty brace. It will stop loading nodes after the first clone breaks.

Run it through the script above to see it come back alive.

broken script...
define_window_layout_xml { 
<layout version="1.0">
<window x="231" y="22" w="1136" h="754" screen="0">
<splitter orientation="1">
<split size="717"/>
<splitter orientation="1">
<split size="40"/>
<dock id="" hideTitles="1" activePageId="Toolbar.1">
<page id="Toolbar.1"/>
</dock>
<split size="673"/>
<splitter orientation="2">
<split size="364"/>
<dock id="" activePageId="Viewer.1">
<page id="Script Editor.1"/>
<page id="Viewer.1"/>
</dock>
<split size="364"/>
<dock id="" activePageId="DAG.1">
<page id="DAG.1"/>
<page id="Curve Editor.1"/>
<page id="DopeSheet.1"/>
</dock>
</splitter>
</splitter>
<split size="415"/>
<dock id="" activePageId="Properties.1">
<page id="Properties.1"/>
</dock>
</splitter>
</window>
</layout>
}
Root {
inputs 0
name /Users/jrab/nuketest.nk
format "720 486 0 0 720 486 1.21 NTSC_16:9"
proxy_type scale
proxy_format "1024 778 0 0 1024 778 1 1K_Super_35(full-ap)"
views "L #FF0000
R #00FF00"
}
ColorBars {
inputs 0
format "720 486 0 0 720 486 0.91 NTSC"
name ColorBars1
xpos 52
ypos -279
}
add_layer {rgba rgba.water redguard1.glow}
add_layer {depth depth.cc depth.ZNorm depth.Zselect}
Shuffle {
out depth
name Shuffle2
label "\[value in]"
xpos 52
ypos -201
}
Shuffle {
in2 depth
red red2
name Shuffle1
label "\[value in]"
xpos 52
ypos -152
}
set N788df40 [stack 0]
Transform {
translate {-65 -59}
center {}
black_outside false
name Transform1
xpos -77
ypos -147
}
set N7897a80 [stack 0]
Blur {
size 10
quality 11
name Blur5
label "\[value size]"
xpos -23
ypos -96
}
set C789e880 [stack 0]
push $N788df40
clone $C787e880 {
xpos 101
ypos -114
selected false
}
Merge2 {
inputs 2
operation difference
name Merge1
xpos 36
ypos -22
}
push $N7897a80
clone $C789e880 {
xpos -187
ypos -82
selected false
}
Blur {
size 29
name Blur1
label "\[value size]"
xpos -187
ypos 74
}
Merge2 {
inputs 2
operation difference
name Merge2
xpos -33
ypos 85
}
Viewer {
frame 1
input_process false
name Viewer1
xpos -33
ypos 123
}
 

 

Enjoy, and happy fixes and cleanups. 

JRAB

Comments   

 
# chris menz 2012-04-12 13:00
very nice writeup, will come in handy if i ever run into problems again..

here a list for layers that are reported to propagate and cause problems:
rgba.beta
rgba.water
rgba.UVdistort
rgba.warper
alpha.G_matte
horizon.matte
redguard1.glow
rga.alpha
 
 
# John Benson 2012-04-13 15:54
Thanks - the key to removing those layers is to find the add_layer that starts it all - we had "add_layer {rgba rgba.water redguard1.glow} ". With that line plugged into the BADLAYER list, we managed to clean the redguard1.glow channel and the rgba.water channel. I suppose adding a line like "add_layer {rgba rgba.water redguard1.glow rgba.beta alpha.G_matte rga.alpha}" might work too to grab those other channels, but the original add_layer line wouldn't be removed and they would all come back again. So the trick is to keep track of whatever add_layer infects your facility, and add the whole add_layer line to the list.
 
 
# Erik Winquist 2012-04-17 19:48
Instead of searching for specific layers/channels in specific configurations, I've instead opted to compare what channels Nuke says a script contains vs. what channels all of the script's nodes report they're using. This, combined with a whitelist of channels that should always be ignored because they're OK, and a blacklist of channels that should always be removed, because they cause damage (like "rgb.red") pretty much gets rid of anything that could come up now, or in the future.

The one place I've noticed that this doesn't work is when a node has one of said channels specified in the alpha slot but the alpha slot isn't turned on for that node. This instance shows up in the .nk script as "-rgb.red". If you remove the "add_layer .." from the .nk script and don't remove all references to that channel, Nuke will not load anything beyond that line in the .nk script and you'll have an incomplete script in the DAG.
 
 
# John Benson 2012-04-18 05:55
The big headache is, however, that pesky "add_layers" with bad channels will still be stuck in the script. At least with 6.2, the only way to remove the layer was with a text editor, which is why I favored the regex approach outside of nuke. In practice, we run a version of this from nuke to clean up a lot of issues. Hitting the button does a few things with nuke to fix internal stuff, but then saves the script and runs this solely as a text operation on the file. The open (and infected) script is then closed and the cleaned up script is relaunched (but as a separate process - just using nuke.scriptOpen (...) ends up just re-introducing the bad layers into the already open session. Despite 'closing' it, the bad layers and channels are still in memory).

http://www.nukepedia.com/python/regular-expressions-re-module-to-fix-broken-scripts/#findChannels covers using the blacklisted layers and finding the channels inside them. Since we use '-?' (find a match 0 or 1 times) as a prefix to the regex expression for the channel we're looking for (based on the bad layer's channels), we find both 'rgb.red' and '-rgb.red'.

A whitelist of channels to keep is a good idea, but so far hasn't been necessary. I guess in our case, when a bad channel has been introduced, it hasn't been carrying a good channel with it in the add_layers function.
 
 
# Alexey Kuchinski 2012-07-11 11:25
thank for the effort to put all this together.
 
