Machine Translation Explanations
This notebook demonstrates model explanations for a text-to-text scenario, using pretrained transformer models for machine translation. We show explanations for two models: English to Spanish (https://huggingface.co/Helsinki-NLP/opus-mt-en-es) and English to French (https://huggingface.co/Helsinki-NLP/opus-mt-en-fr).
[1]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import shap
English to Spanish model
[2]:
# load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")
# .cuda() assumes a CUDA-capable GPU is available
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-es").cuda()
# define the input sentences we want to translate
data = [
"Transformers have rapidly become the model of choice for NLP problems, replacing older recurrent neural network models"
]
Explain the model’s predictions
[3]:
# we build an explainer by passing the model we want to explain and
# the tokenizer we want to use to break up the input strings
explainer = shap.Explainer(model, tokenizer)
# explainers are callable, just like models
shap_values = explainer(data, fixed_context=1)
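For this model and tokenizer pair, SHAP uses its Partition explainer. As far as we understand it, that explainer computes Owen values, a hierarchical variant of Shapley values, over a tree of token groups instead of enumerating every subset of tokens. To make the underlying quantity concrete, here is a brute-force computation of exact Shapley values. `toy_score` is a hypothetical stand-in (not part of SHAP) for the model's score of one output token; it only looks at which of four input tokens are left unmasked. Brute force is only practical for a handful of tokens, because the number of coalitions grows as 2^n.

```python
from itertools import combinations
from math import factorial


# Hypothetical stand-in for the model's score of ONE output token.
# It only checks which input tokens are still present (unmasked).
def toy_score(present):
    score = 0.0
    if "rapidly" in present:
        score += 3.0
    if "have" in present and "become" in present:
        score += 0.5  # interaction: both tokens are needed together
    return score


def exact_shapley(tokens, f):
    """Exact Shapley value of every token, enumerating all coalitions."""
    n = len(tokens)
    phi = {}
    for t in tokens:
        others = [u for u in tokens if u != t]
        total = 0.0
        for k in range(n):  # size of a coalition S that excludes t
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # marginal contribution of t when added to coalition S
                total += weight * (f(s | {t}) - f(s))
        phi[t] = total
    return phi


tokens = ["have", "rapidly", "become", "the"]
phi = exact_shapley(tokens, toy_score)
for t in tokens:
    print(f"{t:>8}: {phi[t]:+.3f}")

# Efficiency: the attributions add up to f(all tokens) - f(no tokens).
print(f"sum of attributions: {sum(phi.values()):.3f}")
print(f"f(all) - f(none):    {toy_score(set(tokens)) - toy_score(set()):.3f}")
```

The interaction term is split evenly between "have" and "become", "the" gets nothing because the toy scorer ignores it, and the attributions sum to the full score difference.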
Visualize SHAP explanations
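Each output token gets its own explanation. Write x = (x_1, ..., x_n) for the input tokens and y_j for the j-th output token. Assuming, as we understand SHAP's teacher-forcing wrapper for text-to-text models, that the explained score f_j is the log-odds of producing y_j given the input and the earlier output tokens y_<j, the SHAP values phi_{j,i} of the input tokens add up, together with a base value phi_{j,0} (the score with the input masked), to the model's actual score. Positive values push the model toward y_j, negative values push it away.

```latex
f_j(x) \;=\; \phi_{j,0} \;+\; \sum_{i=1}^{n} \phi_{j,i},
\qquad
f_j(x) \;=\; \log \frac{P(y_j \mid x, y_{<j})}{1 - P(y_j \mid x, y_{<j})}
```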
[4]:
shap.plots.text(shap_values)
[Interactive text plot: SHAP values of every input token for each output token of the Spanish translation "Los transformadores se han convertido rápidamente en el modelo de elección para problemas NLP, reemplazando modelos de red neuronal recurrentes más antiguos".]
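The plot reports, for every output token, one SHAP value per input token. One rough way to read it programmatically is to take, for each output token, the input token with the largest attribution, which can look like a soft word alignment. The sketch hard-codes four rows copied from the plot for the 20 labelled input tokens (the unlabelled final value, presumably the end-of-sequence token, is left out). With the live explanation object, the same matrix should be available as `shap_values.values[0]` with input tokens on the first axis and output tokens on the second, and the token strings in `shap_values.data[0]` and `shap_values[0].output_names`; treat that layout as an assumption and check `.shape` in your SHAP version.

```python
# Input tokens as displayed in the plot ("▁" marks the start of a word).
input_tokens = [
    "▁Transform", "ers", "▁have", "▁rapidly", "▁become", "▁the", "▁model",
    "▁of", "▁choice", "▁for", "▁N", "LP", "▁problems", ",", "▁replacing",
    "▁older", "▁recurrent", "▁neural", "▁network", "▁models",
]

# SHAP values copied from the plot for four Spanish output tokens,
# one value per input token above (end-of-sequence value omitted).
shap_rows = {
    "han": [-0.385, -0.282, 6.018, -1.286, 1.998, -0.009, -0.315, -0.146,
            -0.0, 0.016, -0.003, 0.005, -0.014, -0.107, -0.039, -0.019,
            -0.03, -0.028, -0.126, -0.054],
    "rápidamente": [-0.418, -0.502, -1.065, 12.239, -0.061, -0.198, -0.227,
                    -0.344, 0.125, 0.06, 0.053, -0.061, -0.031, 0.16, 0.117,
                    0.119, 0.132, 0.076, -0.017, 0.112],
    "elección": [-0.737, -0.698, -0.744, -0.49, -0.08, -0.075, 3.675, 2.756,
                 13.188, -1.316, -0.737, -0.664, -1.415, -3.552, -0.0,
                 -0.065, -0.086, 0.133, 0.131, -0.006],
    "red": [-0.082, -0.095, -0.112, 0.017, -0.188, -0.253, 0.208, -0.286,
            0.346, 0.015, 0.013, 0.007, 0.143, -0.112, 0.129, 0.054, -0.099,
            -1.02, 9.753, 0.115],
}


def top_input_token(row, tokens):
    """Return the input token with the largest (most positive) SHAP value."""
    best = max(range(len(row)), key=lambda i: row[i])
    return tokens[best], row[best]


for out_tok, row in shap_rows.items():
    assert len(row) == len(input_tokens)
    tok, val = top_input_token(row, input_tokens)
    print(f"{out_tok:>12} <- {tok} ({val:+.3f})")
```

Taking only the maximum ignores negative evidence and attribution spread over several tokens, so it is a quick summary rather than a substitute for the full plot.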
English to French model
[5]:
# load the English-to-French model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
# .cuda() assumes a CUDA-capable GPU is available
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr").cuda()
[6]:
explainer = shap.Explainer(model, tokenizer)
shap_values = explainer(data)
[7]:
shap.plots.text(shap_values)
[Interactive text plot: SHAP values of every input token for each output token of the French translation "Les transformateurs sont rapidement devenus le modèle de choix pour les problèmes de NLP, remplaçant les anciens modèles de réseaux neuronaux récurrents". A few adjacent input tokens are displayed as merged groups, for example "▁become▁the".]
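Input tokens are subword pieces (for example "▁Trans", "former", "s"), which makes word-level reading harder. Because SHAP values are additive, summing the pieces of a word is a common way to get a word-level score, although it is not identical to re-running the explainer with whole words as features. This sketch assumes the SentencePiece convention used by the Marian tokenizers in this notebook: a piece starting with "▁" begins a new word, and any other piece continues the previous one. `merge_subwords` is a hypothetical helper, not a SHAP function; the values are the first five entries copied from the plot for the French output token "transformateurs".

```python
def merge_subwords(tokens, values):
    """Sum SHAP values of SentencePiece pieces into whole-word scores.

    A piece starting with "▁" opens a new word; any other piece is
    appended to the previous word and its value added to that word.
    """
    words, scores = [], []
    for tok, val in zip(tokens, values):
        if tok.startswith("▁") or not words:
            words.append(tok.lstrip("▁"))
            scores.append(val)
        else:
            words[-1] += tok
            scores[-1] += val
    return words, scores


# First five input tokens and their SHAP values for the output "transformateurs".
tokens = ["▁Trans", "former", "s", "▁have", "▁rapidly"]
values = [5.633, 5.908, 0.405, 0.187, 0.166]

for word, score in zip(*merge_subwords(tokens, values)):
    print(f"{word:>12}: {score:+.3f}")
```

Note that punctuation pieces without a leading "▁" (such as ",") are attached to the preceding word under this rule, which may or may not be what you want.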
Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!