base version.
trained with a variable learning rate of:
0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005
which overcooked the embedding, so training was resumed from step 500 at a lower learning rate and run until 15,000 steps.
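
For readers unfamiliar with the schedule notation, here is a minimal sketch of one way to read it. It assumes each rate:step pair means "use this rate up to and including this step" and that the final bare value applies for all remaining steps. The function names parse_schedule and rate_at are hypothetical and not taken from any particular trainer.

```python
def parse_schedule(text):
    """Parse 'rate:step, rate:step, ..., rate' into (rate, last_step) pairs.

    last_step is None for a trailing rate with no step, which is
    assumed to apply until training ends.
    """
    schedule = []
    for part in text.split(","):
        part = part.strip()
        if ":" in part:
            rate, step = part.split(":")
            schedule.append((float(rate), int(step)))
        else:
            schedule.append((float(part), None))
    return schedule


def rate_at(schedule, step):
    """Return the rate in effect at a given training step (assumed 1-based)."""
    for rate, last_step in schedule:
        if last_step is None or step <= last_step:
            return rate
    # Past the last listed step with no open-ended rate: keep the final rate.
    return schedule[-1][0]


SCHEDULE = parse_schedule(
    "0.05:10, 0.02:20, 0.01:60, 0.005:200, 0.002:500, 0.001:3000, 0.0005"
)

for step in (1, 11, 500, 501, 15000):
    print(step, rate_at(SCHEDULE, step))
```

Under that reading, step 500 is the last step trained at 0.002, which is the point training was resumed from.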