Vim, LaTeX, awk and Gnuplot

This document is just a draft, and I don't know how likely it is that I'll ever finish it. I find it hugely unlikely that anyone will ever find any of it useful, but if you do, or if you see any errors, feel free to tell me so at

The Vim-LaTeX suite

I used Vim to edit my thesis, which was in LaTeX. The Vim-LaTeX suite makes that a lot easier. After installing the suite, go check out the documentation: typing
:h suite
will give you an amazing amount of information.

Not only does it have syntax folding and highlighting, it knows how to direct the LaTeX compiler, even doing multiple passes when BibTeX is involved. It can also be made to understand what to do when a Makefile is present. A good tip if you don't want to use a Makefile, but have your chapters in separate .tex files, is to create a file called MainName.latexmain in your project directory. Then even if you're editing ChapWetChem.tex, the LaTeX-suite knows to compile MainName.tex.

Oh, and never call your chapters "chapter1.tex" etc. Who knows when you'll want to add one in? Descriptive names also make it easier to insert LaTeX references. It's much easier to remember to type
explained in chapter~\ref{ChapMaths}, $2+2=5$ for large values of 2.
than to remember that what used to be Chapter2.tex is now Chapter3.tex, and it's much easier to read, too.

A really nifty trick is to enable the "-src-specials" option in your texrc file (I do this in my user directory: /home/david/.vim/ftplugin/tex/texrc). This allows you to go to a place in your file and type \ls, and you are taken to the dvi, at the place where you were in your source. This is already much cooler than the cat's whiskers, but it gets better. Also enable the g:Tex_UseEditorSettingInDVIViewer option, and you can Ctrl-click in your dvi window, and you will be taken to the place in your code that corresponds to that output element (be it text, graphics or even a section, if you click on the header line)!

The Vim-LaTeX suite also understands BibTeX quite nicely. If you have a .bib file set up for your project, and it is referenced in your current file, in the master file for the project, or in any file you're including from your current file, you're set to do F9-completion: Type \cite{bla[F9] and you will get a window with a list of all your citations whose keys start with "bla".

I made all my figures both as .ps and as .pdf files. The reason for this apparent silliness is that a dvi compile is quicker than making a pdf, but good old pdflatex doesn't insert postscript pictures. It can put in bitmaps of all weird and wonderful types, which normal LaTeX can't do, but ps is a mystery to it.

I started out with \ifpdf commands in each figure, but soon realized that there is an easier way: If each figure is in both forms, one with a .pdf extension, and one with .ps, all you have to do is leave the extension off the file name in \includegraphics, and LaTeX will use the ps, and pdflatex will use the pdf.


I used gnuplot's curve fitting quite a lot. One thing that you won't find there is the infamous r-squared, or coefficient of determination. This isn't much of a problem. If you really need an r-squared value, and don't want to use the more technical goodness-of-fit tools that gnuplot gives you, there are plenty of tools that will do that for you.
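As a rough illustration of how little is needed, here is a minimal awk sketch that computes r-squared as 1 - SS_res/SS_tot. It assumes you have already written the observed and the fitted y values side by side into a two-column file; that file layout and the function name r_squared are assumptions made for this example, not something gnuplot produces or provides by itself.

```shell
# r_squared FILE
# FILE: two whitespace-separated columns per line, observed y and fitted y.
# (Hypothetical helper and file layout, for illustration only.)
r_squared() {
    awk '
        !/^#/ && NF >= 2 {
            n++
            obs[n] = $1; fit[n] = $2
            sum += $1
        }
        END {
            if (n < 2) { print "need at least two points"; exit 1 }
            mean = sum / n
            for (i = 1; i <= n; i++) {
                ss_res += (obs[i] - fit[i]) ^ 2   # residual sum of squares
                ss_tot += (obs[i] - mean) ^ 2     # total sum of squares
            }
            if (ss_tot == 0) { print "observed values are all equal"; exit 1 }
            printf "%.4f\n", 1 - ss_res / ss_tot
        }
    ' "$1"
}
```

Called as r_squared fit.dat (fit.dat being whatever you named your two-column file), it prints a single number. Lines starting with # are skipped, so gnuplot-style comment lines in the file do no harm.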

If you've ever seen documents with pasted-in bitmaps of graphs, you'll appreciate the quality that vector graphics can give to your graphs. The postscript terminal in gnuplot is quite nice for this. However, I've found that LaTeX has poor support for rotated ps files, so I ended up doing all my graphs in "portrait" orientation, and cropping them by hand.

My .gpi files had the following commands:
set term postscript enhanced portrait 10
set output ''
set size ratio 0.7
Then I would end up with a .ps file with a graph at the bottom of the page. This could be fixed by changing the line
%%BoundingBox: 50 50 554 770
to fit the graph, in this case to
%%BoundingBox: 50 50 554 400
The new coordinates are easy to find with gv, which continuously reports the coordinates of the mouse. This trick is also nice if Gnuplot crops a title or label (this sometimes happens when you're using special characters).

In the case of the pdf file, you'll find the coordinates after the word "MediaBox". However, if your fix changed the number of characters in the MediaBox statement, you'll have to run the file through pdftk:
pdftk broken.pdf output fixed.pdf
or all the programs that have to work with the pdf after your hack will moan and complain.

Chromquest and awk

Thermo Scientific make a range of really good value-for-money HPLC instruments, and in general, the ThermoQuest software is quite user-friendly. I used version 2.51, which is apparently horrendously out of date, and I found it to be reasonably feature-rich, if slightly buggy. The one thing that I could not make the software do was to export 3D data from the diode-array detector. But no fear, brute force will always get you there. I grabbed the .dat file that the chromatography data was dumped into, and did the following:
hexdump -C chrom.dat > chrom.hexdump
vim chrom.hexdump
Now I went about halfway down the file, because I figured the 3D data would take up most of the file's bulk. I then searched backwards for the first occurrence of a duplicate line, which hexdump gives as an asterisk, and what I found was:
00100640  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
001007a0  00 00 00 00 c4 fe ff ff  66 00 00 00 31 01 00 00  |........f...1...|
001007b0  df fe ff ff a4 00 00 00  be 01 00 00 ea ff ff ff  |................|
001007c0  8e ff ff ff 07 00 00 00  b5 fe ff ff 76 00 00 00  |............v...|
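If you would rather not hunt for the asterisks by eye, grep can list them all with their line numbers. This is just a small sketch, and star_lines is a name made up for it.

```shell
# star_lines FILE
# Print the line number of every line in a hexdump that consists of a
# single asterisk, i.e. every place where repeated lines were squeezed.
# (Hypothetical helper name.)
star_lines() {
    grep -n '^\*$' "$1" | cut -d: -f1
}
```

Running star_lines chrom.hexdump gives one line number per squeezed run. An asterisk that turns up inside your data range is worth knowing about, because it stands for lines that are not in the dump. hexdump's -v option switches the squeezing off and writes every line out.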
Of course, you might be unlucky, and have a duplicate line halfway through your data. To check for this, you can look upwards through your data a bit. In my files (YMMV), I found a long section of data before my 3D block that just had increasing numbers, like this:
001005a0  e9 0e 00 00 ea 0e 00 00  eb 0e 00 00 ec 0e 00 00  |................|
001005b0  ed 0e 00 00 ee 0e 00 00  ef 0e 00 00 f0 0e 00 00  |................|
001005c0  f1 0e 00 00 f2 0e 00 00  f3 0e 00 00 f4 0e 00 00  |................|
001005d0  f5 0e 00 00 f6 0e 00 00  f7 0e 00 00 f8 0e 00 00  |................|
After some fiddling around, I discovered that the data was arranged as four-byte signed numbers, with the most significant bytes last. For example, the number "c4 fe ff ff" above is actually 0xFFFFFEC4, which as a signed 32-bit number is -316. Likewise, at the end of your data, you will see some text, for example:
001d5e00  32 32 30 20 6e 6d 00 00  80 3f 03 6d 41 55 6f 12  |220 nm...?.mAUo.|
001d5e10  83 3a 00 00 00 80 3f 00  00 16 45 00 00 00 00 01  |.:....?...E.....|
001d5e20  00 00 00 17 00 00 00 00  00 80 3f 01 00 00 00 00  |..........?.....|
001d5e30  00 00 00 00 01 00 00 00  01 00 00 00 01 00 00 00  |................|
001d5e40  01 00 00 00 01 00 00 00  01 00 00 00 01 00 00 00  |................|
001d5e50  01 00 00 00 01 00 ff ff  00 00 0d 00 43 44 65 74  |............CDet|
001d5e60  54 72 61 63 65 49 6e 66  6f 01 00 00 00 05 00 00  |TraceInfo.......|
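To make the byte order concrete, here is a small shell sketch that decodes one four-byte word the way it is described above: reverse the bytes, read the result as hex, and subtract 2^32 if the top bit is set. The function name le32_to_int is made up for this example.

```shell
# le32_to_int B1 B2 B3 B4
# B1..B4: the four bytes as hex, in the order they appear in the hexdump
# (least significant byte first). Prints the signed 32-bit value.
# (Hypothetical helper name.)
le32_to_int() {
    v=$(( 0x$4$3$2$1 ))                  # put the most significant byte first
    if [ "$v" -ge 2147483648 ]; then     # top bit set: the number is negative
        v=$(( v - 4294967296 ))          # subtract 2^32
    fi
    echo "$v"
}

le32_to_int c4 fe ff ff    # prints -316
le32_to_int 66 00 00 00    # prints 102
```

The two calls use the second and third words of the line at offset 001007a0 in the dump above.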
So now there is the task of changing the data into a 3D chromatogram. Note the line numbers of the start and end of your 3D data in the hexdump. In my case, these were 9495 and 64119. Now we fire up awk:
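awk '{
 if (9494 < FNR && FNR < 64120) {
  # each line holds four 4-byte words; reverse the bytes of each word
  print "0x" $5 $4 $3 $2
  print "0x" $9 $8 $7 $6
  print "0x" $13 $12 $11 $10
  print "0x" $17 $16 $15 $14
 }}' < chrom.hexdump |
 awk --non-decimal-data '
  # top bit set means a negative number: subtract 2^32
  /^0x[89a-f]/ {print NR " " ($0 - 4294967296); next}
  {print NR " " ($0 + 0)}' > chrom.3d.num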
This gives you the 3D data in one line-numbered column, just the way gnuplot likes it for one-dimensional display. Now you can open gnuplot and type
plot 'chrom.3d.num' with lines
You will see your chromatogram with a strange, filled-in look. Now zoom in on a nice large peak, and you will see your chromatogram as a series of UV spectra!

Now you have to figure out how many points there are per spectrum. Zoom in nice and big, and double-click on the first point of one spectrum. This copies the coordinates of that point to the clipboard. Now go paste into vim or wherever. Next, double-click on the last point of that spectrum and paste that.

If you zoomed in far enough, you should get the line numbers quite precisely, for example 116573.1 and 116663.0. Now you have one spectrum (a single data point, in the sense of a 3D chromatogram) starting at line 116573 and ending at line 116663. Subtract those two numbers to get the width of a spectrum in data points, in this case 90. Be careful with the ends: if both lines belong to the same spectrum, the inclusive count is one more than the difference, here 91.

Now we can fire up awk again:
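awk '{for (i=0; i<90; i++) {
  # print one value, then step to the next input line
  printf "%d ",$2; getline}
 # end the row; awk now reads a fresh line, so each row uses up 91 input lines
 print ""}' <chrom.3d.num > chrom.3d.matrix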
And now the gnuplot command
splot 'chrom.3d.matrix' matrix w lines palette
gives a surface plot of the 3d chromatogram. If your chromatogram looks like the low wavelengths have been lopped off and stuck at the end of the high wavelengths, you have some junk at the start of chrom.3d.num to remove.
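If you want the row length to be explicit, the reshaping step can also be written without getline. This is a sketch under the assumption that the input is in the two-column "line-number value" layout of chrom.3d.num; to_matrix is a made-up name. A loop built on getline reads one more line per row than it prints, whereas this version uses exactly N lines per row, so N here is the full number of points per spectrum.

```shell
# to_matrix N FILE
# Write the second column of FILE as rows of N values each.
# (Hypothetical helper name.)
to_matrix() {
    awk -v n="$1" '
        # a value, then a space, or a newline after every n-th value
        { printf "%d%s", $2, (NR % n == 0 ? "\n" : " ") }
        # finish a last, incomplete row if there is one
        END { if (NR % n != 0) print "" }
    ' "$2"
}
```

It would be called as to_matrix N chrom.3d.num > chrom.3d.matrix, with N replaced by your own count. A row length that is off by one shifts every row by one point against the previous one, so the peaks in the surface plot come out slanted.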