New Inputs and Outputs
October 16, 2023
This new version of Kojak has some cool new features that we will continue to develop in the coming months. Among them is support for the .mzMLb format. The format was described some time ago, and efforts to enable widespread distribution (via ProteoWizard) are now underway. If you’re not familiar with it, it is an HDF5-structured format that preserves the mzML data elements while providing superior data compression. Initial tests show it is capable of storing data in much smaller files than .mzML, and even smaller than vendor formats. There might be some bugs to iron out, but if you have .mzMLb files, you can give them a try now. Both the Linux and Windows versions in the Downloads section were compiled to support .mzMLb. This may have some unintended consequences for Linux users regarding a few shared libraries. If you compile your own Kojak from source and are having difficulties with the supporting library packages, please let me know and I can probably help get it sorted.
For the output, you can optionally export split PepXML files using the split_pepxml parameter. This adds a few extra files to your output, in addition to your PepXML file. These new files are divided by Single, Loop, and XL peptide spectrum matches (PSMs), and each file provides the best PSM of that type for each spectrum. That means that a spectrum can in fact have three best matches: a single peptide, a loop-link, and a pair of cross-linked peptides. When the files are validated in PeptideProphet, each classifier now has enough spectra to better model and determine probabilities for each result. Then iProphet can be used to merge the datasets and determine which of the PSMs for any given spectrum are kept moving forward. We are still working on the development of this approach and will post some tutorials in the future.
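To make the "best PSM of each type per spectrum" idea concrete, here is a minimal Python sketch of the selection logic. It is not Kojak's actual code: the record layout, the function name best_psm_per_type, the example peptides, and the assumption that a higher score is better are all hypothetical and only meant to illustrate how one spectrum can end up with up to three best matches.

```python
# Hypothetical PSM records: (spectrum_id, psm_type, peptide_label, score).
# The three types mirror the split files; the scores are made up, and
# "higher score is better" is an assumption for this illustration.
PSM_TYPES = ("single", "loop", "xl")


def best_psm_per_type(psms):
    """Return {psm_type: {spectrum_id: psm}}, keeping one best PSM per type per spectrum."""
    best = {psm_type: {} for psm_type in PSM_TYPES}
    for psm in psms:
        spectrum_id, psm_type, _label, score = psm
        current = best[psm_type].get(spectrum_id)
        # Keep this PSM if it is the first of its type for the spectrum
        # or if it outscores the one stored so far.
        if current is None or score > current[3]:
            best[psm_type][spectrum_id] = psm
    return best


example_psms = [
    (101, "single", "PEPTIDEK", 2.1),
    (101, "single", "PEPTLDEK", 1.4),
    (101, "xl", "PEPK-KTIDE", 3.0),
    (101, "loop", "PEKTIDEK(loop)", 0.9),
    (102, "single", "ANOTHERK", 1.7),
]

split = best_psm_per_type(example_psms)
for psm_type in PSM_TYPES:
    print(psm_type, sorted(split[psm_type].items()))
```

In this toy input, spectrum 101 keeps one best match in each of the three groups, while spectrum 102 appears only in the single-peptide group. Each group would then correspond to one of the split files that is modeled on its own before the results are merged.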
Lastly, an obscure bug in the Hardklor library built into Kojak might have caused infinite loops when loading data. This has been corrected.