New Inputs and Outputs

October 16, 2023

This new version of Kojak has some cool new features that we will continue to develop in the coming months. Among them is support for the .mzMLb format. The format was described some time ago, and efforts to enable widespread distribution (via ProteoWizard) are underway. If you’re not familiar with it, it is an HDF5-structured format that preserves the mzML data elements while providing superior data compression. Initial tests show it can store data in much smaller files than .mzML, and even smaller than vendor formats. There might be some bugs to iron out, but if you have .mzMLb files, you can give them a try now. Both the Linux and Windows versions in the Downloads section were compiled to support .mzMLb. This may have some unintended consequences for Linux users regarding a few shared libraries. If you compile your own Kojak from source and are having difficulties with the supporting library packages, please let me know and I can probably help get it sorted.

For the output, you can optionally export split PepXML files using the split_pepxml parameter. This adds a few extra files to your output, in addition to your PepXML file. These new files are divided into Single, Loop, and XL peptide-spectrum matches (PSMs), and each file provides the best PSM of that type for each spectrum. That means that a spectrum can in fact have three best matches: a single peptide, a loop-link, and a pair of cross-linked peptides. When validated in PeptideProphet, each classifier now has enough spectra to better model and determine probabilities for each result. Then iProphet can be used to merge the datasets to determine which of the PSMs for any given spectrum are kept moving forward. We are still working on the development of this approach and will post some tutorials in the future.
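To make the "three best matches" idea concrete, here is a minimal Python sketch of the selection rule. It is only an illustration, not Kojak's code. The PSM record, the type labels, the peptide strings, and the function name best_psm_per_type are all made up for this example, and it assumes that a higher score is better.

```python
from collections import namedtuple

# Hypothetical PSM record for illustration; not Kojak's data structure.
PSM = namedtuple("PSM", ["scan", "psm_type", "peptide", "score"])

def best_psm_per_type(psms):
    """Keep the highest-scoring PSM for each (scan, psm_type) pair."""
    best = {}
    for psm in psms:
        key = (psm.scan, psm.psm_type)
        if key not in best or psm.score > best[key].score:
            best[key] = psm
    return best

# Made-up matches for a single spectrum (scan 101).
psms = [
    PSM(101, "single", "PEPTIDEK", 2.1),
    PSM(101, "single", "PEPTLDEK", 1.4),
    PSM(101, "loop", "KPEPTIDEK", 1.9),
    PSM(101, "xl", "PEPKTIDE+AKCDEF", 3.0),
]

best = best_psm_per_type(psms)
for (scan, psm_type), psm in sorted(best.items()):
    print(scan, psm_type, psm.peptide, psm.score)
```

One spectrum ends up with up to three entries, one per type, which is what each split file would hold for that spectrum.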

Lastly, an obscure bug in the Hardklor library built into Kojak might have caused infinite loops when loading data. This has been corrected.


New Release is Alpha 22

August 19, 2022

There were numerous small updates since the last release, but I will highlight only a few. First, cleavable crosslinkers are now better supported with the xl_cleavage_product_mass parameter. Yes, it is yet another parameter for defining crosslinkers. To make everyone’s life easier, I’ve added a macro to the default configuration file that lets you select all the features for a given crosslinker with a single parameter, predefined_crosslink. If this parameter is defined, it will automatically set all the necessary parameters for a given crosslinker without having to define cross_link, mono_link, etc. And to help everyone keep up-to-date with the latest parameters, you can now get a default parameter set by executing:

kojak --config

from your command line. This will create a kojak_default_params.conf file which will have all the latest parameters for the current version. Just rename it (otherwise the next time you run this command, it will be overwritten) and edit it for your analysis.

Other updates include fixes for Percolator support and reconfiguration of default and suggested parameters.


New Release is Alpha 16

April 11, 2022

This release has a critical fix for an intermittent crash with some data files. It also contains minor speed improvements.


Alpha 13 Released

April 1, 2022

This update has two major overhauls. First, file loading is now multithreaded and pipelined to process the MS1 and MS2 spectra while they are being read from disk. This dramatically speeds up the spectral preprocessing stages of Kojak. It doesn’t affect the peptide search space, so the most noticeable speed improvements are with large files that have small search spaces.
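The general idea of pipelined loading can be sketched in a few lines of Python. This is not how Kojak (a C++ program) implements it. It is a generic producer/consumer illustration in which the function names, the queue size, and the placeholder "preprocessing" (summing peak values) are all assumptions.

```python
import queue
import threading

def read_spectra(n_spectra, out_queue):
    """Producer: stands in for reading spectra from disk."""
    for scan in range(n_spectra):
        out_queue.put((scan, [float(scan), float(scan) + 1.0]))
    out_queue.put(None)  # sentinel: no more spectra

def preprocess(in_queue, results):
    """Consumer: handles each spectrum as soon as it arrives."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        scan, peaks = item
        results[scan] = sum(peaks)  # placeholder for real preprocessing

def run_pipeline(n_spectra):
    # Bounded queue so the reader cannot run far ahead of the worker.
    spectra = queue.Queue(maxsize=8)
    results = {}
    reader = threading.Thread(target=read_spectra, args=(n_spectra, spectra))
    worker = threading.Thread(target=preprocess, args=(spectra, results))
    reader.start()
    worker.start()
    reader.join()
    worker.join()
    return results

results = run_pipeline(5)
print(results)
```

Because reading and preprocessing overlap, the time spent waiting on disk is no longer added on top of the processing time.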

The second major overhaul is to better organize the text output from Kojak. Previously, PSMs were reported in the order in which the threads completed them. Now they are reported in the same order every time, regardless of when each computation finishes. This includes sorting multiple PSMs within the same spectrum that have the same score (i.e. ties). This saves users from having to sort their results when repeating an analysis to look for differences.
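A small Python sketch shows why a full sort key, including a tie-breaker, gives repeatable output. The record layout and the choice of peptide text as the tie-breaker are assumptions for illustration; Kojak's actual sort keys are not described here.

```python
# Hypothetical results as (scan, score, peptide), in thread-completion order.
results = [
    (205, 3.2, "PEPTIDEK"),
    (101, 1.8, "LKAAR"),
    (101, 1.8, "AKLAR"),
    (101, 2.5, "GGKR"),
]

def stable_order(psms):
    # Scan ascending, then score descending, then peptide text to break ties.
    return sorted(psms, key=lambda p: (p[0], -p[1], p[2]))

ordered = stable_order(results)
for row in ordered:
    print(row)
```

Whatever order the threads finish in, the same input records always produce the same output order, so two runs can be compared directly.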

As a minor note, the default/sample configuration files have been updated to reflect the latest parameter changes and generalized settings. Most noteworthy, the recommended bin size for Orbitrap MS2 spectra is now 0.01.


Documentation Updates

January 26, 2022

A few new parameters have been added throughout Kojak 2.0 development. I’ve started updating the documentation to reflect these new parameters. aa_mass and results_path are now included in the documentation, and are very useful in both customizing the cross-linking analysis and organizing the results.