Bug 91714 - [ maintainer update ] update biology/ncbi-toolkit
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-ports-bugs (Nobody)
 
Reported: 2006-01-12 18:00 UTC by Fernan Aguero
Modified: 2006-01-14 19:34 UTC

Attachments
ncbi-toolkit.diff (10.21 KB, patch)
2006-01-12 18:00 UTC, Fernan Aguero

Description Fernan Aguero 2006-01-12 18:00:16 UTC
	

	The included patch brings ncbi-toolkit to the version released in
	December, 2005. 

	This version contains BLAST 2.2.13, and the important changes
	are summarized below (also available from 
	http://www.ncbi.nlm.nih.gov/blast/blast_whatsnew.shtml#20051206)
	
Standalone BLAST 2.2.13 is now available from the BLAST download page. 

Major changes include:
  - New engine now available in blastall
  - Statistical parameter change
  - Bug fixes

New engine available in blastall

Blastall now supports a new version of the BLAST engine, which can be enabled by adding "-V F" to the blastall command line. This option will probably be the default in future versions. The new engine is particularly advantageous in a few situations:

  - Large word sizes with a BLASTN search. The new engine uses the "stride" idea of AGBLAST, which can lead to a considerable speedup for large word sizes. For a run of a typical mRNA sequence (u00001) with a word size of 25, the new code runs about twice as fast as the old code. Note that the AG "stride" has been available in megablast since the 2.2.10 release. This enhancement is platform-independent.
  - Searching multiple queries at once. The new engine searches multiple queries by scanning the database once, rather than once for each query. The speedup depends on the queries being searched and on how much of the time is spent scanning the databases versus actual computations (e.g., extensions). Typically this feature matters most when a number of short queries (e.g., mRNAs or ESTs) are searched with blastn, or when a tblastn search is performed. This feature is partially supported in the old code with the -B option, as well as by megablast.
  - Very large queries. The memory management (especially during the dynamic programming phase) has been improved, which may allow searches with many matches or large queries that used to fail to now run to completion.
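
As a rough illustration of the command line described above, the following Python sketch assembles a blastall invocation that turns on the new engine. Only "-V F" comes from the release notes; -p, -d, -i and -W are the usual blastall options for program, database, query file and word size. The database name "nt" and the file queries.fasta are hypothetical placeholders, and the helper only builds the argument list without running BLAST.

```python
from typing import List, Optional


def build_blastall_command(program: str, database: str, query_file: str,
                           word_size: Optional[int] = None,
                           new_engine: bool = True) -> List[str]:
    """Return an argv list for blastall; nothing is executed here."""
    argv = ["blastall", "-p", program, "-d", database, "-i", query_file]
    if word_size is not None:
        # A large word size is one case where the new engine is said to help.
        argv += ["-W", str(word_size)]
    if new_engine:
        # "-V F" is the switch the release notes give for the new engine.
        argv += ["-V", "F"]
    return argv


# Hypothetical inputs: a multi-FASTA file of queries and a database called "nt".
cmd = build_blastall_command("blastn", "nt", "queries.fasta", word_size=25)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run() on a machine where blastall is installed; a multi-sequence query file is how several queries would be searched in one database scan.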
Statistical parameter change 

Megablast, blastall and bl2seq have until now allowed users to select arbitrary gap existence and extension penalties for a blastn type search. This has been convenient for users but has led to the unfortunate situation that searches with some parameter sets were significantly overestimating the statistical significance of matches. To address this problem the proper statistical parameters for a number of reward/penalty/gap existence/gap extension values have been calculated. 

The parameters that might cause an issue here are -r (match reward), -q (mismatch penalty), -G (gap existence cost), and -E (gap extension cost). If you do not change these, then nothing will change for you. 

Please email blast-help@ncbi.nlm.nih.gov with any questions, bug reports, or requests for different parameter sets. 

Below are listed the supported combinations. Note that above a certain gap existence and extension penalty any value is permitted, as the statistics for ungapped searches can be used. These are marked as "ungapped threshold" below. 

For match = 1, mismatch = -4 the supported combinations are:

 G   E
------
 1   2
 0   2
 2   1
 1   1
 2   2  (ungapped threshold)


For match = 2, mismatch = -7 the supported combinations are:

 G   E
------
 2   4
 0   4
 4   2
 2   2
 4   4  (ungapped threshold)

For match = 1, mismatch = -3 the supported combinations are:

 G   E
------
 1   2
 0   2
 2   1
 1   1
 2   2  (ungapped threshold)

For match = 2, mismatch = -5 the supported combinations are:

 G   E
------
 2   4
 0   4
 4   2
 2   2
 4   4  (ungapped threshold)

For match = 1, mismatch = -2 the supported combinations are:

 G   E
------
 1   2
 0   2
 3   1
 2   1
 1   1
 2   2  (ungapped threshold)

For match = 2, mismatch = -3 the supported combinations are:

 G   E
------
 4   4
 2   4
 0   4
 3   3
 6   2
 5   2
 4   2
 2   2
 6   4  (ungapped threshold)

For match = 1, mismatch = -1 the supported combinations are:

 G   E
------
 3   2
 2   2
 1   2
 0   2
 4   1
 3   1
 2   1
 4   2  (ungapped threshold)

For match = 5, mismatch = -4 the supported combinations are:

 G   E
------
10   6
 8   6
25  10  (ungapped threshold)

For match = 4, mismatch = -5 the supported combinations are:

 G   E
------
 6   5
 5   5
 4   5
 3   5
12   8  (ungapped threshold)
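
The tables above can be encoded as a lookup, so that a parameter set is checked before a search is launched. The sketch below is an illustration, not NCBI's code: the function name is hypothetical, the values are copied from the lists above, and it assumes that "above the ungapped threshold" means both the gap existence and the gap extension cost are at or above the marked values.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]

# (match reward, mismatch penalty) -> (listed (G, E) pairs, ungapped threshold)
SUPPORTED: Dict[Pair, Tuple[Set[Pair], Pair]] = {
    (1, -4): ({(1, 2), (0, 2), (2, 1), (1, 1)}, (2, 2)),
    (2, -7): ({(2, 4), (0, 4), (4, 2), (2, 2)}, (4, 4)),
    (1, -3): ({(1, 2), (0, 2), (2, 1), (1, 1)}, (2, 2)),
    (2, -5): ({(2, 4), (0, 4), (4, 2), (2, 2)}, (4, 4)),
    (1, -2): ({(1, 2), (0, 2), (3, 1), (2, 1), (1, 1)}, (2, 2)),
    (2, -3): ({(4, 4), (2, 4), (0, 4), (3, 3), (6, 2), (5, 2), (4, 2), (2, 2)},
              (6, 4)),
    (1, -1): ({(3, 2), (2, 2), (1, 2), (0, 2), (4, 1), (3, 1), (2, 1)}, (4, 2)),
    (5, -4): ({(10, 6), (8, 6)}, (25, 10)),
    (4, -5): ({(6, 5), (5, 5), (4, 5), (3, 5)}, (12, 8)),
}


def is_supported(reward: int, penalty: int, gap_open: int, gap_extend: int) -> bool:
    """Check -r/-q/-G/-E values against the tables in the release notes."""
    entry = SUPPORTED.get((reward, penalty))
    if entry is None:
        return False  # no statistics listed for this reward/penalty pair
    listed, (g_min, e_min) = entry
    if (gap_open, gap_extend) in listed:
        return True
    # Assumption: at or above the threshold in BOTH costs counts as ungapped.
    return gap_open >= g_min and gap_extend >= e_min


print(is_supported(1, -3, 2, 1))   # listed combination
print(is_supported(1, -3, 5, 5))   # above the ungapped threshold
print(is_supported(1, -3, 3, 1))   # neither listed nor above the threshold
```

Under the stated assumption, the three calls print True, True and False.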

Bug fixes

A bug has been fixed in formatdb. It occurred when the -o option was not used (so the FASTA definition lines of the input file were not parsed) and multiple database volumes were generated. The bug normally did not become apparent until the BLAST run, at which point the BLAST binary (e.g., blastall) would produce messages containing "ObjMgrChoice: pointer [0] type [1] not found".
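
Because the problem only showed up at search time, one way to spot it is to scan captured blastall output for the message quoted above. This is a minimal sketch, assuming the output has already been captured as text; the function name is hypothetical and only the quoted fragment of the message is matched.

```python
from typing import List

# Fragment of the error message quoted in the release notes.
SYMPTOM = "ObjMgrChoice: pointer [0] type [1] not found"


def find_symptom_lines(blast_output: str) -> List[str]:
    """Return the lines of captured BLAST output that contain the symptom."""
    return [line for line in blast_output.splitlines() if SYMPTOM in line]
```

An empty result means the message did not appear in that run; it does not by itself prove the database was built correctly.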
Comment 1 Pav Lucistnik 2006-01-14 19:34:32 UTC
State Changed
From-To: open->closed

Committed, thanks!