
Module sequenceUtilities

BIANA: Biologic Interactions and Network Analysis Copyright (C) 2009 Javier Garcia-Garcia, Emre Guney, Baldo Oliva

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Classes
  BlastallParserIterator
Functions
 
get_clustalw_alignment(sequencesList)
Executes ClustalW to obtain the multiple sequence alignment of the proteins in the list
 
get_cd_hit_clusters(fasta_file, output_path='./', sequence_identity_threshold=0.95)
Executes CD-HIT on the sequences FASTA file and saves the results in output_path
 
blast_cd_hit_clusters(cd_hit_clusters_file, output_fd, dbaccess, length_blast_db=None, effective_length_space_search=None)
Performs a blast between all the proteins belonging to the same cd-hit cluster
 
blast_sequence(blastDatabase, sequenceObject, temporalOutputPath=None)
Runs BLAST with the sequence sequenceObject against the database "blastDatabase"
 
_calculate_similarities(dbaccess, length_blast_db=None, effective_length_space_search=None, sequenceID_list=[], fd_output_file=sys.stdout, representant=None)
Calculates the similarity between all proteins in the list "cluster_sequences".
 
self_blast_fasta_file(fasta_file, fd_output_file)
Runs BLAST among all the sequences in a file
 
parse_blastall_output(fd_blastall_output, temporalOutputFile_fd=None, return_only_ids=False, limit_to_sequenceIDs=Set([]))
"fd_blastall_output" is the output fd of the blast process (input for this method)
 
parse_bl2seq_output(sequenceID_A, sequenceID_B, bl2seq_output=None, fd_output_file=None)
Function Details

get_cd_hit_clusters(fasta_file, output_path='./', sequence_identity_threshold=0.95)

Executes CD-HIT on the sequences FASTA file and saves the results in output_path

# TODO: Automate for executing in cluster (now it is done manually)
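
As a rough illustration of what a CD-HIT run with these parameters involves, the sketch below only assembles a command line; it does not reproduce this module's implementation. The helper names, the output file name "clusters" and the rule for picking the word size are assumptions made for the example. The options -i, -o, -c and -n are standard CD-HIT options.

```python
import os


def choose_word_size(threshold):
    """Pick the CD-HIT word size (-n) usually recommended for a threshold."""
    if threshold >= 0.7:
        return 5
    if threshold >= 0.6:
        return 4
    if threshold >= 0.5:
        return 3
    return 2


def build_cd_hit_command(fasta_file, output_path="./",
                         sequence_identity_threshold=0.95):
    """Return an argv list for cd-hit (hypothetical helper, not part of BIANA)."""
    if not 0.4 <= sequence_identity_threshold <= 1.0:
        raise ValueError("CD-HIT expects an identity threshold in [0.4, 1.0]")
    # Assumption: results are written to a file named "clusters" in output_path.
    output_file = os.path.join(output_path, "clusters")
    return [
        "cd-hit",
        "-i", fasta_file,                         # input FASTA file
        "-o", output_file,                        # output file prefix
        "-c", str(sequence_identity_threshold),   # sequence identity threshold
        "-n", str(choose_word_size(sequence_identity_threshold)),
    ]
```

The resulting list could be passed to subprocess.run(command, check=True) on a machine where cd-hit is installed.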

blast_sequence(blastDatabase, sequenceObject, temporalOutputPath=None)

Runs BLAST with the sequence sequenceObject against the database "blastDatabase"

ATTENTION: "temporalOutputPath" is the path of the temporary gzipped file where the BLAST results are stored. If that file already exists, BLAST is not run again! If it is None, results are not saved.
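
The caching behaviour described above can be sketched as follows. This is a minimal illustration, not the module's code: "cached_blast" and the "run_blast" callable (a stand-in for whatever actually launches BLAST and returns its output as text) are hypothetical names.

```python
import gzip
import os


def cached_blast(run_blast, temporal_output_path=None):
    """Return BLAST output, reusing a gzipped cache file when one exists.

    run_blast is a hypothetical zero-argument callable returning the
    BLAST output as a string.
    """
    # If the cache file exists, BLAST is not run at all.
    if temporal_output_path is not None and os.path.exists(temporal_output_path):
        with gzip.open(temporal_output_path, "rt") as fd:
            return fd.read()

    output = run_blast()

    # With no path given, nothing is saved.
    if temporal_output_path is not None:
        with gzip.open(temporal_output_path, "wt") as fd:
            fd.write(output)
    return output
```

One consequence of this design is that a stale or truncated cache file is silently reused, so deleting the file is the way to force a fresh run.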

_calculate_similarities(dbaccess, length_blast_db=None, effective_length_space_search=None, sequenceID_list=[], fd_output_file=sys.stdout, representant=None)

Calculates the similarity between all proteins in the list "cluster_sequences".

"cluster_sequences" must be a list of the sequenceIDs

It uses bl2seq or blastall to calculate them

Results are printed to "fd_output_file"
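
Comparing all proteins in a list amounts to visiting every unordered pair of sequence IDs once. The sketch below shows only that enumeration. The function name, the "compare" callable (a stand-in for a bl2seq or blastall run) and the tab-separated output format are assumptions for the example, not the behaviour of this function.

```python
import itertools
import sys


def write_pairwise_similarities(sequence_ids, compare, fd_output_file=sys.stdout):
    """Write one line per unordered pair of sequence IDs.

    compare is a hypothetical callable taking two IDs and returning a score.
    """
    for id_a, id_b in itertools.combinations(sequence_ids, 2):
        score = compare(id_a, id_b)
        fd_output_file.write("%s\t%s\t%s\n" % (id_a, id_b, score))
```

A list of n sequence IDs yields n(n-1)/2 comparisons, so the cost grows quadratically with cluster size.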

self_blast_fasta_file(fasta_file, fd_output_file)

Runs BLAST among all the sequences in a file

This does not work for very large files; it is intended for small sets of sequences.

parse_blastall_output(fd_blastall_output, temporalOutputFile_fd=None, return_only_ids=False, limit_to_sequenceIDs=Set([]))

"fd_blastall_output" is the output fd of the blast process (input for this method)

"temporalOutputFile_fd" is a file descriptor to which everything read from fd_blastall_output is saved

"return_only_ids" is used to store only ids, not complete blast results

"limit_to_sequenceIDs" is used to restrict blast parsing to only those sequence IDs
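
To make the roles of these four parameters concrete, here is a minimal sketch of a parser with the same shape. It is not this function's implementation: it assumes tab-separated BLAST output (as produced by blastall -m 8), applies the ID filter to the subject ID only, and uses a hypothetical function name and result layout.

```python
def parse_tabular_blast(fd_blast_output, temporal_output_file_fd=None,
                        return_only_ids=False, limit_to_sequence_ids=frozenset()):
    """Parse 12-column tabular BLAST output (assumed format) into a list."""
    results = []
    for line in fd_blast_output:
        # Copy every input line to the optional side file, unfiltered.
        if temporal_output_file_fd is not None:
            temporal_output_file_fd.write(line)
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query_id, subject_id = fields[0], fields[1]
        # Assumption: an empty set means "no filtering".
        if limit_to_sequence_ids and subject_id not in limit_to_sequence_ids:
            continue
        if return_only_ids:
            results.append((query_id, subject_id))
        else:
            results.append({
                "query": query_id,
                "subject": subject_id,
                "identity": float(fields[2]),   # percent identity
                "evalue": float(fields[10]),
                "bit_score": float(fields[11]),
            })
    return results
```

Passing io.StringIO objects for both file arguments is enough to try the sketch without running BLAST.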