4. gffpandas API
4.1. gffpandas.gffpandas module
-
class gffpandas.gffpandas.Gff3DataFrame(input_gff_file=None, input_df=None, input_header=None)
Bases: object
This class holds the header information in the header attribute and the actual annotation data as a pandas DataFrame in the df attribute.
-
attributes_to_columns() → pandas.core.frame.DataFrame
Saves each attribute tag to its own column.
The attribute column is split by tag into separate columns. This method returns a plain pandas DataFrame, not a Gff3DataFrame, so the result cannot be saved as a GFF3 file.
Returns: pandas DataFrame in which the attribute column of the GFF3 file is split into the individual attribute tags
Return type: pandas DataFrame
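The splitting described above can be illustrated without the library. The following is a minimal standard-library sketch of the idea, not the gffpandas implementation; the helper name `parse_attributes` and the sample rows are made up for illustration.

```python
def parse_attributes(attribute_field):
    """Split a GFF3 attribute string such as 'ID=gene1;Name=dnaA' into a dict."""
    pairs = {}
    for entry in attribute_field.strip().split(";"):
        if not entry:
            continue  # tolerate a trailing semicolon
        tag, _, value = entry.partition("=")
        pairs[tag] = value
    return pairs


# Hypothetical sample rows; only the columns needed here are shown.
rows = [
    {"type": "gene", "attributes": "ID=gene1;Name=dnaA"},
    {"type": "CDS", "attributes": "ID=cds1;Parent=gene1;product=hypothetical protein"},
]

# Collect every tag seen, in order of first appearance, so each row gets the same columns.
all_tags = []
for row in rows:
    for tag in parse_attributes(row["attributes"]):
        if tag not in all_tags:
            all_tags.append(tag)

# Build one expanded row per feature; tags a row lacks are filled with None.
expanded = []
for row in rows:
    parsed = parse_attributes(row["attributes"])
    new_row = dict(row)
    for tag in all_tags:
        new_row[tag] = parsed.get(tag)
    expanded.append(new_row)
```

Because the expanded rows no longer follow the nine-column GFF3 layout, it is plausible that this is why the method returns a plain DataFrame.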
-
filter_by_length(min_length=None, max_length=None) → gffpandas.gffpandas.Gff3DataFrame
Filters the pandas DataFrame by gene_length.
The desired minimal and maximal length in bp have to be given.
Parameters: min_length – minimal feature length in bp; max_length – maximal feature length in bp
Returns: original header and DataFrame with the features whose lengths fit the set parameters, saved as an object of the class Gff3DataFrame
Return type: class 'gffpandas.gffpandas.Gff3DataFrame'
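As a rough illustration of length filtering, here is a standard-library sketch. It assumes a feature's length is `end - start + 1` (GFF3 coordinates are 1-based and inclusive) and that both bounds are inclusive; the library may compute the length or treat the bounds differently. The helper names and sample data are hypothetical.

```python
def feature_length(start, end):
    # Assumption: 1-based, inclusive coordinates, so 10..19 covers 10 bases.
    return end - start + 1


def keep_by_length(features, min_length, max_length):
    """Return the features whose length lies within [min_length, max_length]."""
    return [
        f for f in features
        if min_length <= feature_length(f["start"], f["end"]) <= max_length
    ]


# Hypothetical sample features.
features = [
    {"type": "gene", "start": 1, "end": 100},
    {"type": "CDS", "start": 10, "end": 19},
    {"type": "tRNA", "start": 200, "end": 275},
]

kept = keep_by_length(features, min_length=50, max_length=100)
```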
-
filter_feature_of_type(feature_type_list) → gffpandas.gffpandas.Gff3DataFrame
Filters the pandas DataFrame by feature type.
A list of one or more feature types has to be given, e.g. ['CDS', 'ncRNA'].
Parameters: feature_type_list (list) – list of name(s) of the desired feature(s)
Returns: original header and DataFrame of the selected features, saved as an object of the class Gff3DataFrame
Return type: class 'gffpandas.gffpandas.Gff3DataFrame'
-
find_duplicated_entries(seq_id=None, type=None) → gffpandas.gffpandas.Gff3DataFrame
Finds redundant entries.
The chromosome accession number (seq_id) and the feature type have to be given. All entries that are redundant with respect to start position, end position and strand are then returned.
Parameters: seq_id – chromosome accession number; type – feature type
Returns: original header and DataFrame containing the duplicated entries, both saved as an object of the class Gff3DataFrame
Return type: class 'gffpandas.gffpandas.Gff3DataFrame'
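The redundancy criterion (same start, end and strand within one seq_id and feature type) can be sketched as a grouping step. This standard-library version is only an illustration: it returns every member of a redundant group, whereas the library may report only the repeated copies. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def find_redundant(features, seq_id, feature_type):
    """Group matching features by (start, end, strand) and return groups with more than one member."""
    groups = defaultdict(list)
    for f in features:
        if f["seq_id"] == seq_id and f["type"] == feature_type:
            groups[(f["start"], f["end"], f["strand"])].append(f)

    redundant = []
    for members in groups.values():
        if len(members) > 1:
            redundant.extend(members)
    return redundant


# Hypothetical sample features.
features = [
    {"seq_id": "chr1", "type": "gene", "start": 1, "end": 100, "strand": "+", "attributes": "ID=g1"},
    {"seq_id": "chr1", "type": "gene", "start": 1, "end": 100, "strand": "+", "attributes": "ID=g2"},
    {"seq_id": "chr1", "type": "gene", "start": 1, "end": 100, "strand": "-", "attributes": "ID=g3"},
    {"seq_id": "chr1", "type": "CDS", "start": 1, "end": 100, "strand": "+", "attributes": "ID=c1"},
]

dups = find_redundant(features, "chr1", "gene")
```

Here `g3` is not reported because it lies on the other strand, and `c1` is excluded by the feature type.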
-
get_feature_by_attribute(attr_tag, attr_value_list) → gffpandas.gffpandas.Gff3DataFrame
Filters the pandas DataFrame by an attribute.
The 9th column of a GFF3 file contains the list of feature attributes in tag=value format. The desired attribute tag and the corresponding value(s) have to be given. If the value is not present, an empty DataFrame is returned.
Parameters: attr_tag – desired attribute tag; attr_value_list – list of the corresponding value(s)
Returns: original header and DataFrame with the entries that contain the desired attribute values, both saved as an object of the class Gff3DataFrame
Return type: class 'gffpandas.gffpandas.Gff3DataFrame'
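The tag=value matching can be pictured with a small standard-library sketch. It is not the library's code: it assumes exact string matches and ignores comma-separated multi-values, and the helper name and sample data are invented for illustration.

```python
def has_attribute_value(attribute_field, attr_tag, attr_value_list):
    """Return True if the attribute string carries attr_tag with one of the given values."""
    for entry in attribute_field.split(";"):
        tag, _, value = entry.partition("=")
        if tag == attr_tag and value in attr_value_list:
            return True
    return False


# Hypothetical sample features.
features = [
    {"type": "gene", "attributes": "ID=gene1;gene=dnaA"},
    {"type": "gene", "attributes": "ID=gene2;gene=dnaN"},
    {"type": "gene", "attributes": "ID=gene3;gene=gyrB"},
]

matches = [
    f for f in features
    if has_attribute_value(f["attributes"], "gene", ["dnaA", "dnaN"])
]

# A value that does not occur yields an empty result, mirroring the empty DataFrame case.
no_matches = [
    f for f in features
    if has_attribute_value(f["attributes"], "gene", ["recA"])
]
```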
-
overlaps_with(seq_id=None, start=None, end=None, type=None, strand=None, complement=False) → gffpandas.gffpandas.Gff3DataFrame
Shows which entries overlap with a feature to compare against.
The chromosome accession number has to be given, together with the start and end bp position of the feature to compare against. Optionally, its feature type and its strand, sense (+) or antisense (-), can be given as well.
Possible overlaps (see code): [diagram of overlapping feature positions]
By selecting 'complement=True', all features that do not overlap with the feature to compare against are returned.
Parameters: seq_id – chromosome accession number; start – start bp position of the feature to compare against; end – end bp position of the feature to compare against; type – feature type (optional); strand – sense (+) or antisense (-) strand (optional); complement – if True, return the non-overlapping features
Returns: original header and DataFrame containing the entries that overlap or do not overlap (complement=True) with the given parameters, both saved as an object of the class Gff3DataFrame
Return type: class 'gffpandas.gffpandas.Gff3DataFrame'
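A generic way to test whether two position ranges overlap is the closed-interval check below. This is a standard-library sketch, not the conditions used in the gffpandas source. It assumes inclusive coordinates and assumes that `complement` only inverts the position test among features that already pass the seq_id, type and strand filters. The function names, the `feature_type` parameter name and the sample data are hypothetical.

```python
def overlaps(feature, start, end):
    # Two closed intervals overlap exactly when each starts no later than the other ends.
    return feature["start"] <= end and start <= feature["end"]


def select_overlapping(features, seq_id, start, end,
                       feature_type=None, strand=None, complement=False):
    """Return features on seq_id that overlap [start, end], or those that do not if complement is True."""
    selected = []
    for f in features:
        if f["seq_id"] != seq_id:
            continue
        if feature_type is not None and f["type"] != feature_type:
            continue
        if strand is not None and f["strand"] != strand:
            continue
        # Keep overlapping features normally, non-overlapping ones when complement is set.
        if overlaps(f, start, end) != complement:
            selected.append(f)
    return selected


# Hypothetical sample features.
features = [
    {"seq_id": "chr1", "type": "gene", "start": 1, "end": 50, "strand": "+", "name": "a"},
    {"seq_id": "chr1", "type": "gene", "start": 40, "end": 120, "strand": "+", "name": "b"},
    {"seq_id": "chr1", "type": "gene", "start": 100, "end": 200, "strand": "-", "name": "c"},
    {"seq_id": "chr1", "type": "gene", "start": 300, "end": 400, "strand": "+", "name": "d"},
]

hits = select_overlapping(features, "chr1", 45, 110)
plus_hits = select_overlapping(features, "chr1", 45, 110, strand="+")
misses = select_overlapping(features, "chr1", 45, 110, complement=True)
```

The single comparison covers partial overlap on either side as well as one range fully containing the other.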
-
stats_dic() → dict
Gives the following statistics for the data:
The maximal bp length, the minimal bp length, the count of features on the sense (+) and antisense (-) strands, and the count of each available feature type.
Returns: information about the given DataFrame: the length of the longest and shortest feature entry (in bp), the number of features on the sense and antisense strands, and the number of features of each feature type
Return type: dictionary
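To show the kind of summary described, here is a standard-library sketch that builds a comparable dictionary. The key names are invented for illustration and are not the keys returned by `stats_dic`; the length is assumed to be `end - start + 1`, which may differ from the library's calculation.

```python
from collections import Counter


def summarize(features):
    """Return length extremes, strand counts and per-type counts for a list of features."""
    lengths = [f["end"] - f["start"] + 1 for f in features]  # assumed inclusive length
    strands = Counter(f["strand"] for f in features)
    return {
        "max_length": max(lengths),
        "min_length": min(lengths),
        "sense_count": strands["+"],
        "antisense_count": strands["-"],
        "feature_counts": dict(Counter(f["type"] for f in features)),
    }


# Hypothetical sample features.
features = [
    {"type": "gene", "start": 1, "end": 100, "strand": "+"},
    {"type": "CDS", "start": 10, "end": 19, "strand": "+"},
    {"type": "tRNA", "start": 200, "end": 275, "strand": "-"},
]

stats = summarize(features)
```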
-
to_csv(output_file=None) → None
Creates a CSV file.
The pandas DataFrame is saved as a CSV file.
Parameters: output_file (str) – desired name of the output CSV file
Returns: CSV file with the content of the DataFrame
Return type: data file in CSV format
-