Proceedings of the 3rd Argentine Congress of Bioinformatics and Computational Biology
(Memorias del 3er Congreso Argentino de Bioinformática y Biología Computacional)
26, 27 and 28 September 2012
Facultad de Ingeniería, Universidad Nacional de Entre Ríos
Oro Verde, Entre Ríos
Organization
Scientific committee
Dr. Ariel Amadío, UNER
Dr. Ariel Chernomoretz, UBA
Dr. Diego Ferreiro, UBA
Dr. Morten Nielsen, UNSAM
Dr. Sergio Pantano, Institut Pasteur, ROU
Dr. Alfredo Quevedo, UNER
Dr. Maximo Rivarola, INTA
Dr. Gustavo Vazquez, UNS
Organizing committee
Mst. Bioing. Rubén Acevedo
Mst. Bioing. Gerardo Gentiletti
Dra. Fernanda Izaguirre
Dr. Víctor Casco
Dr. Ariel Amadío
Bioing. Pedro Tomiozzo
Bioing. Yanina Atum
Bioing. Analía Cherníz
Dr. Fernán Agüero
Dr. Diego Ferreiro
Dra. Cristina Marino Buslje
Dr. César Martínez
Bioing. Roberto Leonarduzzi
Bioing. Iván Gareis
Lectures
Bioinformatics for data-driven biology
Dr. Mario Caccamo
Lecture abstract. Obtaining the sequence of the three billion bases of the human genome was
hailed as one of the biggest achievements in the history of science. It involved a remarkable organisation of
scientists, policy-makers and funding agencies from across the world. This feat has marked the beginning
of a new era in molecular biology characterised by a revolution in data generation. As in other areas of
science, technological advances combined with the availability of high-performance computers have made
it possible to produce, process and collect biological data at a rate that has transformed the life sciences. Today,
any laboratory equipped with the latest sequencing technologies can generate in only a few days as much
sequence as the Human Genome Project did in 10 years, and at a fraction of the cost. More excitingly, these
new technologies have opened up the possibilities for new applications such as the promise of personalised medicine, the study of environmental samples and the ability to apply more effective crop breeding
methods. As sequencing becomes cheaper and more accurate we will soon be able to explore genetic information in real-time and at a single-cell level. This unprecedented wealth of data, however, has come
with new challenges. There is a growing gap between the capacity to generate genomic sequences and the
ability to process and interpret the resulting data. The sheer volume of information requires new levels of
software sophistication both to cope with the load and to analyse it effectively. If we are to realise the value
of data-intensive biology we cannot rely on existing methodologies. One example is the novel computer
hardware and software architectures, such as cloud computing platforms, that are emerging to cope with
the demands of big data analyses. Although these solutions can help to close the “Next-Generation Gap” in
molecular biology, they still do not provide the complexity needed to integrate and interpret data from multidisciplinary sources, let alone the ability to understand it. The breakthroughs, however, will be achieved
by training and educating the next generation of scientists and professionals in a new paradigm of science
driven by data. In this presentation I will explore the impact that this revolution has had on the use of
informatics in biology and how this transformation is only the beginning of a very exciting future for the life
sciences.
In silico characterization of intermolecular interactions in
biological systems
Dr. Claudio Cavasotto
Lecture abstract. Today, computational simulation is an invaluable tool to study macromolecular association and enzymatic reactions, and to understand at a molecular level the relationship between structure, dynamics and function. Thus, it provides an efficient and insightful complement to experimental evaluation. At the core of these calculations lies the potential energy function, which describes the intermolecular interactions in the system. The latest developments of our research group will be presented, focusing on the application of in silico methods to problems in the areas of structural biology, drug discovery, binding free energy calculation, and cheminformatics, namely:
I. the ligand-steered homology modelling method, where the interaction of known ligands with the receptor is used to shape and optimize the binding site through a stochastic global energy minimization, with the final goal of using the modelled structures in structure-based drug discovery;
II. the discovery of novel modulators of GPCRs and nuclear receptors through coarse-grained high-throughput docking followed by experimental evaluation;
III. the use of quantum mechanical (QM) methods to study biomacromolecular interaction; as a case study, the QM calculation of absolute and relative binding free energy of tetra-phosphopeptides to the SH2 domain of human LCK will be presented and compared to the failure of classical methods.
Current limitations of computational methods and future trends will also be discussed.
Mining the Schistosoma genome for new drug targets
Dr. Guilherme Correa-Oliveira
Lecture abstract. There are three important Schistosoma species parasitizing humans: Schistosoma mansoni, S. japonicum and S. haematobium. Together they chronically infect at least 200 million
people, and more than 200,000 deaths are reported annually worldwide. Several efforts, including health
education, sanitation, intermediate host control, and chemotherapy treatments, are among the strategies
recommended by the WHO. However, in recent years infection prevalence and, in some regions, infection
intensity have not decreased. The drug of choice to treat schistosomiasis is oral praziquantel, which has been
used for over 40 years. Although praziquantel is an efficacious drug with some limitations, in recent years
problems of resistance have arisen and no alternatives exist so far. To contribute to a solution, the information from the recently sequenced genomes of these parasites was used to identify potential targets
for the development of an alternative drug. Advances in structural and functional genomics, proteomics,
genetics and molecular biology have substantially increased the amount of available data for schistosome
research. Making full use of this information requires computational resources and skills that may not
be promptly available for most researchers. Integration of the large volumes of different data types in a
user-friendly and easily available manner is of major importance to the community and one of the objectives of our group. A database containing genomic, gene annotation and functional data, SchistoDB
(www.schistodb.net), was constructed using the GUS Schema. SchistoDB offers a variety of tools including
BLAST, protein motif searches, keyword searches of pre-computed BLAST results, Gene Ontology assignments, protein family information and microarray probes. We have also produced SchistoCyc, a complete prediction of metabolic pathways generated with the PathwayTools software. SchistoDB includes a list
of drugs predicted to act on orthologues of S. mansoni according to KEGG DRUG, and links exist to TDR
Targets. We focused on two groups of proteins as candidate drug targets: histone modifying enzymes and
protein kinases. Histone modifying enzymes (HMEs) play key roles in the regulation of chromatin modifications. Furthermore, aberrant epigenetic states are often associated with human diseases, leading to
great interest in HMEs as therapeutic targets. We have identified and characterized all enzymes involved
in acetylation and methylation modification, for instance: histone acetyltransferases (HATs), deacetylases
(HDACs), methyltransferases (HMTs) and demethylases (HDMs). We analyzed the predicted proteomes of
the parasites in order to identify and classify the HMEs through computational approaches, mainly using HMM profiles. We were able to identify, on average, 60 HMEs, with some variation among the three
Schistosoma species. Of the identified enzymes, 24 were individually assessed as therapeutic targets
using RNA interference in cultured larval stages (schistosomula) to silence the corresponding genes.
Although gene knockdown of up to 90% could be achieved, no phenotype could be observed after 7 days
of dsRNA exposure. Loss of motility could be observed as a phenotype for two HDMs after 30 days of
dsRNA exposure. In addition, in order to assess the role of genes in the presence of the host environment, under immunological pressure, knockdown parasites for four HMEs (HDAC8, KDM1/KDM2 and
PRMT3) were tested in vivo. A significant reduction of worm burden (50%) could be observed in mice infected with knockdown parasites for HDAC8 when compared to the unspecific control. Finally, egg count was
significantly reduced in mouse livers for all tested HMEs. In conclusion, our work improved the functional
annotation of over 20% of S. mansoni HAT and HDAC proteins. Parasites with reduced levels of HDAC8,
KDM1/KDM2 and PRMT3 seem to have diminished oviposition and (for HDAC8) ability to survive in the host
milieu, indicating that these enzymes could be good target candidates for drug development. Since eukaryotic protein kinases (ePKs) are good chemical and medical targets for drug development and an increasing
number of ePK inhibitors have been approved for the treatment of different human diseases, they have become the focus of this study. The ePKs were identified in S. mansoni, S. japonicum and S. haematobium
by HMM searches and classified by group, family and subfamily by phylogeny. Most selected ePKs were
activators/effectors of MAPK signaling pathway, and key pathway proteins were chosen for experimental
validation: SmRas, SmERK1, SmERK2, SmJNK, and SmCaMK2. RNAi was used to elucidate the functional
role of MAPKs in signaling pathways. Although transcription was reduced, no phenotype was observed
in culture. Therefore, mice were infected with the silenced schistosomula, and it was observed that SmJNK
has an important role in transformation and survival of the parasites, as a low number of adult worms was
recovered and the tegument of surviving worms was damaged. Moreover, SmERK1/SmERK2 expression
was related to egg production, as mice infected with silenced schistosomula displayed significantly lower
egg production and the recovered female worms had underdeveloped ovaries. Furthermore, it was shown
that the c-fos transcription factor was overexpressed in parasites with low expression of SmERK1, SmJNK
and SmCaMK2.
Two Universals in Genomics: Information Content and Species
Abundance and Diversity
Dr. Hernán Dopazo
Lecture abstract. In this talk we analyse two hypotheses: H1, that there is a common combinatorial structure of DNA across the full diversity of life, and H2, that a common rule governing species abundance
and diversity (SAD) exists in genomes.
H1: Our first hypothesis is that there is a random-like structure of
DNA across the full diversity of life. To test it, we define a complexity measure based on a classical method
used in data compression and applicable to arbitrarily large sequences without introducing fragmentation. The
method detects regularities due to repeats of any length, at any distance, and other structural correlations.
As the main result we report that the ratio of genome complexity to size remained almost maximal and
unchanged across six orders of magnitude in genome size, covering all biological diversity. We observe that
complexity increases uniformly with genome size for phages, bacteria, unicellular eukaryotes, fungi, plants,
and animals. Major deviations from maximal genome complexity correspond to polyploid species. We formulate two general hypotheses:
I. almost maximal combinatorial structure of DNA sequence is a common characteristic of genomes throughout biological diversity;
II. increases in the combinatorial complexity of DNA only occur by mechanisms of genome amplification, and subsequent accumulation of DNA sequence mutations, transpositions and/or deletions of genetic material.
Our hypothesis can be falsified if a single recent polyploid genome with a random-like DNA structure is found, or if a non-polyploid genome shows a non-random DNA structure.
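The idea of a compression-based complexity measure can be illustrated with a small sketch. The abstract does not name the compression method, so the example below assumes zlib's DEFLATE as a stand-in and uses the ratio of compressed to raw size as a rough proxy for complexity. The function name `compression_ratio` and the synthetic sequences are hypothetical. DEFLATE only sees repeats within a 32 KB window, unlike the method described in the talk, which is said to detect repeats at any distance.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size (lower means more regular)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Random-like sequence: no repeats beyond those expected by chance.
random_like = "".join(rng.choice("ACGT") for _ in range(20000))
# Toy "recent duplication": a random half followed by an exact copy of itself.
duplicated = random_like[:10000] * 2
# Highly repetitive sequence: one short motif tiled many times.
repetitive = "ACGTTGCA" * 2500

for name, seq in [("random-like", random_like),
                  ("duplicated", duplicated),
                  ("repetitive", repetitive)]:
    print(f"{name:12s} {compression_ratio(seq):.3f}")
```

Under these assumptions the random-like sequence compresses least and the tiled motif most, with the duplicated sequence in between, which is in the spirit of the observation that polyploid genomes deviate from maximal complexity.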
H2: Our second hypothesis is that there is a common rule governing species abundance and diversity
(SAD) in genomics. To what extent does SAD reflect adaptive or stochastic outcomes? Ideal models for genomics would consider the full diversity of elements populating eukaryote genomes. However, such a model
does not exist. In ecology, the unified neutral theory of biodiversity (UNTB) assumes that interactions among
trophically similar species are equivalent on an individual “per capita” basis. UNTB assumes that these individuals, regardless of the species, appear to be controlled by similar birth, death, dispersal, and speciation
rates. Biodiversity composition therefore emerges randomly in the community. Here, taking advantage of
the UNTB and the general framework posed by ecological genomics, we examine the relative SAD of genetic elements of 500 chromosomes in 30 eukaryote genomes. After ML adjustment of UNTB parameters
and hypothesis testing, we found that most chromosomes follow the relative SAD expected under the
UNTB. While ecologists found natural selection an irrelevant component to explain relative SAD in forests,
we found that the same simple neutral model fits the SAD of genetic elements in genomes. We suggest that the
random-like structure and the observed SAD are universals in genomes across the full diversity of life.
Predicting metabolic activity and crowd-sourcing promoter
strength analysis
Dr. Pablo Meyer-Rojas
Lecture abstract. Although much is understood about the enzymatic cascades that underlie
cellular biosynthesis, comparatively little is known about their cellular organization. We here show via
a detailed analysis of the localization of fluorescently tagged enzymes in bacteria that biochemical reactions inside the cytoplasm are organized spatially following a rule where the first or the last enzymes are
localized and this localization is determined by the activity state of the cell’s metabolic network. In the
framework of DREAM6 (Dialogue for Reverse Engineering Assessments and Methods), a community
effort to evaluate the status of the methodology for systems biology modeling, we presented a challenge
where participants had to predict gene promoter expression from an experimentally generated data set
that was previously unknown to them. Twenty-one teams submitted results predicting the expression levels of 53 different
promoters from ribosomal protein genes of the yeast S. cerevisiae. We here present the analysis of participants’
predictions, providing a benchmark for assessment of methods predicting promoter activity.
Data mining in bioinformatics: an integrated approach based
on computational intelligence
Dr. Diego Milone
Lecture abstract. Biology is in the middle of a data explosion. The technical advances achieved
by genomics, metabolomics, transcriptomics and proteomics technologies in recent years have significantly increased the amount of data that biologists can measure and analyze about different aspects of an
organism. Moreover, *omics data sets pose several additional problems: they have inherent biological complexity and may contain significant amounts of noise as well as measurement artifacts. The need to extract
information from such databases is therefore a renewed challenge. This requires novel computational techniques and models to automatically perform data mining tasks such as integration of different data types,
clustering and knowledge discovery, among others. This presentation is about a novel integrated computational intelligence approach for biological data mining that involves neural networks and evolutionary
computation. We propose the use of self-organizing maps for the identification of coordinated pattern variations; a new training algorithm that can include a priori biological information to obtain more biologically
meaningful clusters; a validation measure that can assess the biological significance of the clusters found;
and finally, an evolutionary algorithm for the inference of unknown metabolic pathways involving the selected clusters.
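The first ingredient of this approach, a self-organizing map, can be sketched in its textbook (Kohonen) form. This is not the new training algorithm proposed in the talk and it uses no a priori biological information. The function names, the one-dimensional map layout, the parameter values and the synthetic "rising" and "falling" profiles are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))


def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map with the standard Kohonen update."""
    rng = random.Random(seed)
    # Start each unit at a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.3)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):    # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Units close to the winner on the map are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Synthetic expression-like profiles: 20 rising and 20 falling, with small noise.
rng = random.Random(1)
rising = [[t + rng.gauss(0, 0.1) for t in range(4)] for _ in range(20)]
falling = [[3 - t + rng.gauss(0, 0.1) for t in range(4)] for _ in range(20)]

som = train_som(rising + falling)
print("unit for rising profile: ", best_matching_unit(som, [0, 1, 2, 3]))
print("unit for falling profile:", best_matching_unit(som, [3, 2, 1, 0]))
```

Profiles that vary in a coordinated way end up mapped to the same unit, while opposite patterns are assigned to different units, which is the property the clustering step relies on.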
Program
Time slots: every 30 minutes from 9:00 to 19:30.

Wednesday 26/09/2012
Registration
Opening of the 3CAB2C
Lecture: Dr. Mario Caccamo
Lunch
Oral session 1
Break
Lecture: Dr. Claudio Cavasotto
Poster session

Thursday 27/09/2012
Oral session 2
Break
Lecture: Dr. Hernán Dopazo
Lunch
Oral session 3
Break
Lecture: Dr. Pablo Meyer-Rojas
Poster session

Friday 28/09/2012
Oral session 4
Break
Lecture: Dr. Guilherme Correa-Oliveira
Lunch
Lecture: Dr. Diego Milone
Closing of the 3CAB2C
A2B2C assembly

Round table: Education in Bioinformatics

The program may change in the coming days. Any changes will be announced as soon as possible.
Papers by session
Wednesday 26
Oral Session 1
Wednesday 26, 14:30 to 16:30
Room 4
Chair: Ariel Chernomoretz
1) Design and virtual screening of new anti-HIV integrase inhibitors
M. A. Quevedo, M. C. Briñón
Departamento de Farmacia - Facultad de Ciencias Químicas - Universidad Nacional de Córdoba
2) Identification of binding motifs in large-scale peptide data sets using a Gibbs sampling approach
M. Nielsen, O. Lund, M. Andreatta
Technical University of Denmark
3) Comparing the Bonferroni and the Benjamini-Hochberg procedures
D. M. Kelmansky, S. Ferro
Instituto de Cálculo FCEN-UBA, Instituto de Cálculo FCEN-UBA
4) Metabolic pathfinding based on genetic algorithms
M. Gerard, G. Stegmayer, D. Milone
CONICET
Thursday 27
Oral Session 2
Thursday 27, 10:00 to 12:00
Room 4
Chair: Elmer Fernandez
1) Assessing protein-disease association significance from candidate ranking lists
A. J. Berenstein, I. Ibañez, A. Chernomoretz
Universidad de Buenos Aires - Instituto Leloir, Universidad de Buenos Aires - Instituto Leloir
2) Reverse engineering HD-Zip transcriptional regulatory networks (Ft. Information Theory)
A. L. Arce, M. Capella, D. Ré, R. L. Chan, A. Chernomoretz
Instituto de Agrobiotecnología del Litoral, Fundación Instituto Leloir
3) Advantages of balanced classifier design on microarray data classification
M. Brun, I. Pagnuco, V. Ballarin
Facultad de Ingeniería UNMdP, Facultad de Ingeniería UNMdP-CONICET
4) Conformational diversity and evolutionary rates in proteins
D. Zea, M. S. Fornasari, C. Marino Buslje, G. Parisi
Fundación Instituto Leloir, Universidad Nacional de Quilmes, SBG Universidad Nacional de Quilmes
Oral Session 3
Thursday 27, 14:30 to 16:30
Room 4
Chair: Ignacio Sanchez
1) Glycobioinformatics: Using solvent structure to predict and characterize protein carbohydrate complexes
M. Marti
University of Buenos Aires
2) Eukaryotic secretory pathway proteins avoid occluded N-glycosylation sequons
M. López Medus, G. E. Gómez, P. M. Couto, L. Landolfo, J. J. Caramelo
Fundación Instituto Leloir-Conicet-IIBBA, Fundación Instituto Leloir-Conicet-IIBBA, Departamento de Química Biológica-FCEN-UBA, Fundación Instituto Leloir
3) Dissecting relationships between sequence, structure and functions in the Ankyrin Repeat Protein Family
R. Gonzalo Parra, R. Espada, D. U. Ferreiro
Protein Physiology Lab, Dpto de Química Biológica, FCEyN-UBA and CONICET
4) Structure-Function Prediction of Highly Variable Sub-sequences of Protein Subfamilies
M. V. Revuelta, A. ten Have
IIB-CONICET-UNMdP
Friday 28
Oral Session 4
Friday 28, 10:00 to 12:00
Room 4
Chair: Fernán Agüero
1) The Comparisons of Sequences with the Nucleotide Database (NCBI) and the BLAST tool. What information can we obtain?
V. E. Firmenich, M. E. Fernández Feijóo, M. B. Espinosa
CONICET, ALS Group, UBA
2) Alternative models about the origin of Ribosome Inactivating Proteins genes
W. Lapadula, M. Juri Ayub, M. V. Sanchez-Puerta
Instituto de Biología Agrícola de Mendoza (IBAM), Lab. Biol. Mol. UNSL. IMIBIO-SL (CONICET)
3) Phylogenetic relationships of Rhinella arenarum beta-catenin. A developmental biology useful model
M. A. Hasenahuer, C. D. Galetto, V. H. Casco, M. F. Izaguirre
Facultad de Ingeniería, UNER
4) HMMerCTTer: Tailor-made Decision Making for the Semi-automatic Clustering of large Protein Superfamilies
H. G. Bondino, I. A. Pagnuco, M. V. Revuelta, M. Brun, A. ten Have
IIB-CONICET-UNMdP, Laboratorio de Procesamiento Digital de Imágenes, FI-UNMdP, Advanta Semillas SAIC Centro de Investigación en Biotecnología
Wednesday 26
Poster Session 1
Wednesday 26
Room 3
1) High throughput pyrosequencing and bioinformatics of a multi-extreme environment
N. Rascován, S. Revale, E. Mancini, M. Vázquez, M. E. Farías
INDEAR, PROIMI
2) Non extensive statistics generalization of Jensen Shannon divergence for DNA sequence analysis
M. Ré, P. Lamberti
FRC - UTN; FaMAF - UNC, FaMAF - UNC
3) Distribution of bioactive peptides in NR
A. E. Nardo, M. C. Añón, G. Parisi
Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 182, Bernal B1876BXD, Argentina, Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA, Universidad Nacional de La Plata, CONICET), La Plata 47 y 116 (1900), Argentina.
4) The relation between the divergence of sequence and structure in intrinsically disordered proteins
N. Palopoli, J. Glavina, I. E. Sánchez
Universidad de Buenos Aires
5) Design of a pipeline for de novo identification of cis-regulatory elements involved in transcriptional re-programming during tomato fruit development and ripening
T. Duffy, F. Carrari
Instituto Nacional de Tecnología Agropecuaria
6) Identification of putative subtelomeric regions in the genome of Toxoplasma gondii
S. Carmona, M. C. Dalmasso, S. Angel, F. Agüero
IIB-INTECH-UNSAM-CONICET
7) Phylogeny of fungal species of genus Aspergillus using ITS sequences
M. Cossio, G. Sioli, G. Perona
INBIOMIS
8) On line comparison of sequences alignment and phylogenetic analysis of native Trichoderma sp from Misiones province
G. Sioli, L. Castrillo, M. Cossio, N. Amerio, M. I. Fonseca, L. Villalba, P. Zapata
INBIOMIS
9) Prediction of blood to liver coefficients for volatile organic compounds: a cheminformatics approach
D. Palomba, M. J. Martinez, I. Ponzoni, M. Díaz, G. E. Vazquez, A. Soto
Laboratory for Research and Development in Scientific Computing (LIDeCC), DCIC, UNS, Faculty of Computer Science, Dalhousie University, Halifax, Canada, Planta Piloto de Ingeniería Química (PLAPIQUI) CONICET-UNS
10) Predicting Protein Function from Sequence and Structural Data: a Globin’s Family Case
J. P. Bustamante, M. Marti, D. Estrin
INQUIMAE
11) Variations of ligand binding affinity upon protein conformational diversity
E. I. Juritz, A. Monzón
Quilmes National University
12) Effect of the O-glycosylation in the binding of Extensins to Peroxidases.
A. Aptekmann, J. Estevez, A. Nadra
UBA-QB, UBA-FBMC
13) Using Computer Simulations to Understand Enzyme Mechanisms: Application to Mycobacterium tuberculosis CYP121 Unusual Reaction
V. G. Dumas, L. Defelipe, A. Petruk, A. Turjanski, M. Martí
Departamento de Química Biológica e Inquimae-Conicet, Facultad de Ciencias Exactas y Naturales, UBA, Departamento de Química Inorgánica, Analítica y Química Física e Inquimae-Conicet, Facultad de Ciencias Exactas y Naturales, UBA
14) Theoretical studies of membranes at different thermotropic phases in salts solutions by molecular dynamics.
F. E. Herrera, M. D. L. M. Sales, D. E. Rodrigues
FBCB
15) Online modeling of Endoglucanases from Aspergillus genus using PHYRE2
M. Cossio, G. Sioli, G. Perona
INBIOMIS
16) Comparison of two homology based protein structure online software
M. Perona, M. Molina, M. Cossio
INBIOMIS
17) Relative mobility of epitopes residues in immunogenic proteins
M. Astorga, S. Fernández Alberti, G. Parisi
Universidad Nacional de Quilmes, Universidad Nacional de La Plata, Universidad Nacional de La Plata
18) Identification of putative LxCxE motifs targeting the retinoblastoma protein in human viruses by structure- and sequence-based calculations
J. Glavina, L. B. Chemes, G. de Prat-Gay, I. E. Sánchez
Universidad de Buenos Aires, Fundación Instituto Leloir
19) Design of novel DNA-binding specificity in proteins from the “zinc finger” family
B. Basanta, A. Alibes, L. Serrano, A. Nadra
Centre for Genomic Regulation, Universidad de Buenos Aires
20) Diversity and evolution of retinoblastoma protein-binding LxCxE motifs in human proteins
L. B. Chemes, J. Glavina, I. Sanchez, G. de Prat-Gay
Fundación Instituto Leloir, Protein Physiology Lab, Universidad de Buenos Aires, Fundación Instituto Leloir
21) Molecular Dynamics and Circular Dichroism Study of VBT:VBA Polymers (1:1 and 1:4). Structure and Dynamics comparison.
A. Fuselli, S. Garay, D. Martino, D. Rodrigues
Facultad de Bioquímica y Cs. Biológicas - UNL - INTEC (UNL-CONICET), Facultad de Bioquímica y Cs. Biológicas - UNL
22) Lateral Pressure Effects on Structural Properties of DPPC Lipid Bilayers in Gel and LC Phases: A Molecular Dynamics Study
S. A. Garay, J. F. Quaranta, D. E. Rodrigues
Facultad de Bioquímica y Cs. Biológicas - UNL - INTEC (UNL-CONICET), Facultad de Bioquímica y Cs. Biológicas - UNL
23) Comparison of Classifier Design Algorithms on a Small Sample Microarray Data
I. Pagnuco, M. Brun, V. Ballarin
Facultad de Ingeniería UNMdP, Facultad de Ingeniería UNMdP-CONICET
24) Metagenomics and metatranscriptomics of soil microbial communities developing in bulk and rhizospheric soils of Argentinean Pampa region.
N. Rascovan, B. Carbonetto, E. Mancini, M. Reinert, S. Revale, M. Vazquez
Plataforma de Genómica y Bioinformática, INDEAR, Rosario, Santa Fe, Argentina.
25) Following the tracks of the Trypanosoma cruzi prenylome
E. Porta, G. Labadie
IQUIR-CONICET
26) Web-based gene-expression analysis using the plant biology analysis tools: GENEVESTIGATOR
M. G. Acosta, M. A. Ahumada, S. L. Lassaga, V. H. Casco
LAMAE - FI-UNER y Cátedra de Biología, FCA-UNER, LAMAE - FI-UNER, Cátedra de Biología, FCA-UNER, Cátedra de Genética y Mejoramiento Vegetal, FCA-UNER
27) DIGESuite: a Cytoscape plug-in for 2D-DIGE analysis
S. Taleisnik, J. Mishima, C. Fresno, M. Semrik, G. Ribero, G. Merino, L. Prato, A. Llera, E. Fernandez
BioScience Data Mining Group - Fac. Ing. - UCC, CONICET, Universidad Nacional de Villa María, Fundación Instituto Leloir, CONICET, Facultad de Ingeniería - UNER, Universidad Católica de Córdoba, CONICET, Universidad Católica de Córdoba
28) Strategies for gap-closure of Thermus sp. 2.9 genome
L. Navas, A. Amadío, R. Zandomeni
Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de Tecnología Agropecuaria (INTA), Las Cabañas y de Los Reseros, Buenos Aires, Argentina, CONICET - EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA)
29) Analysis of variability of Mal de Río Cuarto virus (MRCV) through haplotype networks
M. A. García, M. D. L. P. Giménez Pecci, J. B. Cabral, I. G. Laguna, F. Maurino, C. H. Vera
INTA IPAVE - CIAP, CONICET, INTA IPAVE - CIAP, UTN FRC
30) One vs One Artificial Neural Network strategy for gene expression multiclass classification
L. Remon, L. Juárez, D. Arab Cohen, C. Fresno, L. Prato, L. Villoria, E. Fernandez
Universidad Nacional de Villa María, Universidad Católica de Córdoba, CONICET, Universidad Católica de Córdoba
31) SVM Tree with Optimal Multiclass Partition applied to Gene expression signature classification
M. Pallarol, D. Arab Cohen, C. Fresno, L. Prato, E. Fernandez
Universidad Nacional de Villa María, Universidad Católica de Córdoba, CONICET, Universidad Católica de Córdoba, Biomedical Data Mining Group UCC
Thursday 27
Poster Session 2
Thursday 27
Room 3
1) 25S-18S ribosomal nature of the not NOR-associated highly GC-rich heterochromatin of chili peppers (Capsicum-Solanaceae)
M. Grabiele, H. Debat, M. Scaldaferro, G. Seijo, D. Ducasse, E. Moscone, D. Martí
IBONE-UNNE-CONICET, IBS-UNaM-CONICET, IMBIV-UNC-CONICET, IFFIVE-INTA
2) FuL: A Logic processor to aid design and validate virological experiments
A. Kondrasky, D. Gutson, C. Areces
FuDePAN
3) 14-3-3 isoforms subfunctionalization revealed by systems biology analysis of cross-talk between phosphorylation and lysine acetylation
M. Uhart, D. Bustos
INTECH
4) Simulation of pesticide effect on thermo-dependent arthropod populations: fixed point iteration method
C. Bartó, J. Edelstein, E. Trumper
INTA, UN Córdoba
5) Honeybees colony virtual simulation, step 2
M. Migueles, L. Gende, L. Defeudis, P. Macri, M. Churio, M. Eguaras, L. Braunstein
Universidad Nacional de Mar del Plata, Universidad Nacional de Mar del Plata-CONICET
6) Unraveling the molecular basis of mammalian inner ear evolution: analysis of the outer hair cell cytoskeleton protein spectrin
F. Pisciottano, B. Elgoyhen, L. Franchini
INGEBI - CONICET
7) Characterization of long interspersed non-LTR elements in section Arachis
S. Samoluk, D. Carisimo, G. Robledo, G. Seijo
Instituto de Botánica del Nordeste, Instituto de Botánica del Nordeste, Facultad de Ciencias Exactas y Naturales y Agrimensura (Universidad Nacional del Nordeste)
8) Estimation of Species Richness in Microbial Communities
C. Santa Maria, M. Soria
UNLAM, Universidad de Buenos Aires. Facultad de Agronomía
9) Conformational diversity and evolutionary rates in proteins
D. Zea, M. S. Fornasari, C. Marino Buslje, G. Parisi
Fundación Instituto Leloir, Universidad Nacional de Quilmes, SBG Universidad Nacional de Quilmes
10) CoDNaS database: The conformational diversity of proteins and its relationship with biological properties
A. Monzon, G. Parisi, E. Juritz
UNER-UNQ, CEI-UNQ
11) Attacking Mycobacterium tuberculosis in the dormant phase: A Combination of expression data with structural druggability and nitrosative stress sensitivity
L. G. Radusky, L. A. Defelipe, A. G. Turjanski, M. Martí
Departamento de Química Biológica - Universidad de Buenos Aires
12) INTA bioinformatic platform: An approach using ontology driven database and web interface to integrate and explore genomic data
S. Gonzalez, B. Clavijo, M. Rivarola, P. Fernandez, M. Farber, N. Paniego
INTA-CONICET, INTA-FIUBA, INTA
13) Computational Simulation of inclusion ways of Sulfamethoxazole and Sulfadiazine in Cyclodextrins
L. Erbes
UNER
14) How much information keeps the solvation structure of a Crystal Protein?
C. Modenutti, D. Gauto, L. Radusky, S. Hajos, M. Marti
University of Buenos Aires
15) Digitization Project in MACN: the importance of standard protocols to obtain high quality taxonomic information
P. Cossi, C. Zimicz, M. C. Luna, N. Andón, N. Cuadra, M. B. Bukowski Loináz, M. J. Ramírez
Museo Argentino de Ciencias Naturales, “Bernardino Rivadavia”; Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Museo Argentino de Ciencias Naturales, “Bernardino Rivadavia”
16) Computational, biochemical, and spectroscopic studies of the copper-containing nitrite reductase from the denitrifier Sinorhizobium meliloti 2011
M. C. Gómez, F. M. Ferroni, A. C. Rizzi, S. D. Dalosto, C. D. Brondino
INTEC, UNL
17) Software integration to bioimage management, processing and analysis
J. E. Diaz-Zamboni, L. Bugnon, E. V. Paravani, C. D. Galetto, J. F. Adur, V. Bessone, M. Bianchi, M. G. Acosta, S. J. Laugero, V. H. Casco, M. F. Izaguirre
Laboratorio de Microscopia Aplicada a Estudios Moleculares y Celulares - Facultad de Ingeniería - Universidad Nacional de Entre Ríos
18) Image analysis to control the roast level of the peanut
I. Arévalo, S. Ojeda
FAMAF
19) Comparison of the ability to predict true linear B-cell epitopes by on-line available prediction programs
J. G. Costa, P. L. Faccendini, S. S. Sferco, C. M. Lagier, I. S. Marcipar
IQUIR, Depto. de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario. Suipacha 531. Rosario, Laboratorio de Tecnología Inmunológica, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral. Paraje El Pozo. Santa Fe., Laboratorio de Tecnología Inmunológica, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral. Paraje El Pozo. Santa Fe, Departamento de Física, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Paraje El Pozo. Santa Fe; and INTEC (CONICET-UNL), Güemes 3450, Santa Fe
20) Relationship between divergence of using synonymous codons in host/virus and the presence of microRNA
F. Riberi, L. Tardivo, L. Fazzi, G. Biset, D. Gutson, D. Rabinovich
Instituto Biomédico en Retrovirus y SIDA-INBIRS. Fundación para el Desarrollo de la Programación en Ácidos Nucleicos-FuDePAN, Universidad Nacional de Río Cuarto, Instituto Biomédico en Retrovirus y SIDA-INBIRS, Fundación para el Desarrollo de la Programación en Ácidos Nucleicos-FuDePAN
21) A pipeline for structural annotations in bacterial genomes
E. Lanzarotti, L. Defelipe, L. Radusky, M. Marti, A. Turjanski
Departamento de Química Biológica, FCEN - UBA, Departamento de Química Biológica e INQUIMAE, FCEN - UBA
22) VISI: a computational program for antiviral strategies comparison
D. Gutson, P. Oliva, L. Ramos, P. Pury, F. Herrero, D. Rabinovich
FuDePAN, FaMAF, Universidad Nacional de Córdoba
23) 1D model of the pulse wave along the systemic arteries
C. E. Saavedra Fresia, F. E. Menzaque
National University of Tucuman, National University of Cordoba
24) Agi4x44.2c: a two-colour Agilent 4x44 Quality Control R library for large microarray projects
G. Gonzalez, C. Fresno, G. Merino, A. Llera, O. Podhajcer, E. Fernandez
Laboratorio de Terapia Celular y Molecular, Instituto Leloir, Grupo de Minería de Datos en Biociencias, Facultad de Ingeniería, UNER
25) MSA2MI: A server to calculate and visualize mutual information in multiple sequence alignments
F. L. Simonetti, M. Nielsen, C. Marino Buslje
Center for Biological Sequence Analysis, Fundación Instituto Leloir
26) GOboot: towards a robust SEA analysis
C. Fresno, A. Llera, M. R. Girotti, M. P. Valacco, J. A. López, L. Zingaretti, L. Prato, O. L. Podhajcer, M. G. Balzarini, F. Prada, E. Fernandez
BioScience Data Mining Group - Fac. Ing. - UCC, CONICET, National Center for Cardiovascular Research, Madrid, Spain, Biometry Laboratory, National University of Córdoba, The Institute of Cancer Research, London, UK, Fundación Instituto Leloir, CONICET, of Technology, School of Engineering and Sciences, UADE, Instituto A.P. de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa Maria, Universidad Catolica de Cordoba
27) Development of an algorithm to detect distant orthologous genes in baculoviridae family.
J. Iserte, M. Garavaglia, S. Miele, M. Belaich, D. Ghiringhelli
Universidad Nacional de Quilmes
28) COMPUTATIONAL PREDICTION OF THE BIOLOGICAL EFFECTS OF MUTATIONS IN OTC GENE IN ARGENTINIAN PATIENTS
S. M. Silvera Ruiz, J. A. Arranz Amo, L. E. Laróvere, R. Dodelson de Kremer
Unitat Metabolopaties, Hospital Universitari Materno-Infantil Vall d’Hebron, CEMECO, Hospital de Niños de Córdoba
29) In silico prediction of cross-reactive epitopes of the major soybean allergen Gly m Bd 30K (P34) with bovine caseins and their analysis by immunochemical methods.
A. Candreva, G. Parisi, G. Docena, S. Petruccelli
CIDCA UNLP, La Plata 47 y 116., Departamento de Ciencia y Tecnología, UNQ, Roque Sáenz Peña 182, Bernal., LISIN, FCE, UNLP, La Plata, 47 y 115.
30) Evolutionary and structural analysis of procirsin, a typical plant aspartic proteinase zymogen
D. Lufrano, S. Vairo Cavalli, G. Parisi
LiProVe - Facultad de Ciencias Exactas, UNLP, Departamento de Ciencia y Tecnología, UNQ
31) BiFe: a national EMBNet node hosting Argentine bioinformatics applications
Embnet Node Argentina
Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires
32) Construction of phylogenetic trees from Trichoderma sp using the program MEGA 5.10
G. Sioli, L. Castrillo, M. Cossio, N. Amerio, L. Villalba, P. Zapata
INBIOMIS
Abstracts
Comparing the Bonferroni and the Benjamini-Hochberg procedures
Sebastián Ferro, Diana Kelmansky1
1 Instituto de Cálculo, FCEN-UBA
The Bonferroni procedure for correcting p-values in multiple testing, which controls the probability of at least one false positive, i.e. the Family Wise Error Rate (FWER), has been criticized as too restrictive: it yields low power in large-scale multiple testing such as that arising in genomic experiments, e.g. microarrays.
Bonferroni, however, also controls the Per Family Error Rate (PFER), i.e. the expected number of false positives. The low power is therefore not intrinsic to the procedure but a consequence of applying it under the very restrictive FWER criterion, and it can be overcome.
This work, following the proposal of Gordon, Glazko, Qiu and Yakovlev (2007), shows that it is possible to equalize the errors, i.e. to select a PFER for the Bonferroni approach that results in a given false discovery rate (FDR). Under error equalization, both procedures attain similar power. However, the Bonferroni procedure is more stable than the Benjamini-Hochberg procedure with respect to the variance of the total number of discoveries and of the number of true discoveries. In practice, this means that estimates of the FDR being controlled are less reliable than those of the expected number of false positives.
In addition to verifying the results of Gordon, Glazko, Qiu and Yakovlev (2007), we extend them to realistic degrees of correlation among gene expression levels. This extension was made possible by modifying the originally proposed algorithm, which reduced processing time and improved the precision of the error equalization (Ferro 2011).
Key words: multiple testing, Bonferroni, FDR
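The two procedures compared above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of Ferro (2011): the function names and the example p-values are assumptions made for the illustration. Bonferroni rejects every hypothesis with p <= level/m, where a level below 1 bounds the FWER and, more generally, the level bounds the PFER; Benjamini-Hochberg rejects the k smallest p-values, where k is the largest rank with p(k) <= k*q/m.

```python
def bonferroni_reject(pvalues, level):
    """Reject H_i when p_i <= level / m.

    The expected number of false positives (PFER) is bounded by `level`;
    when level < 1 the same bound also holds for the FWER.
    """
    m = len(pvalues)
    return [p <= level / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q):
    """Step-up BH: reject the k smallest p-values, with k the largest
    rank such that p_(k) <= k * q / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, used only to illustrate the three thresholds.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_fwer = sum(bonferroni_reject(pvals, 0.05))         # FWER-style use
n_pfer = sum(bonferroni_reject(pvals, 1.0))          # PFER of one false positive
n_bh = sum(benjamini_hochberg_reject(pvals, 0.05))   # FDR level 0.05
print(n_fwer, n_pfer, n_bh)
```

Raising the Bonferroni level from an FWER-style 0.05 to a PFER of 1 relaxes the per-test threshold from 0.005 to 0.1 in this toy example, which is the kind of adjustment that error equalization tunes against a target FDR.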
References
A. Gordon, G. Glazko, X. Qiu, and A.Y. Yakovlev. Control of the mean number of
false discoveries, Bonferroni, and stability of multiple testing. Annals of Applied
Statistics, 1:179-190, 2007.
S. Ferro. "A more efficient and precise comparison of Bonferroni and Benjamini-Hochberg procedures". Tesis de Licenciatura en Cs. de la Computación. FCEN-UBA. 2011.
Glycobioinformatics: Using solvent structure to predict and characterize protein
carbohydrate complexes
Marcelo A. Martí
Departamento de Química Biologica e Inquimae‐Conicet, Facultad de Ciencias Exactas y Naturales,
Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2, Buenos Aires, C1428EHA, Argentina
[email protected]
Formation of protein-ligand complexes is a fundamental process in biochemistry. In-silico methods that predict the structure of complexes, known as docking methods, are widely used and are an essential part of many rational drug development programs. The potential and reliability of any docking method lie in its ability to correctly predict the complex structure, taking the separate structures of the protein and the ligand as a starting point. Nevertheless, given the approximations involved in the underlying theory, the predictions are not always successful.
Carbohydrate binding proteins are a large and diverse group of biomolecules displaying a wide variety of biological activities, including cell recognition, communication and cell growth. In this context, understanding protein-carbohydrate interactions at the molecular level, with atomic resolution, is of fundamental importance for basic and applied glycobiology. A common but usually overlooked feature of carbohydrates is that their polar OH groups quite frequently bind to hydrophilic patches of the protein surface, resulting in significant solvent displacement and reorganization. Water molecules and carbohydrate OH groups can participate in similar hydrogen bonding networks when establishing contacts with protein surfaces. With this in mind, we sought to use this information to predict protein-carbohydrate complexes in silico with higher accuracy than conventional docking methods.
Analyzing the solvent structure at the protein surface is not an easy task. One of the most powerful methods for studying solvent structure is based on the inhomogeneous fluid solvation theory (IFST), which allows several properties of the water molecules to be determined from a plain Molecular Dynamics (MD) simulation. Using this methodology, we recently showed that solvent structure and dynamics at protein surfaces involved in carbohydrate binding are very different from those of the bulk solvent, allowing the identification of the so-called water sites (WS) or hydration sites. The WS correspond to defined regions adjacent to the protein surface where the probability of finding a water molecule is significantly higher than in the bulk solvent, and they can be further characterized thermodynamically using the IFST.
In the present work, we used the characterization of the WS in the carbohydrate binding site (CBS) of several carbohydrate binding proteins to modify the scoring function of the docking program Autodock, in order to determine the corresponding protein-ligand complexes in silico. Our results clearly show that the modified function significantly improves the quality and accuracy of the results, both in terms of how closely the predicted complex structure resembles the real one (i.e. the one obtained by crystallography) and in the differentiation of true from false positives and negatives. The resulting solvent-structure-biased docking protocol is thus a powerful tool for the design and optimization of glycomimetic drugs and for the basic understanding of protein-carbohydrate interactions.
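The idea of biasing a docking score towards water sites can be illustrated with a minimal Python sketch. The abstract does not give the functional form of the modification, so everything here is an assumption made for illustration: the Gaussian reward, the `weight` and `sigma` parameters, and the function names are hypothetical and are not part of Autodock. The sketch simply adds a favourable (negative) term whenever a ligand hydroxyl oxygen sits close to a water site.

```python
import math


def water_site_bias(ligand_oxygens, water_sites, weight=1.0, sigma=0.5):
    """Hypothetical bias term: each ligand OH oxygen is rewarded for
    overlapping a water site, with a Gaussian fall-off in distance.

    Coordinates are (x, y, z) tuples in angstroms; the result is <= 0,
    so adding it to a score makes well-overlapping poses more favourable.
    """
    bias = 0.0
    for oxygen in ligand_oxygens:
        for site in water_sites:
            dist_sq = sum((a - b) ** 2 for a, b in zip(oxygen, site))
            bias -= weight * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return bias


def biased_score(base_score, ligand_oxygens, water_sites, **params):
    """Base docking score (lower is better, assumed) plus the bias term."""
    return base_score + water_site_bias(ligand_oxygens, water_sites, **params)


# Two poses with the same base score: one places an OH oxygen on a water
# site, the other places it far away from any water site.
sites = [(0.0, 0.0, 0.0)]
on_site = biased_score(-5.0, [(0.0, 0.0, 0.0)], sites)
off_site = biased_score(-5.0, [(10.0, 0.0, 0.0)], sites)
print(on_site, off_site)
```

With this kind of term, poses that replace crystallographic or MD-derived waters with carbohydrate OH groups are ranked ahead of otherwise equivalent poses.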
1. Gauto DF, Di Lella S, Guardia CMA, Estrin DA, Martí MA. Carbohydrate-binding proteins: Dissecting ligand structures through solvent environment occupancy. J Phys Chem B. 2009, 113(25):8717-8724.
2. Gauto DF, Di Lella S, Estrin DA, Monaco HL, Martí MA. Structural basis for ligand recognition in a mushroom lectin: solvent structure as specificity predictor. Carbohydr Res. 2011, 346(7):939-948.
3. Di Lella S, Martí MA, Alvarez RM, Estrin DA, Ricci JC. Characterization of the Galectin-1 Carbohydrate Recognition Domain in Terms of Solvent Occupancy. J Phys Chem B. 2007, 111(25):7360-7366.
Using Computer Simulations to Understand Enzyme Mechanisms:
Application to Mycobacterium tuberculosis CYP121 Unusual Reaction
Victoria G Dumas, Lucas A Defelipe, Ariel A Petruk, Adrian G Turjanski and Marcelo A Martí
Departamento de Química Biologica e Inquimae‐Conicet, Facultad de Ciencias Exactas y Naturales,
Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2, Buenos Aires, C1428EHA, Argentina
[email protected]
Understanding enzyme mechanisms at the molecular level provides invaluable information for protein inhibitor design, de novo protein engineering and general biochemistry. Computer simulation methods, mainly those based on Quantum Mechanics (QM) and classical Molecular Dynamics (MD), provide an extraordinary tool to study enzyme reaction mechanisms, since they make it possible to simulate the reaction and watch it happen in the computer [1,2].
Cytochromes P450 (CYPs) are a large and ubiquitous family of heme proteins which usually catalyze the oxidation (in most cases hydroxylation) of organic compounds. In mammals they are responsible for the metabolism of the majority of pharmacological compounds, and they are also studied as potential antimicrobial targets and as potential enzymes for biotechnological purposes. Among the 20 CYPs encoded by the Mycobacterium tuberculosis (Mt) genome, CYP121 was found to be essential for the viability of the bacilli, making it a potential target for antitubercular drug design. Interestingly, the mechanism by which CYP121 carries out its activity remains unknown. Evidence suggests that this protein catalyzes the formation of a C-C bond between the two aromatic rings of the cyclodipeptide cyclo(L-Tyr-L-Tyr) (cYY), resulting in a new chemical entity [3,4], a reaction which is quite unusual for CYPs.
In this work, we used a combination of classical MD and hybrid quantum-classical (QM/MM) methodologies to elucidate the reaction mechanism of this interesting and important protein. Classical simulations revealed how the protein restrains the movement of the ligand, so that the two carbon atoms are activated and positioned close enough to allow covalent linkage. We used hybrid QM/MM methods to calculate the free energy profile of the reaction, showing that C-C bond formation involves a change in spin state along the reaction, resulting in a moderate barrier due to spin crossing. Taken together, our results allow a better understanding of this interesting enzyme and of the general reaction mechanism of CYP proteins.
References
1. Capece L, Lewis-Ballester A, Yeh S-R , Estrin DA & Marti, MA (2012). Complete reaction
mechanism of indoleamine 2,3-dioxygenase as revealed by QM/MM simulations. The
journal of physical chemistry. B, 116(4), 1401-1413.
2. Lewis-Ballester A, Batabyal D, Egawa T, Lu C, Lin Y, Marti MA, Capece L, et al. (2009).
Evidence for a ferryl intermediate in a heme-based dioxygenase. Proceedings of the
National Academy of Sciences of the United States of America, 106(41), 17371-6.
3. Belin P, Le Du MH, Fielding A, Lequin O, Jacquet M, Charbonnier J-B, Lecoq A, et al. (2009).
Identification and structural basis of the reaction catalyzed by CYP121, an essential
cytochrome P450 in Mycobacterium tuberculosis. Proceedings of the National Academy of
Sciences of the United States of America, 106(18), 7426-7431.
4. McLean KJ, Carroll P, Lewis DG, Dunford AJ, Seward HE, Neeli R, Cheesman MR, et al.
(2008). Characterization of active site structure in CYP121. A cytochrome P450 essential
for viability of Mycobacterium tuberculosis H37Rv. The Journal of biological chemistry,
283(48), 33406-16.
Design and virtual screening of new anti-HIV integrase inhibitors.
Mario Alfredo Quevedo, Margarita Cristina Briñón
Dpto. de Farmacia, Fac. de Ciencias Químicas, Universidad Nacional de Córdoba (UNC),
Argentina. ([email protected])
Background information
The integrase (IN) of the human immunodeficiency virus (HIV) is a key enzyme catalyzing viral/host DNA integration. Its inhibition interrupts the viral life cycle, and it is therefore actively studied for the design of potent and selective anti-HIV drugs. Unfortunately, obtaining IN inhibitors with the help of computational methods has been hindered by the lack of a complete crystal structure of the functional IN intasome. Considering that the crystal structure of the closely related prototype foamy virus (PFV) intasome was obtained recently [1], this work deals with the design and screening of new IN inhibitors based on scaffold A (Fig. 1), a versatile lead for high throughput chemical synthesis.
[Fig. 1: chemical structure of scaffold A, with variable substituent R]
Material and methods
In a first stage, the screening methodology was validated using a set of 16 compounds structurally related to scaffold A, whose anti-HIV activities have been reported. The crystal structure of the PFV intasome (PDB: 3OYA) was used for the molecular docking procedures. Ionization state and tautomer analyses were performed, after which an exhaustive rigid docking approach was applied based on pre-generated conformer libraries. Ligand analyses, rigid docking and post-processing were performed using software packages developed by OpenEye Inc.
In a second stage, a set of 1000 compounds (massive library) was created and subjected to molecular docking screening.
Results
Assay validation: a very good correlation between docking rank and antiviral potency was found, with compounds in the low, mid and high nanomolar IC50 ranges ranking first, second and third, respectively. Only one outlier (a false negative) was found, which was attributed to a chemical substitution pattern different from that of the rest of the compounds.
Screening assays: of the 1000 molecules screened in the massive library, 63 ranked higher in docking than the most potent compound in the training set. The synthetic feasibility of these 63 compounds was assessed, and 12 were selected for synthesis and anti-HIV evaluation.
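The selection step described above (keeping library compounds that dock better than the most potent reference compound) can be sketched as follows. This is a minimal sketch under stated assumptions: the compound identifiers, scores and IC50 values are invented for illustration, the function name is hypothetical, and it assumes the common convention that a lower docking score means a better-ranked pose. It does not use or reproduce the OpenEye tools mentioned in the abstract.

```python
def select_hits(reference, library):
    """Return library compounds that outscore the most potent reference.

    reference: {compound_id: (docking_score, ic50_nM)}
    library:   {compound_id: docking_score}
    Lower docking score is assumed to be better; lower IC50 is more potent.
    """
    most_potent = min(reference, key=lambda cid: reference[cid][1])
    threshold = reference[most_potent][0]
    hits = [cid for cid, score in library.items() if score < threshold]
    return sorted(hits, key=lambda cid: library[cid])


# Hypothetical validation set and screening library.
reference = {
    "ref_a": (-9.1, 12.0),    # low nanomolar
    "ref_b": (-7.4, 150.0),   # mid nanomolar
    "ref_c": (-6.0, 900.0),   # high nanomolar
}
library = {"lib_1": -9.8, "lib_2": -8.0, "lib_3": -10.5, "lib_4": -5.2}

hits = select_hits(reference, library)
print(hits)
```

The resulting short list would then go through a separate synthetic-feasibility assessment, which is a manual or expert-driven step not modelled here.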
Conclusions
The crystal structure of the PFV intasome seems adequate for the design and virtual screening of HIV integrase inhibitors, at least for chemical series exploring substitutions at R (Fig. 1). In addition, the high speed of the search method is compatible with the screening of large numbers of compounds.
References
1. Hare S, Gupta SS, Valkov E, Engelman A, Cherepanov P: Retroviral intasome assembly
and inhibition of DNA strand transfer. Nature 2010, 464:232-236.
Alternative models about the origin of Ribosome Inactivating Proteins genes
Lapadula Walter Jesús, Sanchez-Puerta M. Virginia and Juri Ayub Maximiliano
Lab. Biol. Mol. UNSL. IMIBIO-SL (CONICET)
E-mail: [email protected]
Background
Ribosome inactivating proteins (RIPs) are N-glycosidases that depurinate a specific adenine residue in the conserved sarcin/ricin loop of 28S rRNA. The most widely studied examples of RIPs are ricin, a potent toxin of Ricinus communis, and the Shiga toxins from enteric bacteria that cause hemolytic uremic syndrome. RIP genes have been reported in many plants and a few bacteria. In addition, there is biochemical evidence of RIP activity in a few fungal species. The analysis of RIP phylogeny is problematic because the sequences are highly divergent and their distribution across species is patchy. It is currently assumed that RIP genes originated in plants and that prokaryotic RIPs were acquired through a single Horizontal Gene Transfer (HGT) event from plants to bacteria [1]. We have recently reported a phylogenetic analysis of RIP sequences [2]. In the present work, we performed exhaustive searches for novel RIP genes in genomic and EST databases. We found novel RIP-encoding sequences from bacteria and, more interestingly, eleven RIP genes in fungal whole-genome shotgun (WGS) data. These results suggest that the current view of RIP phylogeny should be revisited. Therefore, we performed sequence alignments using different algorithms (CLUSTALW, T-COFFEE, MAFFT) and new phylogenetic inferences including the novel sequences. The resulting data were analyzed in the context of the phylogenetic relationships among species, in order to propose the most plausible hypothesis. Altogether, the data can be explained by at least two alternative models (see Figure 1):
I. RIP genes originated in plants and were acquired via HGT by fungi and bacteria. This model implies at least three independent HGT events.
II. RIP genes were present in the common ancestor of eukaryotes and bacteria and were lost in several lineages during evolution. This model implies several loss events in different lineages: archaea, metazoans, many bacteria, etc.
The pros and cons of these models are discussed in this work.
Figure 1
Schematic representation of the tree of life showing the most relevant taxa according to reference [3]. Divergence times in millions of years ago (Ma) are based on references [4, 5]. Model I is shown by the appearance of RIP genes in plants (open circle) and three independent HGT events, to Gram+ bacteria, Gram- bacteria and fungi (arrows). Model II is shown by an earlier origin of RIP genes (black circle) and several independent losses in different lineages (grey circles).
References
1. Peumans WJ, Van Damme EJM: Evolution of plant ribosome inactivating proteins. In: Lord JM, Hartley MR (Eds.), Toxic Plant Proteins. 2010, 1-26.
2. Lapadula WJ, Sanchez-Puerta MV, Juri Ayub M: Convergent evolution led ribosome inactivating proteins to interact with ribosomal stalk. Toxicon 2011, 57:427-432.
3. Ciccarelli F, Doerks T, Von Mering C, Creevey C, Snel B, Bork P: Toward Automatic Reconstruction of a Highly Resolved Tree of Life. Science 2006, 311:1283-1287.
4. Feng DF, Cho G, Doolittle RF: Determining divergence times with a protein clock: Update and reevaluation. PNAS 1997, 94:13028-13033.
5. Battistuzzi FU, Feijao A, Hedges SB: A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evolutionary Biology 2004, 4:44.
Comparison of Classifier Design Algorithms on a Small Sample Microarray Data
Inti Anabela Pagnuco1,2, Marcel Brun1,3, Virginia Ballarin1
1 Lab. de Procesos y Medición de Señales, Fac. de Ingeniería, Univ. Nacional de Mar del Plata, Bs. As., Argentina
2 CONICET
3 Departamento de Matemáticas, Fac. de Ingeniería, Univ. Nacional de Mar del Plata
Introduction
Single nucleotide polymorphisms (SNPs) are single-base variations in DNA sequences, which can produce phenotypic changes.
Pattern recognition techniques, including classification, feature selection and error estimation, are used to find the relationship between SNPs and changes in phenotype. In this work we compare three techniques used to design classifiers from samples: plug-in, neural networks (NN) and multi-resolution (MRS) [1], applied to the classification of cattle based on SNPs, using two sets of SNPs obtained by a previous feature selection algorithm [2].
An important aspect of designing classifiers for genomic studies, including data from microarrays, SNPs and other platforms, is the small number of training samples available. We therefore study how the methods behave as the number of samples becomes small. To this end, the comparison was done by plotting the cross-validation error as a function of the number of training samples. These samples were selected randomly, each time, from the available samples.
Results
From the 145 samples, we first used n=10 samples for classifier design, and then increased n in steps of 15 samples until reaching 145 samples. For each n, a random sample of size n was drawn, a classifier was designed on these n samples, and its error was estimated on the samples not used for design. This process was repeated 100 times to average the results over the random sampling.
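The resampling protocol just described can be sketched in Python. This is a sketch under stated assumptions, not the authors' implementation: a simple nearest-centroid rule stands in for the plug-in, NN and MRS designs, the data are synthetic Gaussian samples rather than cattle SNPs, and all function names are hypothetical. The sketch stops at n=130 so that at least 15 held-out samples remain for error estimation.

```python
import random
import statistics


def fit_centroids(X, y):
    """Nearest-centroid 'design': one mean vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class with the closest centroid (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def error_curve(X, y, sizes, repeats=100, seed=0):
    """For each n: draw n random training samples, design a classifier,
    estimate its error on the remaining samples, and summarise the
    repeats as (mean error, variance of the error)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    curve = {}
    for n in sizes:
        errors = []
        for _ in range(repeats):
            rng.shuffle(indices)
            train, test = indices[:n], indices[n:]
            if len({y[i] for i in train}) < 2:
                continue  # skip draws that miss one class entirely
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            wrong = sum(predict(model, X[i]) != y[i] for i in test)
            errors.append(wrong / len(test))
        curve[n] = (statistics.mean(errors), statistics.pvariance(errors))
    return curve


# Synthetic stand-in for the real data: 145 samples, 2 features, 2 classes.
data_rng = random.Random(1)
y = [i % 2 for i in range(145)]
X = [[data_rng.gauss(2.0 * t, 1.0), data_rng.gauss(2.0 * t, 1.0)] for t in y]

curve = error_curve(X, y, sizes=range(10, 145, 15))
for n, (mean_err, var_err) in curve.items():
    print(f"n={n:3d}  mean error={mean_err:.3f}  variance={var_err:.4f}")
```

Plotting the mean error against n, with the variance as vertical bars, gives a curve of the kind described for Figure 1; swapping `fit_centroids` and `predict` for another design rule allows the methods to be compared on equal terms.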
The figure shows the results for the two sets of SNPs. The x-axis corresponds to the number of samples used to train the classifier, and the y-axis to the cross-validation error. The lines are red for NN, yellow for plug-in and blue for MRS. The small vertical lines indicate the variance of the estimated error over the 100 realizations. For large training sets the three methods perform similarly, but NN suffers most from the reduction of training samples (values below 50 in both graphs).
Figure 1: Cross-validation error for (a) the first set of SNPs and (b) the second set of SNPs.
Conclusion
This work shows that NN design may be a poor choice for SNP classification when the number of samples is small. Additional work was done on simulated data and other genotypic studies.
References
[1] U. Braga-Neto and E. Dougherty, Classification, in Genomic Signal Processing and Statistics, EURASIP Book Series on Signal Processing and Communication, Hindawi Publishing Corporation, 2005.
[2] Gonzalez, Mariela A.; Brun, Marcel; Corva, Pablo M.; and Ballarin, Virginia. “Análisis de señales genómicas para la clasificación de razas bovinas”, CAI, 2009.
Advantages of balanced classifier design on microarray data classification
Marcel Brun1,2, Inti Anabela Pagnuco1,3, Virginia Ballarin1
1 Grupo de Procesamiento Digital de Imágenes, Fac. de Ingeniería, Univ. Nacional de Mar del Plata
2 Departamento de Matemáticas, Fac. de Ingeniería, Univ. Nacional de Mar del Plata
3 CONICET
Introduction
The analysis of high-throughput genomic data is an important tool for disease diagnosis and prognosis, as well as for the study of gene-to-gene interaction networks. In this context, classifier design, feature selection and error estimation play a fundamental role in determining the effects of genotype on phenotype. These three tasks usually depend on one another since, for example, feature selection may be an intrinsic part of a classifier design system. The number of samples used to design a classifier is an important factor affecting the quality of the results. Moreover, the number of samples in each class may considerably affect the interpretability and usability of the results.
In general, and specifically in genomic signal processing, obtaining a good classifier design is of the utmost importance, since classifiers may be used to determine medical treatment. In this work we analyzed the effect of sample imbalance (between positive and negative classes) on synthetic and real genomic data, compared with design using artificial balancing, extending the analysis done in [1] by applying it to several real and artificial datasets.
Results
We studied the classifier error, false positive rate (FPR) and false negative rate (FNR) for several datasets, comparing standard design against balanced design, using binary classification with a multiresolution approach [2]. The error measures were computed using cross-validation. For the analysis we used seven public datasets and three simulated datasets, the latter showing three different class balances. Table 1 shows the results (Error, FPR and FNR) for both balanced (B) and unbalanced (NB) design. The last two rows show the number of positive/negative samples and the number of features. The large sample size of the synthetic experiments avoids issues related to small-sample error estimation.
In the last column (very unbalanced dataset) the error is very small, since a constant classifier does an almost perfect job, as shown by the fact that unbalanced design produces 0% FPR and 100% FNR. Balanced design may increase the overall error, but it yields more realistic values of FPR and FNR. Similar considerations apply to the data obtained from genomic databases.
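The arithmetic behind the constant-classifier effect can be checked with a short Python sketch. It assumes, for illustration only, that the minority class of the 850/150 synthetic dataset is the positive class, which is the reading consistent with the reported 0% FPR and 100% FNR; the function name and the balanced-error summary are additions for the example, not measures taken from the abstract.

```python
def classification_rates(y_true, y_pred):
    """Overall error, false positive rate and false negative rate
    for binary labels (0 = negative, 1 = positive)."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = y_true.count(0)
    positives = y_true.count(1)
    return {
        "error": (false_pos + false_neg) / len(y_true),
        "fpr": false_pos / negatives,
        "fnr": false_neg / positives,
    }


# 850 negatives and 150 positives, as in the most unbalanced synthetic set.
y_true = [0] * 850 + [1] * 150
always_negative = [0] * len(y_true)  # constant classifier

rates = classification_rates(y_true, always_negative)
balanced_error = (rates["fpr"] + rates["fnr"]) / 2
print(rates, balanced_error)
```

The constant classifier reaches an overall error of 0.15 while missing every positive sample, and its balanced error of 0.5 is no better than chance, which is why the overall error alone is a misleading quality measure on unbalanced data.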
Table 1: Error, FPR and FNR for balanced (B) and unbalanced (NB) classifier design.

DataSet      Autism  Listeria  Diabetes  Muscle   Influenza  Kawasaki  Synthetic  Synthetic  Synthetic
                                         disease             disease   1          2          3
Error B      0.09    0.367     0.2       0.164    0.090      0.274     0.492      0.446      0.355
FPR B        0.058   0.264     0.644     0.154    0.057      0.057     0.514      0.312      0.297
FNR B        0.115   0.618     0.014     0.178    0.214      0.214     0.469      0.700      0.678
Error NB     0.093   0.358     0.25      0.178    0.085      0.229     0.498      0.351      0.150
FPR NB       0.062   0.215     0.637     0.062    0.036      0.036     0.520      0          0
FNR NB       0.122   0.699     0.045     0.365    0.292      0.292     0.476      1          1
Class 1/2    7/8     82/34     4/9       21/14    73/18      44/8      500/500    650/350    850/150
# Variables  1000    123       1000      1000     1000       1000      8          8          8
Conclusion
In both the synthetic and the real data analyses, balanced classifier design shows better performance in terms of FPR and FNR. The experiments also illustrate the danger of using the overall error as the sole quality measure.
References
[1] Brun, M.; Ballarin, V. Data balancing for Phenotype Classification based on SNPs. 2010.
[2] Brun, M.; Dougherty, E.; Hirata Jr., R.; Barrera, J. Design of optimal binary filters under joint multiresolution-envelope constraint. Pattern Recognition Letters, 2003.
Predicting Protein Function from Sequence and Structural Data: a Globin’s Family
Case
Juan P Bustamante, Marcelo A Martí, Darío A Estrin
Department of Inorganic, Analytical and Physical Chemistry, INQUIMAE-CONICET. Faculty of Exact and
Natural Sciences, University of Buenos Aires. Buenos Aires, Argentina. ([email protected])
Predicting function from sequence and/or structure is a key issue in bioinformatics. Although a broad functional assignment can be made by assigning a protein to a given family (or domain), determining the particular function of a protein is not straightforward. Assuming that function is encoded in the structure through the chemical properties it determines, it is in principle possible to predict a putative function if the relevant properties can be computed. The globin family of heme proteins offers a large, diverse and thoroughly studied set of proteins whose function is tightly related to small-ligand (mainly O2, but also NO, CO and H2S) affinity and reactivity with the active site heme [1]. Globins with high oxygen affinity usually function as O2-redox related enzymes, moderate-affinity globins usually act as oxygen carriers, while low O2 affinity globins are NO or CO sensors. Ligand affinity is determined by the ratio between the association (kon) and dissociation (koff) rate constants. The former is mainly related to the ligand migration process from the solvent to the protein active site, which is determined by the presence of internal tunnels and cavities and of residues acting as “gates”, while the latter is determined by the interactions between the protein and the bound ligand. During the last decade our group has developed several in-silico methods to characterize both processes based solely on structural information [2,3,4], showing excellent agreement with experimental data for several particular cases. This prompted us to extend our analysis to a whole family of proteins, in this case the truncated hemoglobins (trHbs). TrHbs are a distinct, widespread phylogenetic group of the globin family, divided into three groups: I, II and III (also labeled N, O and P) [5], for which about 1000 different sequences have been reported, with at least one determined structure for each group.
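The relation between the rate constants and the affinity can be made concrete with a small Python sketch. The ratio K = kon/koff follows from the text; everything else is an assumption made for illustration: the cut-off values separating high, moderate and low affinity, the example rate constants and the function names are hypothetical and are not taken from the abstract.

```python
def association_constant(kon, koff):
    """Equilibrium association constant K = kon / koff.

    kon in M^-1 s^-1 and koff in s^-1 give K in M^-1.
    """
    if koff <= 0:
        raise ValueError("koff must be positive")
    return kon / koff


def classify_affinity(k_assoc, low_cut=1e5, high_cut=1e7):
    """Map K onto the three classes used in the text.

    The cut-offs are placeholders chosen only to make the example run.
    """
    if k_assoc >= high_cut:
        return "high"
    if k_assoc >= low_cut:
        return "moderate"
    return "low"


# Hypothetical rate constants for three proteins.
examples = {
    "fast_on_slow_off": (2e7, 0.1),
    "intermediate": (1e6, 1.0),
    "slow_on_fast_off": (1e5, 100.0),
}
labels = {
    name: classify_affinity(association_constant(kon, koff))
    for name, (kon, koff) in examples.items()
}
print(labels)
```

A slow dissociation (small koff) raises the affinity as effectively as a fast association, which is why the two processes are characterized separately.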
In the present work, all possible active-site and tunnel/cavity structures were built on trHb
homology models, based on multiple sequence alignments for each group as determined using hidden
Markov models (HMMs). For each structural type we computed the oxygen stabilization in the active site,
which is related to koff, and the free energy profile for small-ligand migration along the tunnels, which is
related to kon. With these data we were able to assign to each protein a putative oxygen affinity (high,
moderate or low) and a relative ligand binding rate (fast or slow), which together allow a putative function
to be assigned. These results were finally combined with phylogenetic and molecular evolution analyses,
together with literature-derived data on the lifestyle of each organism. Our results show that ligand affinity
characteristics are randomly distributed among the phylogenetic groups, but are correlated with the
lifestyle of the organism. Molecular evolution analysis also shows that small changes (even single-residue
changes) may have a dramatic impact on affinity, and therefore on protein function, and are far more
important than global structural changes. In summary, our results not only show that predicting specific
functional properties from sequence/structure is possible, but also reveal interesting aspects of the
evolutionary history of the globin family at the molecular level.
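The assignment logic described in this abstract can be illustrated with a short sketch. It assumes the standard relation K = kon/koff for the equilibrium association constant; the numeric cut-offs and the function names are hypothetical placeholders chosen for illustration, not values or code from this work.

```python
def association_constant(k_on, k_off):
    """Equilibrium association constant K = k_on / k_off.

    With k_on in M^-1 s^-1 and k_off in s^-1, K is in M^-1.
    """
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return k_on / k_off


# Hypothetical cut-offs (M^-1), chosen only for illustration.
HIGH_AFFINITY = 1e8
LOW_AFFINITY = 1e6

# Qualitative mapping taken from the description in the abstract.
PUTATIVE_FUNCTION = {
    "high": "O2-related redox enzyme",
    "moderate": "oxygen carrier",
    "low": "NO or CO sensor",
}


def affinity_class(k_on, k_off):
    """Classify oxygen affinity as high, moderate or low."""
    k = association_constant(k_on, k_off)
    if k >= HIGH_AFFINITY:
        return "high"
    if k <= LOW_AFFINITY:
        return "low"
    return "moderate"


def putative_function(k_on, k_off):
    """Map the affinity class to the putative function listed above."""
    return PUTATIVE_FUNCTION[affinity_class(k_on, k_off)]
```

In the work itself kon and koff are not measured but related to computed quantities (the migration free energy profile and the oxygen stabilization energy), so any real pipeline would need a calibrated mapping in place of these fixed thresholds.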
References
1. Milani M, Pesce A, Nardini M, Ouellet H, Ouellet Y, Dewilde S, Bocedi A, Ascenzi P, Guertin M, Moens L, Friedman JM, Wittenberg
JB, Bolognesi M. Structural bases for heme binding and diatomic ligand recognition in truncated hemoglobins. Journal of
Inorganic Biochemistry. 2005. 99:97-109.
2. Marti MA, Crespo A, Capece L, Boechi L, Bikiel DE, Scherlis DA, Estrin DA. Dioxygen affinity in heme proteins investigated
by computer simulation. Journal of Inorganic Biochemistry. 2006. 100(4):761-70.
3. Capece L, Marti MA, Crespo A, Doctorovich F, Estrin DA. Heme protein oxygen affinity regulation exerted by proximal
effects. Journal of the American Chemical Society. 2006. 128(38):12455-61.
4. Forti F, Boechi L, Estrin DA, Marti MA. Comparing and combining implicit ligand sampling with multiple steered molecular
dynamics to study ligand migration processes in heme proteins. Journal of Computational Chemistry. 2011. 10.1002/jcc.21805.
5. Vuletich DA, Lecomte JTJ. A Phylogenetic and Structural Analysis of Truncated Hemoglobins. Journal of Molecular
Evolution. 2006. 62:196–210.
Identification of binding motifs in large-scale peptide data sets using a Gibbs sampling approach
Morten Nielsen, Ole Lund and Massimo Andreatta
Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
Background
Proteins recognizing short peptide fragments drive a large part of cellular signaling. An accurate description of
the specificities of such peptide-binding receptors can in many cases provide insights into their function and
can be used, for instance, to design disease diagnostics, inhibitor compounds and vaccines. Recent advances
in high-throughput technologies for the generation of peptide data have made it both faster and cheaper to
generate large libraries of peptides for the study of peptide-binding protein specificities. Interpretation of such
large peptide data sets is, however, not trivial and requires computational methods capable of identifying subtle
recurring patterns shared among particular sets of peptides. This task becomes more challenging when the data
contain more than one pattern, and/or the motifs are found at different registers within distinct peptides. Several
methods have been developed to address this problem [1-4], ranging from simple multiple sequence alignment
methods to advanced motif identification methods, including artificial neural networks, hidden Markov models
and Gibbs sampling. However, all these methods have the severe limitation of either dealing only with single
specificities or requiring the input data to be pre-aligned to a common motif.
Results
Here, we present an algorithm based on Gibbs sampling that aims to go beyond these limitations. The method
can simultaneously align and cluster peptide data sets containing an a priori unknown number of specificities.
We apply the method to deconvolute binding motifs in a panel of peptide data sets with different degrees of
complexity, spanning from the simplest case of pre-aligned fixed-length peptides to cases of unaligned peptide
data sets of variable length. Example applications include mixtures of binders to different MHC class I and
class II alleles, distinct classes of ligands for SH3 domains, and sub-specificities of the HLA-A*02:01 molecule.
The results of the analysis for the SH3 domain peptide data are shown in Figure 1.
Figure 1. Sequence motifs on SH3 domain binding data clustered in 1 to 3 clusters. a) Sequence motif of the
data set aligned in one single cluster. b) Sequence motifs for SH3 domain data split in two clusters. The two
groups are in strong agreement with the canonical class I (panel c, 1,892 peptides) and class II (panel b, 498
peptides) types of SH3 domain ligands. c) Sequence motifs when the data is split in 3 clusters. The clusters
have sizes of respectively 1,606, 490 and 305 peptides. Data was taken from [3].
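The abstract does not give the scoring function or the sampling schedule, so the following is only a minimal sketch of the general idea under stated assumptions: peptides are already aligned and of fixed length, the number of clusters is fixed in advance, each cluster is scored by its Kullback-Leibler information against a uniform amino acid background, and labels are updated with a Gibbs-style (heat-bath) move at a fixed temperature. All function names and parameters are hypothetical; handling unaligned peptides of variable length and an unknown number of clusters, as the method described here does, is beyond this sketch.

```python
import math
import random
from collections import Counter

ALPHABET_SIZE = 20  # standard amino acids; a uniform background is assumed


def cluster_information(peptides, pseudo=1.0):
    """Summed per-position KL divergence (bits) of a cluster from the background."""
    n = len(peptides)
    if n == 0:
        return 0.0
    bg = 1.0 / ALPHABET_SIZE
    unseen_f = pseudo * bg / (n + pseudo)  # frequency of residues never observed
    total = 0.0
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        for c in counts.values():
            f = (c + pseudo * bg) / (n + pseudo)
            total += f * math.log2(f / bg)
        total += (ALPHABET_SIZE - len(counts)) * unseen_f * math.log2(unseen_f / bg)
    return total


def clustering_score(peptides, labels, n_clusters):
    """Size-weighted sum of the information content of every cluster."""
    score = 0.0
    for k in range(n_clusters):
        members = [p for p, lab in zip(peptides, labels) if lab == k]
        score += len(members) * cluster_information(members)
    return score


def gibbs_cluster(peptides, n_clusters, n_sweeps=50, temperature=0.5, seed=0):
    """Resample each peptide's label from a Boltzmann distribution over scores."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in peptides]
    for _ in range(n_sweeps):
        for i in rng.sample(range(len(peptides)), len(peptides)):
            scores = []
            for k in range(n_clusters):
                labels[i] = k  # tentatively place peptide i in cluster k
                scores.append(clustering_score(peptides, labels, n_clusters))
            best = max(scores)
            weights = [math.exp((s - best) / temperature) for s in scores]
            labels[i] = rng.choices(range(n_clusters), weights=weights)[0]
    return labels
```

Recomputing the full score for every tentative move is quadratic in the data size and is only acceptable for toy inputs; a practical implementation would update the position count matrices incrementally.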
Conclusions
A Gibbs clustering algorithm was developed that allows the simultaneous identification of multiple subtle
receptor motifs within peptide data sets. In benchmark calculations on data sets containing multiple binding
motifs (both pre-aligned fixed-length peptides and unaligned peptides of variable length) the method
consistently demonstrated high performance. The Gibbs clustering algorithm is available online as a web
server at http://www.cbs.dtu.dk/services/GibbsCluster-1.0.
References
1. Nielsen, M., C. Lundegaard, and O. Lund, Prediction of MHC class II binding affinity using SMM-align, a
novel stabilization matrix alignment method. BMC Bioinformatics, 2007. 8: p. 238.
2. Andreatta, M., et al., NNAlign: a web-based prediction method allowing non-expert end-user discovery of
sequence motifs in quantitative peptide data. PLoS ONE, 2011. 6(11): p. e26781.
3. Kim, T., et al., MUSI: an integrated system for identifying multiple specificity from very large peptide or
nucleic acid data sets. Nucleic Acids Res, 2012. 40(6): p. e47.
4. Noguchi, H., et al., Hidden Markov model-based prediction of antigenic peptides that interact with MHC
class II molecules. J Biosci Bioeng, 2002. 94(3): p. 264-70.
Metagenomics and metatranscriptomics of soil microbial communities
developing in bulk and rhizospheric soils of the Argentinean Pampa region
Nicolás Rascovan1, Belén Carbonetto1, Santiago Revale1, Estefanía Mancini1, Marina
Reinert1 and Martin Vazquez1
1 Plataforma de Genómica y Bioinformática, INDEAR, Rosario, Santa Fe,
Argentina.
Background
Soils are among the most biologically diverse environments on earth, but more than 90% of their
microorganisms have not been studied because they cannot be cultured in the laboratory. A strategy
developed to overcome this issue is to study microbial communities by sequencing their
genomic content, an approach called metagenomics. In the last few years a debate has arisen
about the deleterious effect that intensive agriculture might have on the microbial
communities of agricultural soils. The goal of the present work is to contribute to this debate by
analyzing microbial communities of Pampean soils from a metagenomic perspective.
Material and methods
DNA extracted from bulk soils of agricultural and non-agricultural sites, and DNA and RNA from
rhizospheric soils under two different agricultural managements, were sequenced by high-throughput
pyrosequencing using 16S rRNA amplicon sequencing (AS) and whole genome shotgun (WGS).
Over 17 Gbp were obtained by WGS and more than 1 million sequences by AS. Amplicon sequences
were analyzed using the QIIME software package, and WGS data using the MG-RAST annotation and
analysis tool. In addition, we used a custom-made analysis pipeline for the metatranscriptomic
analysis.
Results and Discussion
From the comparison of metatranscriptomic (cDNA) and metagenomic (gDNA) data we found that the
metabolically active microorganisms differ significantly from the total community, suggesting that the
relevant microorganisms might be a subset of the whole community at a given time and condition. The
custom-made analysis pipeline proved to be a useful tool for comparative metabolic analysis
between cDNA and gDNA datasets. We could identify metabolisms with high and low expression
(cDNA level vs. gDNA level) for each agricultural management. We found that
diversity at the taxonomic level is much higher than at the metabolic level, which probably reflects
metabolic redundancy among different species. Microbial communities differed
between agricultural managements at both the metabolic and taxonomic levels, but those
differences were not dramatic. We could also identify the species and metabolisms associated with
each agricultural management, with each geographic region and with rhizospheric soil. This is the first
study of soil microbial communities from a genomic perspective carried out in Argentina and a good
reference for further work. We have started to unravel the mysteries hidden in the complex soil
universe, but this is just the first step on a long journey that should also be followed by other scientists
in our country.
High throughput pyrosequencing and bioinformatics of a multi-extreme
environment.
Nicolás Rascovan1, Santiago Revale1, Estefanía Mancini1, Martin Vazquez1 and María
Eugenia Farías2
1 Plataforma de Genómica y Bioinformática, INDEAR, Rosario, Santa Fe,
Argentina.
2 Laboratorio de Investigaciones Microbiológicas de Lagunas Andinas (LIMLA),
Planta Piloto de Procesos Industriales Microbiológicos (PROIMI), CCT, CONICET,
Tucumán, Argentina
Background and Experimental Procedures
The advent of high-throughput DNA sequencing technologies, and particularly
pyrosequencing, has opened the way for in-depth studies of environmental
microbial communities through metagenomic approaches. In this study, we
analyzed 34,549 16S rDNA sequences obtained by PCR and 454 sequencing from
microbial communities developing in Laguna Diamante, a multi-extreme
environment (pH = 10, high arsenic content and salinity, high altitude and therefore
high UV radiation and low O2 pressure). The sequences were clustered into
Operational Taxonomic Units (OTUs) using the uclust algorithm at 0.8, 0.9 and 0.97
similarity, and representative sequences were taxonomically classified using the RDP
classifier algorithm on the GreenGenes database. Public datasets from other
environments were analyzed using the same procedures for comparison with the Diamante
results. Bray-Curtis distances were calculated based on the taxonomic distribution at
phylum level, and UPGMA trees were constructed to visualize relatedness between
samples.
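The last step of this procedure can be sketched in a few lines. The example below is a plain-Python illustration, not the software used in the study: it computes the Bray-Curtis dissimilarity between phylum-level abundance vectors and builds a UPGMA (average-linkage) tree as nested tuples. The function names are hypothetical, and any abundance values passed in are to be supplied by the reader.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def upgma(names, profiles):
    """Build a UPGMA tree (nested tuples) from per-sample abundance profiles."""
    # Pairwise dissimilarities between the original samples.
    d = [[bray_curtis(p, q) for q in profiles] for p in profiles]
    # Each cluster is stored as (subtree, indices of its member samples).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def avg_dist(m1, m2):
        # UPGMA: unweighted average over all member pairs.
        return sum(d[i][j] for i in m1 for j in m2) / (len(m1) * len(m2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = avg_dist(clusters[x][1], clusters[y][1])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]),
                  clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]
```

Samples that end up in the same inner tuple are those with the most similar phylum-level composition; branch lengths are omitted to keep the sketch short.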
Results and Discussion
We found an extremely diverse microbial community developing in environmental
conditions that most life on earth could not resist. Taxonomic analysis
showed a marked predominance of the Proteobacteria phylum (62% of all sequences),
but Bacteroidetes (13%), Firmicutes (6%) and Verrucomicrobia (4%) were also
considerably abundant (Figure 6). Moreover, although the primers used were not
the best for amplifying Archaea, we could detect a considerable number of sequences from
this group (4%). Most of the sequences (90%) could be classified at least to family
level, suggesting that species existing in other environments have developed
strategies to evade extreme conditions. Comparison with other environments
showed a closer relationship to very distant locations such as the Baltic Sea than to the
geographically close Socompa stromatolite, located only 200 km away and under similar
conditions. This is the first microbial characterization of Diamante Lake and,
together with other works, an important step toward understanding the
mechanisms of survival in adverse conditions such as those of ancient earth and
other planets like Mars.
Digitization Project in MACN: the importance of standard protocols
to obtain high quality taxonomic information
Cossi, Paula1,2; Zimicz, Carolina1,2; Luna, María Celeste1; Andón, Noelia N.1,2; Cuadra,
Natalia1,2; Bukowski Loináz, María Belén1,2 and Ramírez, Martín J.1,2
1 Museo Argentino de Ciencias Naturales “Bernardino Rivadavia”, CABA, Buenos Aires, Argentina.
2 Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
Global access to biodiversity occurrence data is possible thanks to
networks and portals, such as the GBIF and SNDB Data Portals, which integrate
biodiversity data from heterogeneous sources. The structure of databases
with biological content is therefore essential to ensure the accurate exchange of information.
Most biological data from specimens housed in natural history museums are still not
digitally recorded, or are registered in non-standard formats. The Digitization Project of
the biological collections of the Argentine Museum of Natural Sciences “Bernardino
Rivadavia” (MACN) began in 2008. Its main purpose is to turn primary data on
paper into digital formats, implementing established data standards such as Darwin
Core, to make the information easily available to the scientific community. To achieve
this objective, primary data contained in files or catalogues are captured using the
application Aurora. Data fields in Aurora are mapped to the corresponding terms in
Darwin Core, including taxonomic, temporal and geographic information, as well as
collectors and other curatorial details. About 219,000 specimens from
all the collections of the Museum have been digitized, including the Invertebrate, Entomology, Herpetology,
Ornithology, Arachnology and Mastozoology National Collections, and the National
Herbaria of Vascular and Cellular Plants. The digitization project also includes the
recording of geo-referenced species data. Localities are geo-referenced using the
point-radius method, taking into account the precision and specificity of the
locality description. About 40% of the specimen records have been geo-referenced at
present. These data are validated using the DIVA-GIS programme, in order to
identify and correct possible errors introduced during the geo-referencing process.
The advantages of implementing standard protocols and field tools are the fast and
easy access to biological information and the quality improvement of the data housed
in the different collections of the museum.
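As an illustration of the field mapping and the point-radius representation described in this abstract, the sketch below renames a catalogue record's fields to Darwin Core terms and runs a basic sanity check on the geo-reference. The Darwin Core terms on the right are standard; the source field names on the left are hypothetical, since the Aurora schema is not described here, and the check is far simpler than a validation in a GIS tool.

```python
# Hypothetical source field names (left) mapped to Darwin Core terms (right).
FIELD_TO_DWC = {
    "species_name": "scientificName",
    "collection_date": "eventDate",
    "collector": "recordedBy",
    "catalog_no": "catalogNumber",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "radius_m": "coordinateUncertaintyInMeters",
}


def to_darwin_core(record):
    """Rename known fields to Darwin Core terms and report unmapped fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        term = FIELD_TO_DWC.get(field)
        if term is None:
            unmapped.append(field)
        else:
            mapped[term] = value
    return mapped, unmapped


def valid_point_radius(dwc):
    """Check that a record carries a plausible point (lat, lon) and radius."""
    try:
        lat = float(dwc["decimalLatitude"])
        lon = float(dwc["decimalLongitude"])
        radius = float(dwc["coordinateUncertaintyInMeters"])
    except (KeyError, TypeError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180 and radius > 0
```

Reporting unmapped fields instead of silently dropping them makes it easier to spot curatorial details that still lack a standard term.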
Analysis of variability of Mal de Río Cuarto virus (MRCV) through haplotype networks
Mario Alejandro García1, María de la Paz Giménez Pecci2, Juan Bautista Cabral1, Irma Graciela Laguna2,3, Fernanda
Maurino2, Carlos Hugo J. Vera1
1 UTN FRC
2 INTA IPAVE – CIAP
3 CONICET
Analysis of variability through networks
Genetic variability of individuals of the same species can be studied through networks that represent the genetic distances
between them [1]. We studied the case of Mal de Río Cuarto virus (MRCV), defining distance measures between genome
profiles of different individuals and creating a network of haplotypes. Topological properties of the network were analyzed. The
network was explored in two dimensions, forming space-time environments with different levels of granularity and highlighting
the existing profiles (Figure 1).
Figure 1. Network exploration in space and time dimensions
The profiles, called haplotypes (haploid genotypes), were obtained through electrophoretic analysis of the viral dsRNA genome
segments, performed on samples from 8 host species, in 13 locations, over 13 seasons [2]. The electrophoretic profile of
MRCV is represented by a binary string of length 18, which contains the ten known segments of the virus, some of which can
be placed in different positions [3], and two extra genomic bands [4].
The distances between haplotypes were calculated as the Hamming distance plus three special functions that depend on
existing knowledge about the virus.
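The basic building block of this distance can be sketched as follows. The example covers only the Hamming component on binary profiles and a simple rule for drawing network edges; the three virus-specific functions are not described in the abstract and are deliberately left out. The function names and the `max_distance` threshold are assumptions made for illustration.

```python
from itertools import combinations

PROFILE_LENGTH = 18  # length of the binary electrophoretic profile


def hamming(a, b):
    """Number of positions at which two equally long profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def haplotype_network(profiles, max_distance=1):
    """Return the distinct haplotypes and the edges linking close pairs.

    Each edge is a tuple (haplotype_a, haplotype_b, distance) with
    distance <= max_distance.
    """
    haplotypes = sorted(set(profiles))
    edges = []
    for a, b in combinations(haplotypes, 2):
        d = hamming(a, b)
        if d <= max_distance:
            edges.append((a, b, d))
    return haplotypes, edges
```

Samples sharing the same profile collapse into a single node, so node frequencies per location and season can be layered on top of this structure when exploring the network.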
Finally, the exploration step led to the observation that, in the first crop years tested, the number of haplotypes and the
distances between them were greater than in subsequent crop years. A variability indicator was calculated for each environment and
compared with its expected value, confirming the observation made during the exploration and leading to the conclusion that virus
variability decreased after an epidemic that occurred during the crop year 1996/97.
Conclusion
The use of networks in the KDD (Knowledge Discovery in Databases) process was very successful and managed to highlight
behavior of the object of study that had not been evident so far. Although an AMOVA analysis [5] and a haplotype
analysis by environments [3] had already been performed, the difference or distance between the profiles of each environment could
be detected only with the implementation of the haplotype networks.
The main contribution of this case to the KDD process is the proposal of interactive exploration of networks, which turned out
to be intuitive and easy to apply for analysis. In a human-centered process, where the creativity and experience of the analyst
play a key role [6], the proposed process was able to offer a fresh perspective, complementary to the other techniques of
KDD.
Acknowledgements
UTN1219, FONCyT PICT 06-02486, PICT 143-02, INTA AEPV 214012, MinCyT PROTRI 2010.
References
1. Posada D., Crandall K. A.: Intraspecific gene genealogies: trees grafting into networks. Trends in Ecology and
Evolution 2001, 16:37-45
2. Giménez Pecci M.P., Carpane P., Dagoberto E., Laguna I.G.: Variabilidad del perfil electroforético de los segmentos
genómicos del virus causal del Mal de Río Cuarto del maíz en Argentina. XIII Congreso Latinoamericano de
Fitopatología. VEP-4 2005, Pg.: 562
3. Giménez Pecci M.P., Carpane P., Murua L., Bruno C., Balzarini M., Laguna I.G.: Variabilidad del Mal de Río Cuarto virus
(MRCV) del maíz según frecuencia de haplotipos obtenidos desde perfiles electroforéticos de los segmentos
genómicos. Actas de la Academia Nacional de Ciencias 14 2008, 99-107
4. Giménez Pecci M.P., Laguna I.G., García M.A., Carpane P.: Bandas extragenómicas en el perfil electroforético del
dsRNA de Mal de Río Cuarto virus del maíz (Fijivirus, Reoviridae). Revista Argentina de Microbiología. Supl. 1 2007, Pg
108
5. Giménez Pecci M.P., Bruno C., Balzarini M., Laguna I.G.: Aplicación del análisis de la varianza molecular en datos de
perfiles electroforéticos de segmentos genómicos del Mal de Río Cuarto virus (MRCV) del maíz (Zea mays L.) en
Argentina. Actas de la Academia Nacional de Ciencias 13 2007, 141-152
6. Brachman R.J., Anand T.: The Process of Knowledge Discovery in Databases: A Human-Centered Approach.
Advances in Knowledge Discovery and Data Mining, MIT Press 1996, 37-58
CoDNaS database:
The conformational diversity of proteins and its relationship with biological properties
Alexander Monzón, Ezequiel Juritz and Gustavo Parisi
Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bs. As. Argentina
Background
The native state of a protein is better represented by an ensemble of conformers in equilibrium, which describes the
conformational diversity or dynamism of the protein. Conformational diversity is a key feature for
understanding essential properties of proteins such as function, enzyme and antibody promiscuity, enzyme
catalytic power, signal transduction, protein-protein recognition and the origin of new functions.
Crystallographic structures of the same protein obtained under different conditions can be considered
representative conformers of the protein native state. This view is supported by the correlation found
between the structural diversity determined by NMR experiments and that derived from
different crystallographic structures.
Description
In order to study how the biological properties of proteins are associated with the extent of their
conformational diversity, we developed a protein conformational database called CoDNaS
(Conformational Diversity of the Native State; preliminary release at http://codnas.unq.edu.ar). For this
purpose we collected the redundant set of crystallized structures from the PDB database and
obtained 9474 monomeric and homo-oligomeric proteins (accounting for a total of 40565 structures)
representing putative conformers of each corresponding protein. Using an all vs. all structural
alignment between the conformers of each protein, we defined the extent of
conformational diversity as the maximum RMSD observed. The average RMSD
between conformers is 1.33 Å, with a maximum of 38 Å. By cross-linking our proteins with several
databases we gathered a broad spectrum of biological and physicochemical information (such as taxonomy,
GO terms, ligands, mutations, oligomeric state, etc.). With this practical definition of
conformational diversity it is easy to relate its extent to different parameters. For example,
proteins crystallized in different conditions, such as bound/unbound states, mutant/wild-type states, or
with variations in pH and temperature, give average conformational diversities of 1.09 Å, 5.11 Å, 2.64 Å
and 5.11 Å respectively. Similar correlations have been obtained, allowing us to study how
conformational diversity varies with protein function, sequence similarity, taxonomy and cellular
location.
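The practical definition used here (the maximum RMSD over all conformer pairs) can be written down compactly. The sketch below is illustrative only and makes one important assumption: the conformers are given as equally long lists of (x, y, z) coordinates that have already been structurally superposed, so no alignment step is performed. The function names are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    The two coordinate sets are assumed to be already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def conformational_diversity(conformers):
    """Maximum pairwise RMSD over all conformers (0.0 for a single conformer)."""
    return max((rmsd(a, b) for a, b in combinations(conformers, 2)), default=0.0)
```

Taking the maximum rather than the mean makes the measure sensitive to the single most divergent pair of structures, which is why one unusual crystal form can dominate the value for a protein.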
Conclusion
We think that the CoDNaS database is a useful tool for relating conformational diversity to different
parameters and properties, allowing us to increase our knowledge of such an important feature of proteins.
Non extensive statistics generalization of Jensen Shannon divergence for DNA
sequence analysis
Miguel Ré1,2, Pedro Lamberti2
1 Facultad Regional Córdoba, Universidad Tecnológica Nacional, Maestro López y Cruz
Roja Argentina, Ciudad Universitaria, 5010 Córdoba
2 Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Haya
de la Torre y Medina Allende, Ciudad Universitaria, 5010 Córdoba
The Jensen-Shannon divergence (JSD) is a symmetrized version of the Kullback-Leibler
divergence [1]. JSD quantifies the difference between probability distributions.
It has been widely applied to the analysis of symbolic sequences by comparing the symbol
composition of different subsequences [2]. One advantage of JSD is that it does not
require the symbolic sequence to be mapped to a numerical sequence, which is
necessary, for instance, in spectral or correlation analyses.
Different extensions of JSD have been proposed to improve the detection of sequence
borders, in particular for DNA sequence analysis [3-4].
Since its original proposal [5], Tsallis entropy has been used to extend results and
applications of Boltzmann-Gibbs-Shannon entropy. Different Tsallis extensions of JSD have
been suggested and their properties analyzed [6-7].
We present here possible extensions of JSD in the Tsallis entropy framework and consider
the results obtained when they are applied to DNA sequence analysis.
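To make the quantities concrete, the sketch below computes the JSD between the nucleotide compositions of two subsequences, in the form JSD = H(w1·P1 + w2·P2) − w1·H(P1) − w2·H(P2), and then swaps the Shannon entropy H for the Tsallis entropy S_q = (1 − Σ p_i^q)/(q − 1). This direct substitution is only the most straightforward extension and is an assumption of the example; the abstract does not specify which of the extensions discussed in [6-7] is used. Function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Tsallis entropy S_q; tends to the Shannon entropy (in nats) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def jensen_divergence(p1, p2, entropy, w1=0.5, w2=0.5):
    """Entropy of the weighted mixture minus the weighted entropies."""
    mix = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return entropy(mix) - w1 * entropy(p1) - w2 * entropy(p2)


def composition(seq, alphabet="ACGT"):
    """Relative frequency of each symbol of the alphabet in the sequence."""
    n = len(seq)
    return [seq.count(s) / n for s in alphabet]
```

For example, `jensen_divergence(composition(left), composition(right), shannon_entropy)` scores a candidate border between two subsequences `left` and `right`, and `lambda p: tsallis_entropy(p, 2.0)` can be passed in place of `shannon_entropy` to obtain the q = 2 variant. In segmentation applications the weights are usually set proportional to the subsequence lengths.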
1. Kullback S, Leibler R: On information and sufficiency. Ann. Math. Stat. 1951,
22: 79-86.
2. Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley H:
Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys.
Rev. E 2002, 65: 041905 1-16. And references therein.
3. Arvey A, Azad R, Raval A, Lawrence J: Detection of genomic islands via
segmental genome heterogeneity. Nucleic Acids Research 2009, 1-12.
4. Thakur V, Azad R, Ramaswamy R: Markov models of genome segmentation.
Phys. Rev. E 2007, 75: 011915 1-10.
5. Tsallis C: Possible Generalization of Boltzmann-Gibbs statistics. J. Stat. Phys.
1988, 52: 479-487.
6. Martins A, Aguiar P, Figueiredo M: Nonextensive generalizations of the
Jensen-Shannon Divergence. arXiv:0804.1653 2008, 1-7.
7. Lamberti P, Majtey A: Non-logarithmic Jensen-Shannon divergence. Phys. A
2003, 329: 81-90
Email: [email protected]
Improving the correlation between experimental and theoretical studies of the interaction
between Zidovudine (and novel derivatives) and human serum albumin
Ileana del Rosario Tossolini1, María Cecilia Gómez1
1 Facultad de Ingeniería, Universidad Nacional de Entre Ríos (U.N.E.R), Oro Verde, Argentina
Despite the effectiveness of Zidovudine (AZT) in the treatment of acquired immunodeficiency
syndrome (AIDS), this drug has significant adverse effects, many of them associated with its low binding
to plasma proteins, including human serum albumin (HSA). Hence,
obtaining AZT prodrugs with higher affinity for HSA is a key strategy to increase the
effectiveness of the drug. In this case, the strategy used was chemical modification at the 5’-OH
position of the molecule by attaching several amino acids, thus producing the AZT derivatives.
HSA is present in the body in its pure form (HSAP) and complexed with fatty acids (HSAFA). Because
the two species exhibit different biodistribution, these studies were carried out to
determine the molecular aspects that lead to the different affinities of the AZT derivatives for both
species of HSA.
In previous work, in order to design AZT derivatives with increased affinity for both species of
HSA, molecular modeling, docking and molecular dynamics methodologies were applied based on
the crystallographic structures of HSAP and HSAFA (PDB 1BM0 and PDB 3B9L, respectively).
Molecular modeling techniques were used to find the optimized geometries of the ligands in their
minimum-energy conformations. The docking studies were then performed on the HSA primary
binding site. Energy calculations were applied to each complex trajectory obtained through
molecular dynamics, using the MM_PBSA module of the AMBER10 package. Although the values
obtained give evidence of the molecular bases that could lead to the different affinities of the
derivatives for both proteins, they do not correlate in the desired manner with the
experimental affinity [1]. For this reason, the docking studies were performed again, changing the
values of certain parameters. For example, the genetic algorithm was run 40 times in order
to obtain more statistically significant results than the previous ones. In this manner, it was possible to
validate some of the complexes obtained before, but new configurations were also found, indicating that
it would be interesting to perform molecular dynamics on those complexes.
Another aspect to take into account in order to improve the energy calculations is the choice of the
dielectric constant. The predictions are quite sensitive to the solute dielectric constant, and this
parameter should be carefully determined according to the charge of the protein/ligand binding
interface [2]. On the basis of that analysis, further studies will focus on changing the mentioned
parameter so as to obtain a more realistic correlation between the energetic values and the affinity
constants. The energy calculations were performed using dielectric constants of 1 and 2, showing
a better correlation with the latter value, but it is still necessary to improve the calculations with that
parameter.
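The comparison between dielectric constants amounts to asking which setting gives computed energies that track the experimental affinities most closely. A minimal sketch of that comparison, assuming the Pearson correlation coefficient as the measure (the abstract does not say which statistic was used), could look as follows. The function names are hypothetical and no values from the study are included; the caller supplies the experimental and computed series.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def best_dielectric(experimental, computed_by_dielectric):
    """Return the dielectric constant whose energies correlate best (by |r|).

    `computed_by_dielectric` maps each dielectric constant to the list of
    computed binding energies, in the same ligand order as `experimental`.
    """
    return max(
        computed_by_dielectric,
        key=lambda eps: abs(pearson_r(experimental, computed_by_dielectric[eps])),
    )
```

The absolute value is used because a more negative binding energy is expected to accompany a larger affinity constant, so a strong negative correlation is the desired outcome.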
References
1. Quevedo MA, Ribone SR, Moroni GN, Briñón MC: Binding to human serum albumin of
zidovudine (AZT) and novel AZT derivatives. Experimental and theoretical analyses. Bioorg
Med Chem 2008, 16:2779-2790.
2. Hou T, Wang J, Li Y, Wang W: Assessing the Performance of the MM/PBSA and MM/GBSA
Methods. The Accuracy of Binding Free Energy Calculations Based on Molecular Dynamics
Simulations. J Chem Inf Model 2011, 51:69-82.
Following the tracks of the Trypanosoma cruzi prenylome
Exequiel Porta1, Guillermo Labadie1
1 IQUIR-CONICET (Rosario Chemical Institute), U.N. de Rosario, Suipacha 531, S2002LRK,
Rosario, Argentina. Tel.: 54-341–4370477 E-mail: [email protected].
Background
Chagas-Mazza disease is one of the world's major parasitic diseases. It affects the Americas and is
caused by the protozoan parasite Trypanosoma cruzi. If the condition is not treated in time, it
attacks the body's vital organs, causing disabling injuries and a slow deterioration
leading to death. New chemotherapeutic agents and new targets are needed
because current drugs are inefficient and only work in the acute stage.
Prenylation refers to the posttranslational modification of proteins with isoprenyl
anchors. These anchors are often involved in lipid-mediated membrane association, as well as in
protein-protein interactions, of important cellular proteins. In eukaryotes, three
enzymes are known to catalyze the transfer of these lipids. Farnesyltransferase (FT) and
geranylgeranyltransferase type 1 (GGT1) recognize the CAAX motif at the C-terminus of the protein substrate
and attach a farnesyl (15-carbon polyisoprene) or geranylgeranyl (20-carbon polyisoprene) group,
respectively, to the thiol of the cysteine of this motif. The third enzyme,
geranylgeranyltransferase type 2 (GGT2 or RabGGT), recognizes the complex of Rab GTPases with
a specific accessory protein, the Rab escort protein (REP), and attaches one or two geranylgeranyl
groups to cysteines in a more flexible C-terminal motif. Building on the extensive search for
inhibitors of farnesyltransferase (FTase-i) as anticancer agents in the pharmaceutical industry,
the group of Prof. Gelb (Univ. of Washington) studied these
compounds as antiparasitic agents. They found that FTase-i, which inhibit both the
human and the parasite enzyme, have greater cytotoxicity toward the parasite. This finding has
validated this enzyme as a target for new chemotherapeutic agents, leading different groups
to look for specific inhibitors of the parasitic enzyme. In recent years, specific enzyme
inhibitors showing antimalarial and trypanocidal activity have been reported in the
literature. This discovery raised expectations for the development of new drugs
against that target.
In contrast to what occurs in humans, very little is known about the role of protein prenylation
in parasites. Fewer than 10 of these substrate proteins are known, most of them
belonging to the trypanosomatids. This opens a fertile field of research where the tools of
biological and bioorganic chemistry can provide new points of view for the study of the
parasitic "prenylome".
To study and elucidate the parasitic prenylome, a bioinformatic study of
the proteins that might be targets of isoprenylation in T. cruzi is needed. This provides
a theoretical framework for the next step: developing the new chemical tools (bioorthogonal
probes and fluorescent probes) to be applied in proteomics. The core software tools
for the bioinformatic analysis are PrePS, PRENbase, BLAST and T-Coffee, all of which
are freely available online.
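As a rough illustration of the kind of sequence screen that precedes a dedicated predictor, the sketch below looks for a C-terminal CAAX box with a regular expression. It is a crude stand-in, not the PrePS algorithm: the "aliphatic" residue set, the strict/loose switch and the rule of thumb for guessing the transferase (X = L or F suggesting GGT1, otherwise FT) are simplifying assumptions taken from general eukaryotic biochemistry and have not been checked against T. cruzi enzymes.

```python
import re

# C, two residues, then any residue, anchored at the C-terminus.
CAAX_LOOSE = re.compile(r"C[A-Z]{3}$")
# Same, but requiring both middle residues to be in an assumed aliphatic set.
CAAX_STRICT = re.compile(r"C[AVLI]{2}[A-Z]$")


def caax_box(sequence, strict=False):
    """Return the C-terminal CAAX box of a protein sequence, or None."""
    pattern = CAAX_STRICT if strict else CAAX_LOOSE
    match = pattern.search(sequence.upper())
    return match.group(0) if match else None


def likely_transferase(box):
    """Rule of thumb (assumption): X = L or F suggests GGT1, otherwise FT."""
    return "GGT1" if box[-1] in "LF" else "FT"


def screen_proteome(proteins, strict=False):
    """Map protein id -> (CAAX box, guessed enzyme) for candidate substrates."""
    hits = {}
    for name, seq in proteins.items():
        box = caax_box(seq, strict=strict)
        if box is not None:
            hits[name] = (box, likely_transferase(box))
    return hits
```

A screen like this over-predicts, since many proteins end in C-x-x-X without being prenylated, and it misses Rab-type substrates of GGT2 altogether, which is why candidates would still need to be scored with a dedicated predictor and compared against known substrates.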
Results and Conclusions
An analysis of the entire proteome of T. cruzi (19,906 proteins) found a total of 135 proteins
with the capacity to be prenylated. A large percentage of these proteins may perform functions
vital to the parasite. This work provides the theoretical framework needed to continue
studying the T. cruzi prenylome using the chemical and biological tools available for proteomics.
Attacking Mycobacterium tuberculosis in the dormant phase: a combination of expression data
with structural druggability and nitrosative stress sensitivity
Leandro G. Radusky1,2*, Lucas A. Defelipe1,2, Marcelo A. Marti1,2, Adrian G. Turjanski1,2
* to whom correspondence should be sent: [email protected]
1. Departamento de Química Inorgánica, Analítica y Química Física/INQUIMAE-CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de
Buenos Aires, Ciudad Universitaria, Pabellón 2, Buenos Aires, C1428EHA, Argentina.
2. Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Pabellón 2,
Buenos Aires, C1428EHA, Argentina
It is estimated that one-third of the world population is infected with Mycobacterium tuberculosis (Mt), resulting in 1.8 million deaths
worldwide (World Health Organization, 2011). The host immune response to tuberculosis (TB) infection relies on phagocytosis of
the bacilli by macrophages, resulting in the formation of a granuloma which stops bacterial replication. Inside the granuloma the
bacterium faces a particular stress condition characterized by hypoxia, NO derived from inducible nitric oxide (NO) synthase and nutrient
deprivation, and in response switches to a non-replicative state, usually called the dormancy phase, where it can remain hidden
and alive for decades. Reactivation of latent Mt is a high risk factor for disease development, particularly in immunocompromised
individuals. Common treatment of TB involves a long course of the front-line drugs isoniazid, rifampicin, pyrazinamide and
ethambutol. However, the emergence of multidrug-resistant and extensively drug-resistant (MDR and XDR) Mt strains, and the negative drug-drug
interactions with certain HIV (or other disease) treatments, show the urgent need for new anti-TB drugs. In the present work we
have performed a proteome-scale analysis of potential Mt drug targets specific to the dormant phase. To this end, for all Mt protein
domains with available structure, we first determined i) their sensitivity to reactive nitrogen and oxygen species (RNOS), based on the amino acid composition
of the active site, and ii) their pocket druggability, using fpocket [1] and different pocket properties. This information was then combined with
essentiality [2-4], off-target and microarray-derived data [5] in a target prioritization pipeline. Using all of this information,
we performed a weighted search using sensitivity to RNOS, druggability, essentiality, off-targeting against human targets and
upregulation under RNOS conditions as selection criteria. Three new putative targets were chosen to follow a virtual screening
protocol (Table 1).
Table 1
Name                                            Uniprot   Sensitive to RNOS   Druggable   Essential   Offtarget   Upregulation in RNOS
N-acetyl-glutamate semialdehyde dehydrogenase   P63562    Yes                 Yes         Yes         Yes         Yes
Putative aminoglycoside phosphotransferase      Q7D606    Yes                 Yes         Yes         Yes         No
Possible Mycolic Acid Synthase umaA             Q6MX39    Yes                 Yes         Yes         Yes         Yes
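The weighted search over the five criteria can be pictured as a simple additive score. The sketch below is an illustration under explicit assumptions: the yes/no flags are copied from Table 1, but the weights are hypothetical (the abstract does not report the values used), and the additive rule itself is only one plausible reading of "weighted search". The "Offtarget" column is treated as a criterion that has been passed, which is also an assumption.

```python
# Hypothetical weights; the abstract does not state the values actually used.
WEIGHTS = {
    "rnos_sensitive": 2.0,
    "druggable": 2.0,
    "essential": 1.5,
    "offtarget": 1.0,
    "upregulated_in_rnos": 1.0,
}

# Flags copied from Table 1 (UniProt accession -> criteria marked "Yes").
TARGETS = {
    "P63562": {"rnos_sensitive": True, "druggable": True, "essential": True,
               "offtarget": True, "upregulated_in_rnos": True},
    "Q7D606": {"rnos_sensitive": True, "druggable": True, "essential": True,
               "offtarget": True, "upregulated_in_rnos": False},
    "Q6MX39": {"rnos_sensitive": True, "druggable": True, "essential": True,
               "offtarget": True, "upregulated_in_rnos": True},
}


def score(flags):
    """Sum the weights of every criterion that the target satisfies."""
    return sum(weight for key, weight in WEIGHTS.items() if flags[key])


# Highest score first; ties keep the order in which targets were listed.
ranking = sorted(TARGETS, key=lambda acc: score(TARGETS[acc]), reverse=True)
```

With these illustrative weights the two targets that meet all five criteria rank above Q7D606, which lacks upregulation under RNOS conditions.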
Acknowledgements
This work was partially funded by ANPCyT PICT-2010-2805 awarded to AGT and Bunge y Born FBBEI9/10 (2011-2012) to MAM. LR
is an ANPCyT Fellow. LAD is a CONICET Fellow.
References
1. Schmidtke P et al (2010); J Med Chem. 53(15):5858-67
2. Sassetti C.M., et al (2003); PNAS 100 (22) 12989-12994
3. Rengarajan J, et al (2005); PNAS 102(23):8327-32
4. Sassetti C.M, et al (2003); Mol. Microbiol. 48(1), 77–84
5. Voskuil, M.I., et al. (2003); JEM 198 (5) 705-713
Computational, biochemical, and spectroscopic studies of the copper-containing nitrite
reductase from the denitrifier Sinorhizobium meliloti 2011
María Cecilia Gómez1, Felix Martín Ferroni1, Alberto Claudio Rizzi1, Sergio Daniel Dalosto2 and Carlos
Dante Brondino1
1 Departamento de Física, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral,
Santa Fe, Argentina, S3000ZAA.
2 INTEC, Santa Fe, Argentina, S3000ZAA.
Nitrite reductases are enzymes that catalyze the reduction of nitrite to NO in the denitrification pathway of
the biogeochemical nitrogen cycle [1]. In denitrifying bacteria, this reaction can be catalyzed by two nitrite
reductases, one containing a cd1 heme and the other containing copper. Copper-containing nitrite
reductases (hereafter Nir) present a homotrimeric structure (~40 kDa/monomer) with two copper atoms per
monomer, one of type 1 (T1Cu, also called blue copper) and the other of type 2 (T2Cu, also called normal copper) (Fig. 1).
Nirs have been classified into two groups according to the UV-vis properties of their T1 centers. Blue Nirs
exhibit a very intense absorption band at ~ 590 nm, whereas green Nirs present two intense absorption
bands at ~460 and 600 nm. The coordination around both copper centers is shown in Fig. 1b. T1Cu is an
electron transfer center, whereas T2Cu is the catalytic center. The proposed reaction mechanism, which
involves a pseudoazurin (Paz) as external electron donor, is schematized in Fig. 1a.
Fig 1- a) Schematic 3D structure of Nir b) Coordination around T1Cu and T2Cu
We recently overexpressed and purified the copper containing nitrite reductase from the denitrifier
Sinorhizobium meliloti 2011 (SmNir) [2]. Sinorhizobium meliloti 2011 is a rhizobial organism which lives
symbiotically in root nodules of legumes, which are widely used in agriculture because of their ability to take up dinitrogen
from the atmosphere. We present and discuss the biochemical and spectroscopic properties of SmNir
together with the computational structural model predicted from its amino acid sequence. We also report
computational studies that describe the interaction of both types of copper atoms with their ligands using a
classical force field and classical molecular dynamics. The structure of Nir from Alcaligenes faecalis (pdb
accession number, 1SNRB), which shows a high percentage of identity to SmNir, was used as a model. The
force field was obtained using a combination of quantum mechanics (QM) and molecular mechanics
(MM) methods, known as QM/MM methods [3]. This approach allowed us to model the active
site adequately at the QM level of theory, and the rest of the system with MM. A total of seven residues, two copper
atoms and one water molecule were treated with QM, and the rest, including some water molecules from
the solvent, with the Amber force field. We discuss the theoretical model in terms of the experimental results.
Acknowledgment
We thank FONCYT, CONICET, and CAID-UNL for financial support.
References
1. Zumft W: Cell biology and molecular basis of denitrification. Microbiol Mol Biol Rev 1997, 61:533-616.
2. Ferroni FM, Guerrero SA, Rizzi AC, Brondino CD: Overexpression, purification, and biochemical
and spectroscopic characterization of copper-containing nitrite reductase from Sinorhizobium
meliloti 2011. Study of the interaction of the catalytic copper center with nitrite and NO. J. Inorg.
Biochem 2012, 114:8-14.
3. Vreven T, Byun KS, Komaromi I, Dapprich S, Montgomery JA, Morokuma K, Frisch MJ:
Combining quantum mechanics methods with molecular mechanics methods in ONIOM. J Chem
Theory Comput 2006, 2:815–826.
Truncated normal regression models for soil-water characteristic curves
Carolina C. M. Paraı́ba1 , Carlos A. R. Diniz2 , Aline H. N. Maia3
1 PhD Student at Departamento de Estatı́stica, UFSCar, São Carlos, São Paulo, CP 676, Brazil
2 Departamento de Estatı́stica, UFSCar, São Carlos, São Paulo, CP 676, Brazil
3 EMBRAPA - Meio Ambiente, Jaguariúna, São Paulo, CP 69, Brazil
Background
A soil-water characteristic curve (SWCC) is a useful graphical tool which describes the amount of
water remaining in the soil (water volume content) as a function of the soil water tension (matric
potential). SWCCs are important for studying the relationship between soil and water, a physical
phenomenon that affects soil use for many different purposes. One common use of these curves is to
indirectly determine the unsaturated hydraulic conductivity, using statistical pore-size distribution
models [1]. A SWCC is usually estimated by nonlinear regression models fitted to data sets obtained
from laboratory experiments or from pedotransfer functions.
Methods
When constructed from laboratory experimental data, the curve is fitted considering pairs, (θ, ψ),
obtained by applying different tensions, ψ, to a given soil sample and observing the water
content, θ, remaining in the sample after application of each tension level considered. Thus,
retention curves relate a response variable, θ, to a regressor variable, ψ. However, given the nature
of the SWCC data, it is known that the observed water content at a matric potential will be such
that it is not less than the residual soil-water content, θr , and no more than the saturated soil-water
content, θs , a phenomenon known in statistics as truncation.
The most widely used method for estimating the parameters of a SWCC is the nonlinear least
squares method. Although well established, usual least squares procedures can be highly biased in
the presence of truncation, which can seriously affect the estimated curve and prediction based on it.
As argued in [2], it is important to account for truncation in regression analysis since usual LS
estimators can be biased, inefficient, and inconsistent.
In the present paper, we propose an alternative approach for estimating SWCCs based on nonlinear
truncated normal regression models, assuming normal experimental errors and taking into account
the truncated nature of the observed data. The parameters of the curve are estimated by the maximum
likelihood method.
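The likelihood behind such a model can be sketched in a few lines of Python. The example below is a hedged illustration, not the authors' implementation: it assumes the van Genuchten expression as the mean retention curve (the abstract does not say which curve is used), takes the residual and saturated water contents as the truncation bounds, and uses hypothetical function and parameter names. Maximizing `log_likelihood` over the parameters with any numerical optimizer would give the maximum likelihood estimates.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_logpdf(y, mu, sigma, lower, upper):
    """Log-density of a normal(mu, sigma) truncated to [lower, upper]."""
    if not lower <= y <= upper:
        return -math.inf
    z = (y - mu) / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Probability mass of the untruncated normal inside the bounds.
    mass = norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)
    return log_phi - math.log(sigma) - math.log(mass)


def van_genuchten(psi, theta_r, theta_s, alpha, n):
    """Assumed mean curve: van Genuchten model with m = 1 - 1/n."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * psi) ** n) ** m


def log_likelihood(data, theta_r, theta_s, alpha, n, sigma):
    """Sum of truncated-normal log-densities over (psi, theta) pairs."""
    return sum(
        truncated_normal_logpdf(
            theta, van_genuchten(psi, theta_r, theta_s, alpha, n),
            sigma, theta_r, theta_s,
        )
        for psi, theta in data
    )
```

Compared with ordinary least squares, the only change is the extra normalizing term `log(mass)`, which accounts for the probability that the untruncated normal would place outside the admissible range of water contents.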
Results
Simulation studies are provided to assess the quality of estimates for the proposed regression model.
A real data set is analyzed using the proposed methodology. We also provide a comparison study
between the proposed methodology and the usual nonlinear least squares procedure.
References
1. Cornelis WM, Khlosi M, Hartmann R, van Meirvenne M, de Vos B: Comparison of unimodal
analytical expressions for the soil-water retention curve. Soil Science Society of America
Journal 2005, 69:1902-1911.
2. Maddala GS: Limited dependent and qualitative variables in econometrics. Cambridge:
New York, 1983.
Reverse engineering HD-Zip transcriptional regulatory networks (Ft. Information Theory)
Agustín L. Arce1, Matías Capella1, Delfina A. Ré1, Raquel L Chan1, Ariel Chernomoretz2,3
1 Instituto de Agrobiotecnología del Litoral, Universidad Nacional del Litoral, CONICET, Ciudad
Universitaria, 3000, Santa Fe, Argentina
2 Grupo Biología de Sistemas Integrativa, Fundación Instituto Leloir, C.A.B.A., Argentina, C1405BWE
3 Departamento de Física, Facultad de Ciencias Exactas y Naturales (or IFIBA), Universidad de Buenos
Aires, C.A.B.A., Argentina, C1428EHA
Background
HD-Zip proteins constitute a family of plant transcription factors (TFs). It has been reported that
proteins belonging to subfamilies I and II are mainly involved in responses to environmental stimuli,
particularly abiotic stresses. However, the regulatory networks in which these TFs participate are
largely unknown.
In this work the transcriptional regulatory networks of Arabidopsis were reverse engineered employing
algorithms based on information theory, using public large-scale transcriptomic assays. The results were
analyzed from a functional and evolutionary point of view, focusing on HD-Zip I and II TFs.
Materials and methods
The program ARACNE was used for the network reconstruction. Filtered data consisted of 9618 genes
and 269 microarrays obtained under different abiotic stress treatments (AtGenExpress project,
http://www.weigelworld.org/resources/microarray/AtGenExpress/). As a result, sets of potential direct
targets (named modules of transcriptional activity, MTAs) for each of the 831 TFs were obtained.
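The idea behind this kind of information-theoretic reconstruction can be sketched as follows: compute the mutual information (MI) between every pair of expression profiles, then apply the data processing inequality (DPI) to discard, in every fully connected triplet, the weakest edge as a likely indirect interaction. The sketch below is a simplified illustration and not the ARACNE program itself: it assumes expression values already discretized into levels and estimates MI from empirical frequencies, and the gene names, profiles, thresholds and function names are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete profiles."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def aracne_like(profiles, mi_threshold=0.0, tolerance=0.0):
    """Return the edges kept after MI thresholding and DPI pruning."""
    genes = list(profiles)
    mi = {
        frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {edge for edge, value in mi.items() if value > mi_threshold}
    indirect = set()
    for a, b, c in combinations(genes, 3):
        trio = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in trio):
            weakest = min(trio, key=mi.get)
            strongest_two = [mi[edge] for edge in trio if edge != weakest]
            # DPI: the weakest edge of a triangle is treated as indirect.
            if mi[weakest] < min(strongest_two) * (1.0 - tolerance):
                indirect.add(weakest)
    return edges - indirect


# Hypothetical discretized profiles: tf -> mediator -> target.
profiles = {
    "tf":       [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "mediator": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "target":   [0, 0, 0, 0, 1, 0, 1, 1, 1, 1],
}
network = aracne_like(profiles)
```

In this toy example the tf-target edge has the lowest MI of the triangle and is removed, leaving the chain tf-mediator-target; the set of partners retained for a TF plays the role of its MTA.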
Results
The study of the MTAs of the 25 HD-Zip I and II TFs revealed many novel functional characteristics. A
distinctive pattern of expression was found for subsets of genes in roots and shoots for most of the
MTAs (e.g., MTA of AtHB1; Figure 1A). The overlap of the MTAs was more significant for some
phylogenetically closely related genes (Figure 1B), suggesting a degree of functional redundancy in
these cases. The expression correlation of the genes in each MTA under different stresses uncovered a
potential unknown role for most TFs in heat response (e.g., AtHB12 MTA; Figure 1C). A de novo motif
discovery approach on promoters of MTA genes with the support of conditional mutual information
allowed the recognition of a potential interplay with GBF TFs. Functional studies with GO terms on
MTA genes resulted in the association of the TFs with known and new pathways and functions.
Experiments with Arabidopsis HD-Zip mutants and overexpressors preliminarily confirmed the predicted regulatory role
of some TFs on selected genes. Other preliminary experimental data also support their role in the heat
response.
Figure 1. A. Gene expression heatmap of the AtHB1 MTA. B. Heatmap of pairwise MTA overlap comparisons
considering all TFs. C. Genes with correlated expression in the AtHB12 MTA under different stresses.
D. Representation of statistically enriched GO terms associated with genes of the AtHB12 MTA.
Software integration to bioimage management, processing and analysis
J. E. Diaz Zamboni1, L. Bugnon1, E. Paravani1, C. Galetto1, J. Adur1, V. Bessone1, M. Bianchi1, M.
G. Acosta1, S. Laugero1, V. H. Casco1 and M. F. Izaguirre1
1 Laboratorio de Microscopia Aplicada a Estudios Moleculares y Celulares (LAMAE) – Facultad de
Ingeniería – Universidad Nacional de Entre Ríos
Abstract
Over the last two decades, technological advances in basic research equipment for cell and
molecular biology have been astonishing. Among all the instruments developed to support
research, bioimaging systems are essential tools. Bioimages can be obtained from a wide range of
equipment and associated techniques, among which we can highlight microscopy images (including
all types and modes), electrophoresis gels, hybridization membranes and microarrays. Almost all
available instrumentation comes from a very competitive industry, which produces equipment highly
protected by patents and proprietary software that stores bioimages in several file
formats. However, the vast majority of these systems do not allow storing the entire experiment
metadata, nor do they use standardized, open formats that would let users import the data into other
software. This lack of standardization makes the applications more complex to use and
mutually incompatible, and users therefore tend to move their data across multiple applications and file
formats. Consequently, valuable information is lost in conversion and/or migration.
LAMAE's members, as users and developers, have addressed the problem of bioimage
management, processing and analysis by working on a systematic software
integration. The solution focuses on the use of standard image file formats and free
software. We searched the available options for a standard format for easy sharing and a system to
efficiently manage bioimage data and metadata. OME-XML and OME-TIFF were selected to
standardize information; both formats were created for optical microscopy techniques. The
first format is a text file that encodes the image as text, maximizing portability and making metadata
easy to read [1,2,3]. The OME-TIFF format is an open standard based on the TIFF format, in which the
metadata defined in the OME-XML schema are stored in the header of the TIFF file. Its
strength is rapid access to image data [4]. Both OME-XML and OME-TIFF
headers are extensible and have a structure that makes them suitable for other bioimages. We
selected the free OMERO server for the management of bioimages and installed it on a
desktop computer. Client applications can run on any computer and access the server
through the local network. ImageJ was selected for image processing and analysis because of its
support for multiple image formats, its development tools, its access to the OMERO
server and its cross-platform functionality [5]. Another software integration activity was the
implementation of a tool to export images from an optical sectioning microscope to the OME-TIFF file
format [6,7].
Our future activities on software integration will be related to the development of two imaging
systems: a photodocumentation system for electrophoresis gels and a digitalization system for
scanning electron microscopy. In both cases the images will be stored on the OMERO server and
we will study the OME-TIFF images obtained with such equipment to assess the portability of
the image files.
References
1. J. R. Swedlow, I. Goldberg, E. Brauner, and P. K Sorger. “Informatics and quantitative analysis in
biological imaging”. Science 300 (2003) 100-102.
2. I. Goldberg, C. Allan, J.-M. Burel, D. Creager, A. Falconi, H. Hochheiser, J. Johnston, J. Mellen,
P.K. Sorger, and J.R. Swedlow. “The Open Microscopy Environment (OME) Data Model and XML
file: open tools for informatics and quantitative analysis in biological imaging”. Genome Biol. (2005)
6 R47.
3. Open Microscopy Environment. http://www.ome-xml.org/
4. Bioformats. http://www.loci.wisc.edu/ome/ome-tiff.html
5. ImageJ. http://rsbweb.nih.gov/ij/
6. J. E. Diaz-Zamboni. “Software para usuarios de microscopios de desconvolución digital”. Tesis
de grado. Facultad de Ingeniería, Universidad Nacional de Entre Ríos, (2004).
7. J. E. Diaz-Zamboni, J. F. Adur, D. Osella and V. H. Casco. “Software para usuarios de
microscopia de desconvolución digital”. XV Congreso Argentino de Bioingeniería, (2005).
Image analysis to control the roast level of the peanut
Ignacio Arévalo1, Silvia Ojeda1
1 FAMAF, Córdoba, Argentina
Motivation
This work deals with automatic control applied to an agro-industrial sector. It responds to the increasing interest of
the peanut industry in improving yield and product quality. Specifically, an online automatic methodology to distinguish
the different roast levels in the peanut roasting process is introduced. This helps to detect failures in the
roasting process so that corrections can be applied if needed. The proposed method improves on the methodology
recently presented by Palma, Ojeda and Modesti [3] and is based on a novel algorithm that uses information
provided by optical sensors installed near the oven where the roasting process is carried out.
Materials
We have a database of 3900 color images of skinless peanut kernels and bulk. These were taken in a simulated
environment and they have a variety of roast levels.
Conclusion
We present a novel algorithm for the automatic control of peanut roasting. The method is based on image processing
techniques and applies computer technologies. The new online automatic methodology properly
distinguishes the desired roast level.
Keywords
Bulk peanut, image processing, computer networks, roast level of peanut.
References
1. BATAL, A.; DALE, N.; CAFÉ, M. Nutrient composition of peanut meal. Journal of Applied Poultry Research,
v. 14, p. 254-257, 2005.
2. SANDERS, T. H. Effects of variety and maturity on lipid class composition of peanut oil. Journal of the
American Oil Chemists' Society. v. 57, n. 1, p. 8-11, 2007.
3. PALMA, J. J., OJEDA, S. M., MODESTI, M. Procesamiento de imágenes industriales: una aplicación al
control del tostado del maní. IJIE Vol 3, Nº2 2011.
Eukaryotic secretory pathway proteins avoid occluded N-glycosylation sequons.
Máximo López Medus, Gabriela Elena Gómez, Lucas Landolfo, Julio Javier
Caramelo*
Fundación Instituto Leloir. IIBBA. Conicet. *Departamento de Química Biológica de Buenos Aires.
Abstract
N-glycosylation is one of the most abundant and drastic posttranslational
modifications. About 25 % of eukaryotic proteins are N-glycosylated when they
enter the secretory pathway. N-glycans are important for the conformational
maturation of glycoproteins and fulfill vital roles in several molecular recognition
processes. This modification takes place on the side chain of Asn residues within the
context Asn-X-Ser/Thr (N-glycosylation sequon), where X cannot be Pro. Even
though all known N-glycans are located on the protein surface, N-glycosylation
takes place before any major protein folding event, when proteins display an
extended conformation. For this reason, sequons that are normally buried in the
protein structure could become occupied, which in turn would seriously impair the
folding process. There are two scenarios to avoid this situation: (1) secretory
pathway proteins avoid occluded N-glycosylation sequons or (2) occluded sequons
are not occupied. To answer this, we classified the Protein Data Bank entries based on
whether or not the proteins belong to the eukaryotic secretory pathway. Next, we
analyzed the surface exposure of Asn residues within the sequon context using the
MSMS program. We found that secretory pathway proteins avoid occluded
N-glycosylation sequons. Compared with non-secretory pathway proteins, Asn-X-Thr
and Asn-X-Ser sequons are 6 and 3 times less frequent in secretory pathway
proteins, respectively. This strong bias is highly specific, since it is absent in all of
the remaining Asn-X-Y combinations. To generalize this result, we analyzed the
solvent exposure of the first residue present in the 400 Y1-X-Y2 combinations.
Interestingly, we found that only N-glycosylation sequons display such a strong
disparity between secretory and non-secretory pathway proteins.
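The sequon definition used above (Asn-X-Ser/Thr, with X different from Pro) translates directly into a regular expression. The following minimal Python sketch locates candidate sequons in a protein sequence; the function name and the example sequences are hypothetical, and the sketch covers only the sequence-level step, not the surface exposure calculation performed with MSMS.

```python
import re

# Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as separate hits.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """Return the 1-based positions of Asn residues in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]
```

For the toy sequence `MNGSANPTNVTQ`, the Asn residues at positions 2 and 9 are reported, while the Asn at position 6 is skipped because it is followed by Pro. The same approach extends to the 400 Y1-X-Y2 combinations by building one pattern per pair of flanking residues.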
25S-18S ribosomal nature of the not NOR-associated highly GC-rich heterochromatin of chili peppers
(Capsicum-Solanaceae)
Mauro Grabiele1, Humberto Debat2, Marisel Scaldaferro3, Guillermo Seijo4, Daniel Ducasse2, Eduardo
Moscone3, Dardo Martí1
1 Instituto de Biología Subtropical (IBS-UNaM), Posadas, Misiones, 3300, Argentina
2 Instituto de Fitopatología y Fisiología Vegetal (IFFIVE-INTA), Córdoba, 5119, Argentina
3 Instituto Multidisciplinario de Biología Vegetal (IMBIV-UNC), Córdoba, 5000, Argentina
4 Instituto de Botánica del Nordeste (IBONE-UNNE), Corrientes, 3400, Argentina
Background
Highly GC-rich heterochromatin (Het) is a universal component of the genome of chili peppers [1].
Fluorescence in situ hybridization (FISH) patterns using Arabidopsis- and wheat-derived ribosomal DNA (rDNA)
probes, embracing rRNA genes and the intergenic spacer (IGS) with repetitive block segments, revealed an
a priori co-localization of rDNA and Het regions in Capsicum, which prompted a further characterization of the
25S-18S rDNA unit in the genus [2] and the present work.
Material and methods
To reveal the unambiguous nature of the Het of chili peppers, combined bioinformatics (database searching,
sequence alignments, primary and secondary structure analysis), molecular cytogenetics (FISH) and
molecular biology (PCR amplification, restriction enzyme assays, cloning and sequencing) approaches
were carried out in 8 taxa representative of the major lineages of Capsicum.
Results
A definite FISH co-localization pattern of Capsicum-derived rDNA gene (18S, 25S, 5.8S) and spacer (ITS,
IGS) probes and Het is exclusive to chili peppers based on x=12. The FISH pattern of the pCp200/33 probe, a
mutated IGS element likely affecting rDNA transcription, imitates that of rDNA/Het, excluding the active NORs.
Figure 1. Alignment of homologous regions of 25S-18S rDNA derived
probes used in FISH of Capsicum (left) and double FISH in
Capsicum pubescens (right); blue: DAPI stained chromatin;
green signals: 18S rDNA probe (Cf18S-17); red signals: pCp200/33 probe. Arrowheads point out the absence of red
signals in active NORs. Scale bar is 10 µm.
Conclusion
Highly GC-rich Het of chili peppers based on x=12 is formed by tandemly repeated mega satellite DNA
sequences derived from the 25S-18S rDNA entire unit (7.8 kbp); its origin, expression and evolution are
strongly related to its inherent heterochromatic and ribosomal double nature. Its absence in the Het
constitution of taxa with x=13 has evolutionary relevance.
References
1. Moscone EA, Scaldaferro MA, Grabiele M, Cecchini NM, Sánchez García Y, Jarret R, Daviña JR,
Ducasse DA, Barboza GE, Ehrendorfer F: The evolution of chili peppers (Capsicum – Solanaceae): a
cytogenetic perspective. Acta Hort 2007, 745:137-169.
2. Grabiele M, Debat HJ, Moscone EA, Ducasse DA: 25S-18S rDNA IGS of Capsicum: molecular
structure and comparison. Pl Syst Evol 2012, 298: 313-321.
Computational Simulation of the inclusion modes of Sulfamethoxazole and
Sulfadiazine in Cyclodextrins
Erbes, Luciana A.1
1 Universidad Nacional de Entre Ríos, Facultad de Ingeniería, Oro Verde, Entre
Ríos, Argentina
Introduction
Sulfonamide-cyclodextrin complexes motivated this research, first
because of their widespread use in the pharmaceutical field and second because of the
lack of published work on this topic.
Sulfamethoxazole (SMX) and Sulfadiazine (SDZ) are the sulfonamides selected,
and among the cyclodextrins (CD), β-cyclodextrin (β-CD) was chosen.
Theoretical studies of each complex were carried out using a set of methodologies
(molecular modelling, docking and dynamics) implemented in software packages.
Results
Molecular Modelling
The ligands, SMX and SDZ, were built using the Gabedit software. The
receptors, β-CD, Hydroxypropyl-β-CD (HP-β-CD with 3 and 4 hydroxypropyl
groups) and Methyl-β-CD (M-β-CD), were built by modifying a crystallographic
structure similar to β-CD.
Molecular Docking
The location of each ligand in the complexes was predicted using Amber software.
In each case, criteria were defined to select the appropriate
conformations to be analyzed by molecular dynamics.
In every complex (SMX-β-CD, SDZ-β-CD, SMX-HP-β-CD, SDZ-HP-β-CD, SMX-M-β-CD
and SDZ-M-β-CD), the result of the conformational search was, in general, one
cluster with a minimum energy.
Molecular Dynamics
The molecular dynamics stages (minimization, heating, equilibration and production)
were applied to the receptors and the complexes in an explicit solvent
environment. In each stage, energy and temperature analyses were performed and
compared.
Finally, 10 ns of simulation were obtained for every receptor and 10 ns for each complex.
Analyses
VMD software, LigPlot software and Amber scripts (H bonds, distances, nearest
waters) were used to verify the location of the ligands in their receptors and the
movements and orientation of the hydroxypropyl groups.
Conclusions
In most of the complexes the ligand is oriented in the same way, with the
benzene ring bearing an amino group located toward the wide side of the CD.
Only in SMX-HP-β-CD (4 hydroxypropyl groups) and SDZ-HP-β-CD (3 hydroxypropyl
groups) is the orientation reversed.
Comparisons between the complexes also provided some solubility information.
Phylogenetic relationships of Rhinella arenarum β-catenin. A useful model for
developmental biology
Hasenahuer M.A.; Galetto C. D.; Casco, V. H. & Izaguirre, M. F.
Laboratorio de Microscopia Aplicada a Estudios Moleculares y Celulares, Facultad de
Ingeniería (Bioingeniería-Bioinformática), Universidad Nacional de Entre Ríos. Ruta
11, Km 10, Oro Verde, Entre Ríos, Argentina.
Corresponding author: M. F. Izaguirre: [email protected]
Background
Rhinella arenarum is a South American toad, widely distributed across Argentina,
Uruguay, Bolivia and southern Brazil. This species has been extensively used in
developmental biology studies in Argentina for more than 50 years. Recently, we were
able to isolate and sequence a 539 bp fragment of R. arenarum β-catenin cDNA (1).
β-catenin is a vertebrate cytoplasmic protein that, like the Drosophila armadillo product,
has two main functions: linking the cadherin cell-adhesion molecules to the
cytoskeleton, and mediating the wnt/wingless signalling pathway. Thus β-catenin
regulates gene expression by direct interaction with transcription factors belonging to the
Tcf/LEF family. It provides molecular mechanisms for signal transduction from cell-adhesion
components or wnt protein to the nucleus, thus controlling numerous cell
events, such as growth and development (2, 3).
The study of β-catenin function in non-traditional animal models provides further evidence for
understanding its evolutionary role. Therefore, complete gene sequence logos and a
phylogenetic analysis were undertaken. Numerous metazoan and non-metazoan gene and
protein sequences of β-catenin and β-catenin-like proteins were analyzed.
Materials and methods
cDNA sequence logos were obtained from metazoan (Homo sapiens, Macaca mulatta,
Bos taurus, Gallus gallus, Mus musculus, Xenopus laevis, Xenopus tropicalis, Rhinella
arenarum, Anolis carolinensis, Danio rerio, Drosophila melanogaster, Pediculus
humanus, Caenorhabditis elegans, Trichoplax adhaerens, Amphimedon queenslandica)
and non-metazoan (Arabidopsis thaliana, Volvox carteri, Dictyostelium discoideum) species.
Using the same species, a phylogenetic analysis of the protein sequences of β-catenin and β-catenin-like
proteins was performed with PhyML 3.0, with aLRT-SH and bootstrap branch supports.
Results and Conclusions
The logos and protein phylogenetic trees obtained revealed a close evolutionary relationship
between the Rhinella arenarum β-catenin homolog and other vertebrate homologs,
especially the amphibian ones, and an interesting relationship with those of non-metazoan
species, supporting the hypothesis that β-catenin predates metazoan life, integrating
information from genomic, cytoskeletal, plasma membrane and environmental sources.
Acknowledgements
The present work was supported by PID-UNER 6088-1.
Comparison of the ability to predict true linear B-cell epitopes by on-line available
prediction programs
J. Gabriel Costa1, Pablo L. Faccendini2, Silvano J. Sferco3, Claudia M. Lagier2, Iván S. Marcipar1
1 Laboratorio de Tecnología Inmunológica, Facultad de Bioquímica y Ciencias Biológicas,
Universidad Nacional del Litoral. Paraje El Pozo. Santa Fe, Argentina.
2 IQUIR, Depto. de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas,
Universidad Nacional de Rosario. Suipacha 531. Rosario, Argentina.
3 Departamento de Física, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del
Litoral, Paraje El Pozo. Santa Fe, Argentina; and INTEC (CONICET-UNL), Güemes 3450, Santa
Fe, Argentina.
Background:
Several experimental methods have been developed to identify B-cell epitopes from proteins of infectious
microorganisms. However, these methodologies are time-consuming and quite
expensive. Our work deals with the use of prediction programs to identify linear B-cell
epitopes useful for developing immunoassays. We therefore tested 5 free, on-line prediction methods
(AAPPred, ABCpred, Bcepred, BepiPred and Antigenic), widely used for predicting linear epitopes,
using the primary structure of the protein as the only input. Each program uses a very different
algorithm.
Methods and Results:
To compare the quality of the prediction methods we used their positive predictive value
(PPV), i.e. the proportion of the predicted epitopes which are true, experimentally confirmed
epitopes, relative to all of the epitopes predicted. Eleven proteins whose epitopes had been fully mapped
experimentally by highly reliable techniques were studied. Each program was
run and the predicted epitopes were compared with the 65 true epitopes displayed in the proteins. Because the
aim was to identify useful predicted linear epitopes, no presumed true-negative set was used.
The confidence intervals of the PPV were calculated at the 90% confidence level for each
prediction procedure. The best PPVs were obtained with AAPPred and ABCpred, 69.1% and
62.8% respectively.
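As a reminder of how these quantities are computed, the sketch below calculates a PPV and a 90% confidence interval for it in Python. It is an illustration under assumptions: the abstract does not state which interval method was used, so the Wilson score interval is chosen here as one common option, and the function names and the counts in the usage note are hypothetical rather than taken from the study.

```python
import math
from statistics import NormalDist


def ppv(true_positives, false_positives):
    """Positive predictive value: confirmed epitopes over all predicted."""
    return true_positives / (true_positives + false_positives)


def wilson_interval(successes, total, confidence=0.90):
    """Wilson score interval for a proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half, center + half
```

For example, with a hypothetical program that predicts 50 epitopes of which 30 are confirmed, `ppv(30, 20)` gives 0.6 and `wilson_interval(30, 50)` gives the corresponding 90% interval around that value.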
We also statistically evaluated the differences between these PPV values, treating them as
paired data. This allowed us to determine which programs produced a PPV different from that
calculated for another program, stated with 90% certainty. Then, to monitor the programs' prediction
efficiency, we compared the PPV of epitope identification with that obtained when
randomly selecting regions of the molecule under study. Our results indicate that only 2 of the
programs studied predicted epitopes with a statistically significantly higher PPV
than a random procedure, these being AAPPred and ABCpred.
We also analyzed whether the epitopes predicted by the consensus of several programs were more
reliable than those predicted by each program alone or by partial consensus.
However, we observed that considering as true epitopes only the consensus regions of several
programs does not improve the PPV with respect to the results produced by each program
individually.
Conclusion:
We conclude that AAPPred and ABCpred yield the best results, as compared with the other
programs and with a random prediction procedure. We also ascertained that considering the
consensus epitopes predicted by several programs does not improve the positive predictive
value.
Relative mobility of epitope residues in immunogenic proteins
Marcos Astorga1 , Sebastián Fernández Alberti1,2 , Gustavo Parisi1,2
Universidad Nacional de La Plata, Argentina
2
Universidad Nacional de Quilmes, Argentina
1
Background
The antigen-antibody interaction is based on the recognition by the antibody of particular
antigen residues called epitopes. Understanding the general features that characterize
epitopes may contribute to the design of specific drugs that hinder the formation of the
antigen-antibody complex. From this point of view, knowledge of the main physicochemical,
structural, dynamical and evolutionary properties of epitopes will allow us to identify
common features that can be used to develop new methods of epitope recognition.
Methods
We analyzed the flexibility and dynamic properties of epitope residues in 15 groups of
antigen-antibody complexes. To do so, we performed vibrational normal mode analysis
using coarse-grained elastic network models.
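The abstract does not specify which elastic network model was used. As a minimal sketch,
assuming a Gaussian network model (GNM) built on C-alpha coordinates with a hypothetical 7 Å
contact cutoff, relative residue mobilities can be read from the diagonal of the pseudo-inverse
of the contact (Kirchhoff) matrix. All function names and the toy coordinates are illustrative.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Contact matrix: -1 for residue pairs within the cutoff, degree on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: is the contact network connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def relative_fluctuations(coords, cutoff=7.0):
    """Diagonal of the Kirchhoff pseudo-inverse (relative mean-square fluctuations).

    For a connected network, pinv(G) = inv(G + J/n) - J/n, where J is the
    all-ones matrix; this avoids an explicit eigen-decomposition.
    """
    n = len(coords)
    gamma = kirchhoff(coords, cutoff)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inverse = invert(shifted)
    return [inverse[i][i] - 1.0 / n for i in range(n)]


# Toy example: five pseudo-residues on a straight line, 3.8 A apart.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(relative_fluctuations(chain, cutoff=4.0))
```

To compare a free and a bound antigen under this assumption, one would run the function once
on the antigen coordinates alone and once on the coordinates of the whole complex, and then
compare the entries that correspond to the epitope residues.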
Results and conclusion
Our preliminary results reveal a significant decrease in epitope flexibility once the
antigen-antibody complexes are formed. Similar behaviour is observed for the relative
mobilities of the epitopes in the low-frequency normal modes. These results provide
complementary information that can be used in combination with the physicochemical and
structural characterization of epitope sites in order to identify potential epitopes.
Relationship between the divergence in synonymous codon usage between host and virus and the
presence of microRNA
Franco Riberi1 , Laura Tardivo1 , Lucia Fazzi2 , Guillermo Biset3 , Daniel Gutson3 , Daniel
Rabinovich2,3
1 Department of Computing Sciences, Universidad Nacional de Río Cuarto, Río Cuarto, Córdoba,
Argentina
2 Instituto Biomédico en Retrovirus y SIDA-INBIRS, Buenos Aires, Argentina
3 Fundación para el Desarrollo de la Programación en Ácidos Nucleicos-FuDePAN, Córdoba, Argentina
Background
MicroRNAs (miRNAs) are small RNAs that regulate the expression of mRNAs in cells. They can
interfere with virus replication. For this to happen, the miRNA must recognize target sites in the
genome, and pairing must occur between the miRNA and a fragment of viral mRNA. This
recognition is more likely if the fragment is not paired (masked) in the secondary structure of the viral
mRNA[1]. It is known that the genomes of some human viruses show a bias in the use of synonymous codons
(different codons that encode the same amino acid) compared with the host, even though this makes their
replication less efficient[2].
Goal
The aim of the study is to determine whether this bias could be the result of evolutionary pressure exerted by
miRNAs. To achieve this goal, massive comparisons must be made (on the order of 10e7) between the
recognition of the natural virus genome and that of the “humanized” genome. The latter may be obtained by
replacing codons in the viral genome so as to achieve a codon usage ratio similar to that of the host.
Materials and Methods
For each miRNA, the software to be developed will perform a parallel “sweep” over the natural and the humanized
virus sequences. For each possible genome site, the program should determine the number of recognized
nucleotides and whether the site is available or masked by the secondary structure. By comparing the
results at homologous sites, it can be determined whether miRNAs have a differential effect on the
different target mRNAs (natural and humanized). The program will be coded in the C++
programming language and licensed under the GPLv3 software licence.
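The sweep described above can be sketched as follows. This is only an illustration in Python,
not the planned C++ program: it assumes that only Watson-Crick pairs count as recognized
nucleotides, that the secondary structure of each target is already available as a dot-bracket
string (any character other than "." marks a paired, i.e. masked, nucleotide), and that the
natural and humanized sequences have the same length. All names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalise(seq):
    """Upper-case RNA alphabet; DNA-style T is read as U."""
    return seq.upper().replace("T", "U")


def sweep(mirna, target, structure):
    """Slide the miRNA along the target and score every site.

    Returns one row per site: (1-based position, number of complementary
    nucleotides, number of masked nucleotides in the window).
    """
    mirna, target = normalise(mirna), normalise(target)
    if len(structure) != len(target):
        raise ValueError("structure and target must have the same length")
    # Sequence that a perfectly paired (antiparallel) target site would have.
    probe = [COMPLEMENT[base] for base in reversed(mirna)]
    width = len(probe)
    rows = []
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        matches = sum(1 for p, t in zip(probe, window) if p == t)
        masked = sum(1 for s in structure[start:start + width] if s != ".")
        rows.append((start + 1, matches, masked))
    return rows


def compare(mirna, natural, natural_structure, humanized, humanized_structure):
    """Pair the rows of both sweeps at homologous positions."""
    rows_nat = sweep(mirna, natural, natural_structure)
    rows_hum = sweep(mirna, humanized, humanized_structure)
    return [(n[0], n[1], n[2], h[1], h[2]) for n, h in zip(rows_nat, rows_hum)]


# Hypothetical toy data.
for row in compare("UAGC", "AAGCUAAA", "..((..))", "AAGCCAAA", "........"):
    print(row)
```

Each printed row has the shape (position, matching and masked counts in the natural sequence,
matching and masked counts in the humanized sequence), which mirrors the per-position
comparison the abstract describes.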
Results
For each miRNA and genome, a table should be produced that records, for each position, the mRNA
matching score in both the original and the humanized sequence. Table 1 shows the structure of the table
to be generated.
Position | Original sequence                  | Humanized sequence
         | Matching∗   Masked+     XYZ        | Matching†   Masked‡     XYZ
1        | aaTTgCacA   aaTTgMaca   AaTTgXaca  | ttAACGtct   ttMACMtcM   ttYACYtcX
...      | ...         ...         ...        | ...         ...         ...
N        | ...         ...         ...        | ...         ...         ...

Position | Score original sequence                 | Score humanized sequence
         | %const=1∗  cFold∗  %const=1+  cFold+   | %const=1†  cFold†  %const=1‡  cFold‡
1        | 0.44       0.45    0.22       0.24     | 0.55       0.54    0.11       0.21
...      | ...        ...     ...        ...      | ...        ...     ...        ...
N        | ...        ...     ...        ...      | ...        ...     ...        ...
Table 1: Table structure to generate, where (cFold constAT = 1.25) && (cFold constGC = 0.95).
It will also be analyzed whether the results favor the hypothesis of miRNA selective pressure as a cause
of bias in codon usage.
Conclusions and Perspectives
The work presented above shows that software can be developed as a tool for massive comparisons of the
interactions between miRNAs and alternative target mRNAs; this tool will be part of a software product called
RNAemo. It will contribute to the development of tools to compare the possible effects of host miRNAs
on viruses intentionally introduced for gene therapy of cancers or genetic diseases. Further studies will
include an estimate of the free energy of the binding reaction between miRNA and mRNA[3].
References
[1] Gareth M. Jenkins and Edward C. Holmes. “The extent of codon usage bias in human RNA viruses
and its evolutionary origin”, 2003.
[2] Ulrike Muckstein, Hakim Tafer. “Thermodynamics of RNA-RNA Interaction”. Institute for Theoretical
Chemistry, University of Vienna.
[3] Zuker, Michael. “Computational Methods for RNA Secondary Structure”. June 8, 2006.
Distribution of bioactive peptides in NR
Agustina Nardo1, Cristina Añón2 and Gustavo Parisi1
1Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Pena
182, Bernal B1876BXD, Argentina
2Centro de Investigación y Desarrollo en Criotecnología de Alimentos (CIDCA), Universidad
Nacional de La Plata, La Plata 47 y 116 (1900), Argentina.
Background
Bioactive peptides (BP) are short sequences (3-20 residues) that can be encrypted in food protein
sequences. BP modulate the biological activity of several human enzymes that play key roles in
different metabolic processes, for example regulating blood pressure, stimulating or suppressing the
action of the immune system, modulating the activity of the nervous system and inhibiting the
development of bacteria and fungi, among others. The detection of BP in proteins is an important issue in
food biotechnology for the development of functional foods. To study the relationships of
sequence and structure with BP biological activity, in this work we study the structural and
evolutionary occurrence of well characterized BP in the universe of known proteins.
Material and Methods
Using the BioPep database, we retrieved 1,662 BP more than 5 residues long. Sequence similarity
searches against the non-redundant (NR) database were performed to find exact occurrences of each BP.
The resulting dataset contains 80,523 sequences. To characterize the sequences with at
least one occurrence of a known BP and the distribution of the peptides, we made a
structural assignment using BLAST searches against the CATH database. From these searches we
obtained 55,407 proteins. For each of these proteins, a template was selected from the CATH database
searches, and a reference structure characterizing the homologous superfamily of each
protein was also retrieved. This reference structure represents the conserved fold of all the retrieved
proteins in the same homologous superfamily. Using the sequence alignment between the template
and the reference structure, we mapped the occurrence of each BP in order to explore its structural
and functional distribution in the different structural families found. Phylogenetic trees were also
obtained using the PhyML maximum likelihood approach. The statistical significance of the BP
occurrences was evaluated using the Slim package.
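The first step above, locating exact occurrences of each peptide in a set of protein sequences,
can be illustrated with a short Python sketch. It is a plain substring search with hypothetical
names and toy sequences, not the database search the authors ran.

```python
def find_occurrences(peptide, protein):
    """1-based start positions of every exact, possibly overlapping, match."""
    positions = []
    start = protein.find(peptide)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(peptide, start + 1)
    return positions


def map_peptides(peptides, proteins):
    """For each protein, record the peptides it contains and where they occur."""
    hits = {}
    for name, sequence in proteins.items():
        for peptide in peptides:
            positions = find_occurrences(peptide, sequence)
            if positions:
                hits.setdefault(name, {})[peptide] = positions
    return hits


# Hypothetical peptides and protein sequences.
peptides = ["VPPIPP", "LKPNMA"]
proteins = {
    "protein_a": "MKVPPIPPLKPNMAVPPIPPG",
    "protein_b": "MSTNPKPQRKTKRNTNRRPQD",
}
print(map_peptides(peptides, proteins))
```

Positions found this way could then be transferred onto a reference structure through the
template alignment, as the paragraph above describes.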
Conclusion
The distribution of BP was not homogeneous and spanned a great variety of organisms and phylogenetic
groups. In structural space, however, BP were found in a relatively small number
of structural families. Interestingly, for certain activities hot spots were found
in the different folds. We also found that several BP are associated with regions of structural and
functional importance, as indicated by their high relative sequence conservation in the
phylogenetic analysis. We think that the occurrence of BP hot spots associated with different
activities and folds could contribute to the development of new tools to find BP.
Unraveling the molecular basis of mammalian inner ear evolution: analysis of the outer hair
cell cytoskeleton protein spectrin
Francisco Pisciottano, Belén Elgoyhen, Lucía Franchini
Instituto de Investigaciones en Ingeniería Genética y Biología Molecular (INGEBI), Buenos Aires,
Argentina
In our laboratory we are studying the genetic basis underlying the evolution of the particular
functional capacities of the mammalian inner ear. During the evolution of mammals, the inner ear
went through many important changes that made it different from the hearing organ of other
vertebrates and endowed mammals with hearing capacities that are unique in the animal kingdom.
Among these changes we highlight the origin of a unique cell type, the outer hair cell (OHC), which
shows a novel mechanism known as somatic electromotility. This mechanism of mechanical
amplification is an active cochlear amplifier that can increase hearing sensitivity and frequency
selectivity, and it depends on changes in OHC length mediated by the motor protein prestin. These
length changes are possible due to the particular characteristics of the OHC's lateral wall, which has
a submembranous lateral system known as the cortical lattice. This protein-based skeleton consists of
circumferential filaments of actin that are cross-linked by filaments of spectrin. In the cortical
lattice of mammalian OHCs, alphaII-spectrin is found in association with betaV-spectrin, which
indirectly interacts with prestin [1]. We found in previous work that prestin shows strong signatures
of positive selection in the mammalian lineage [2]. Using maximum likelihood methods to test
models of positive selection, we aim to reveal which other proteins were involved in shaping the
morphological and functional particularities of the mammalian inner ear.
Our present results suggest that betaV-spectrin has accompanied prestin’s evolutionary trend in the
lineage leading to mammals. Moreover, the positively selected sites of betaV-spectrin group into
clusters that are distributed non-randomly along the protein and span specific spectrin domains.
Among the domains that accumulate positively selected amino acids we find those mediating the
interaction with alphaII-spectrin for dimerization and with the adaptor protein ankyrin, which mediates
the attachment of integral membrane proteins to the spectrin-actin based membrane skeleton. Our work
continues to delineate the genetic bases underlying the evolution of the inner ear in mammals.
References
1. Legendre K, Safieddine S, Küssel-Andermann P, Petit C, El-Amraoui A: αII-βV spectrin bridges the
plasma membrane and cortical lattice in the lateral wall of the auditory outer hair cells. J Cell Sci. 2008,
121:3347-3356.
2. Franchini LF, Elgoyhen AB: Adaptive evolution in mammalian proteins involved in cochlear
outer hair cell electromotility. Mol Phylogenet Evol 2006, 41:622-635.
A pipeline for structural annotations in bacterial genomes
Lanzarotti Esteban1,2, Defelipe Lucas1,2, Radusky Leandro1,2, Marti Marcelo1,2, Turjanski Adrián1,2
1 Departamento de Química Biológica, FCEN - UBA, Buenos Aires, Argentina
2 INQUIMAE, CONICET-UBA, Buenos Aires, Argentina
In the last 10 years, a lot of work has gone into developing software tools that predict structural
features from a protein's amino acid sequence, such as secondary structure[1], intrinsic disorder[2] and
tertiary structure[3]. A lot of effort has also been spent on improving DNA sequencing technologies,
making it possible to obtain large amounts of bacterial DNA sequence without long waiting times[4]. In this
work we present a pipeline that annotates the structural properties of sequenced bacterial proteins,
produces structural models using homology modeling techniques and assesses the models using two
quality measures.
1. Rost B. Review: protein secondary structure prediction continues to rise. J Struct Biol. 2001
May-Jun;134(2-3):204-18.
2. He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK. Predicting intrinsic disorder in proteins:
an overview. Cell Res. 2009 Aug;19(8):929-49.
3. Martí-Renom MA, Stuart AC, Fiser A, Sánchez R, Melo F, Sali A. Comparative protein structure
modeling of genes and genomes. Annu Rev Biophys Biomol Struct. 2000;29:291-325.
4. Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet.
2008;9:387-402.
Identification of putative LxCxE motifs targeting the retinoblastoma protein in human
viruses by structure- and sequence-based calculations
Juliana Glavina1, Lucía B. Chemes2, Gonzalo de Prat-Gay2, Ignacio E. Sánchez1
1 Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias
Exactas y Naturales, Universidad de Buenos Aires.
2 Protein Structure-Function and Engineering Laboratory. Fundación Instituto Leloir and
IIBBA-CONICET.
Introduction
Many protein functions can be described in terms of “linear sequence motifs” of fewer than five
function-determining residues. The LxCxE motif interacts with the retinoblastoma tumor
suppressor (Rb), which plays a key role in cell cycle progression. The LxCxE motif has been
identified in several proteins from RNA and DNA viruses, which suggests that it may be
present in other viral proteins. We have developed a method to predict the affinity of a sequence
stretch for the retinoblastoma protein using a combination of structure- and sequence-based
calculations.
Methods
Structure-based calculations used FoldX, which is an empirical force field for the prediction of
the stability of proteins and protein complexes [1]. We used the LxCxE-Rb complex structure to
compute a first position specific scoring matrix. Sequence-based calculations used molecular
information theory, which makes use of residue statistics at an alignment of known binding
motifs [2]. We used over 200 sequences of LxCxE motifs from the papillomavirus E7 protein to
compute a second position specific scoring matrix. Finally, we used the new algorithm to scan
all known sequences from human viruses.
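To make the sequence-based part of the method concrete, the sketch below builds a position
specific scoring matrix from a handful of aligned motif instances and uses it to scan a
sequence. It is a generic log-odds construction, not the authors' algorithm: the uniform
background, the pseudocount and the example motifs are assumptions, and the structure-based
(FoldX) matrix and the way the two matrices are combined are not reproduced here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(motifs, pseudocount=1.0):
    """Log2-odds score for each residue at each motif position.

    A uniform background frequency (1/20) is assumed for this sketch.
    """
    length = len(motifs[0])
    if any(len(motif) != length for motif in motifs):
        raise ValueError("all motif instances must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for position in range(length):
        column = [motif[position] for motif in motifs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (1-based start, score) pairs."""
    width = len(pssm)
    return [
        (start + 1, sum(pssm[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical aligned LxCxE instances and a hypothetical query sequence.
motifs = ["LYCYE", "LRCHE", "LTCNE", "LFCDE"]
matrix = build_pssm(motifs)
best = max(scan("MAGDLYCYEQLS", matrix), key=lambda hit: hit[1])
print(best)
```

The highest-scoring window marks the most motif-like stretch of the query. A second matrix
derived from structure could be applied with the same `scan` function.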
Conclusions
The combination of structure-based and sequence-based calculations is able to
reproduce quantitative and semi-quantitative binding experiments from the literature, to
identify known instances of the LxCxE motif and to propose novel putative LxCxE motifs. We
discuss the list of putative motifs in the light of the structural and functional properties of the
protein containing each motif.
References
1. Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and
protein complexes: a study of more than 1000 mutations. J Mol Biol 2002, 320:369-387.
2. Schneider TD, Stormo GD, Gold L, Ehrenfeucht A. Information content of binding sites on
nucleotide sequences. J Mol Biol 1986, 188:415-431.
Acknowledgements
We acknowledge funding from Agencia Nacional de Promoción Científica y Tecnológica (PICT
2010-1052 to I.E.S.), Consejo Nacional de Investigaciones Científicas y Técnicas (postdoctoral
fellowship to L.B.C.; G.d.P.G. and I.E.S. are CONICET career investigators) and Instituto
Nacional del Cáncer (graduate fellowship to J.G.).
VISI: a computational program for antiviral strategies comparison
Leandro E. Ramos2 , Pablo M. Oliva2 , Francisco Herrero2 , Daniel Gutson1 , Daniel Rabinovich1,3 ,
Pedro A. Pury2
1 FuDePAN: Fundación para el Desarrollo de la Programación en Ácidos Nucléicos, Córdoba, Argentina, X5002AOO, Duarte Quirós 1752 7A
2 FaMAF, Universidad Nacional de Córdoba, Ciudad Universitaria, Córdoba, Argentina, X5000HUA
3 CNRS: Centro Nacional de Referencia para el SIDA, Facultad de Medicina, UBA, Buenos Aires,
Argentina, C1121ABG
Background
The existence of an increasing number of antiretrovirals, together with the phenomenon of resistance,
makes it desirable to develop programs that help choose among different therapy options before
proceeding to in vitro or in vivo trials. For this task, software that shows the evolution over time of
the infection under different therapies would be particularly useful.
To address this issue, we present the Virus Simulator (ViSi) project [http://visi.googlecode.com]. It
models in silico the temporal evolution of the cellular and viral populations involved in HIV infection.
The parameters of interaction among cells and virions are completely configurable to represent both
the action of antiretroviral drugs and the development of drug-resistant strains. In particular, the
action of reverse transcriptase inhibitors (RTI) and protease inhibitors (PI) is explicitly considered.
The system is composed of an extensible simulation kernel, developed by synthesizing and upgrading
known mathematical models [1,2], and a plugin-based extension model. These plugins are
specifically designed to consider combinations and sequences of antiretroviral applications. Thus,
the system allows the simulation and testing of several drug therapies used in AIDS treatments.
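The kind of model such a kernel integrates can be illustrated with the standard target-cell
limited description of HIV dynamics under RTI and PI therapy, in the style of reference [2].
The sketch below is not ViSi code: the function name, the explicit Euler integration and every
parameter value are assumptions chosen only for illustration.

```python
def simulate(days, eta_rt=0.0, eta_pi=0.0, dt=0.01):
    """Integrate a basic HIV model with explicit Euler steps.

    eta_rt: efficacy (0..1) of a reverse transcriptase inhibitor, which
            lowers the rate of new infections.
    eta_pi: efficacy (0..1) of a protease inhibitor, which makes a fraction
            of the newly produced virions non-infectious.
    Returns (target cells, infected cells, infectious virus, non-infectious virus).
    """
    # Hypothetical parameter values, for illustration only.
    supply, death = 10.0, 0.01   # target-cell production and death rate
    k = 2.4e-5                   # infection rate constant
    delta = 0.5                  # death rate of infected cells
    burst = 1000.0               # virions produced per infected cell
    clearance = 3.0              # virion clearance rate
    target, infected, v_inf, v_non = 1000.0, 0.0, 1e-3, 0.0
    for _ in range(round(days / dt)):
        infection = (1.0 - eta_rt) * k * v_inf * target
        production = burst * delta * infected
        d_target = supply - death * target - infection
        d_infected = infection - delta * infected
        d_v_inf = (1.0 - eta_pi) * production - clearance * v_inf
        d_v_non = eta_pi * production - clearance * v_non
        target += dt * d_target
        infected += dt * d_infected
        v_inf += dt * d_v_inf
        v_non += dt * d_v_non
    return target, infected, v_inf, v_non


untreated = simulate(60)
with_rti = simulate(60, eta_rt=0.9)
with_pi = simulate(60, eta_pi=0.9)
print(untreated[2], with_rti[2], with_pi[2])
```

Comparing the infectious viral load of the three runs is the simplest form of the therapy
comparison the abstract describes; drug combinations correspond to setting both efficacies at
once.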
Conclusion
The main features of the system described above are:
• Modular kernel to test different mathematical models of HIV infection.
• Completely configurable to set simulation parameters.
• Extensible through plug-ins to simulate drug effects on infection.
• Programmable interface to test therapies of combinations and sequences of antiretrovirals.
References
1. Denise E. Kirschner and F. G. Webb: Understanding drug resistance for monotherapy
treatment of HIV infection, Bull. Math. Biol. 1997 59:763–786.
2. Alan S. Perelson and Patrick W. Nelson: Mathematical analysis of HIV-1 dynamics in
vivo. SIAM Rev. 1999 41:3-44.
FuL
Alejandro Kondrasky1 , Daniel Gutson1 , Carlos Areces1,2
1 FuDePAN: Fundación para el Desarrollo de la Programación en Ácidos Nucleicos, X5002AOO, Córdoba, Argentina.
2 FaMAF: Universidad Nacional de Córdoba, Ciudad Universitaria, X5000HUA, Córdoba, Argentina.
Background
The body of knowledge in biology, particularly in virology and immunology, is increasing in volume and complexity.
This is why it would be useful to have this knowledge represented in a formal language inside a knowledge base.
Subsequently, different methodologies for analysis and manipulation could be developed, allowing validity checks to be
performed on conclusions obtained in experiments.
FuDePAN's Logic processor (FuL) (http://ful.googlecode.com) is being developed to organize, interpret, verify
and explore knowledge in molecular biology, applied to virology and immunology in particular. This will help find
inconsistencies and automatically derive new information. Its main function will be to verify, by means of queries,
conclusions drawn from experimental results.
Our initial test case will be the following conclusion obtained from experiments done by FuDePAN:
• Validate the conclusions obtained in the Junín experiment about the effects of temperature change
on the virus secondary structure: corroborate the line of thought that includes the predictions of the
effects of the febrile state on the Junín RNA secondary structure, in which it is hypothesized that the
temperature increment reduces the production of nucleoproteins because the hairpin loop in the intergenic
region presents dissimilar characteristics when it is compared on the two ambisense genome strings when
the temperature is increased.
FuL has a plug-in architecture, simplifying the inclusion of new kinds of reasoning services. An API will be provided,
which defines the way in which knowledge flows between the plug-ins and FuL's core reasoning engine. An SDK
composed of the libraries and tools required for building plug-ins will also be made available.
The kernel of the tool will be composed of a planner that can handle PDDL (Planning Domain Definition Language)
input, and a knowledge manager that will be the interface between the plug-ins registered in that session and the
planner. Via an XML file, it will be possible to register the plug-ins that FuL will utilize in that session and configure
different session parameters.
FuL will include a semantic reasoner for Description Logics (DL) as one of the plug-ins, and we will also provide a
knowledge representation language for the virology domain based on DL. This language will allow the development of
an ontology of virology knowledge that will be available for querying during a FuL session.
References
1. Franz Baader, Deborah L. McGuinness, Daniele Nardi, Peter F. Patel-Schneider: The Description Logic
Handbook: Theory, implementation, and applications.
2. The Seventh International Planning Competition: Description of Participant Planners of the Deterministic
Track, 2011. www.plg.inf.uc3m.es/ipc2011-deterministic/ParticipatingPlanners
3. Daniel Gutson, Agustín March, Maximiliano Combina, Daniel Rabinovich: Prediction of consequences of the
febrile status on the RNA secondary structure of the Junín Virus, 2006. www.fudepan.org.ar/node/71
The relation between the divergence of sequence and structure in intrinsically disordered proteins
Nicolás Palopoli1, Juliana Glavina1, Ignacio Enrique Sánchez1
1 Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias Exactas y
Naturales, Universidad de Buenos Aires, Argentina
Introduction
It has long been accepted as a general rule that the structural dissimilarity between two globular proteins
increases as their sequences depart from one another [1]. In recent years we have become aware that most
proteomes have a significant percentage of proteins with intrinsically disordered regions [2]. Very little information
is available for evolutionary sequence-structure relationships in these regions, although they are known to
display specific sequence patterns and show a non-random structure in spite of being very flexible. Here we
present a computational assessment of the interplay between sequence and structure of intrinsically disordered
regions of proteins.
Methods
We have focused our studies on the Papillomavirus E7 protein family. These proteins usually display an
N-terminal disordered domain (E7N) and a C-terminal globular domain (E7C), allowing for a fair comparison of the
relationship between sequence patterns and different structural descriptors. We represent the degree of
sequence conservation through the information content of each site. The intrinsically disordered regions,
possible binding segments, secondary structure and tendency to aggregate were predicted for all E7
sequences using the one-dimensional models of the disordered domain IUPred [3], ANCHOR [4] and Tango [5]. The
average solvent accessible surface area of a residue, backbone dihedral angle propensities and radius of
gyration were predicted using ensembles of structural models based on local structural propensities as
implemented in ProtSA [6] and Flexible-meccano [7].
Results
We have calculated the relationship between the degree of sequence divergence and the variability in different
structural parameters for every pair of E7 proteins in our dataset and for each position in their alignment. We
have assessed the similarities of E7N and E7C by determining how well the observed sequence patterns and
structural features in each domain correlate with different descriptors of their degree of disorder. It seems that
sequence conservation in E7 is not highly dependent on the degree of disorder. In contrast, any two E7N
domains show much higher differences in disorder than the corresponding globular E7C domains at the same
level of sequence identity, in agreement with previous results obtained through in silico simulated mutations [8].
Conclusion
We have been able to describe evolutionary sequence-structure relationships for intrinsically disordered regions
of a model protein. These relationships seem to differ from those in globular domains and between regions of a
disordered domain with different functional properties.
Acknowledgements
N.P. is a postdoctoral fellow in the PhasIbeAm project. J.G. is the recipient of an Instituto Nacional del Cáncer
graduate fellowship. I.E.S. is a CONICET researcher.
References
1. Chothia C, Lesk AM. EMBO J. 1986, 5(4):823-6.
2. Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK. Biochemistry. 2005, 44(6):1989-2000.
3. Dosztányi Z, Csizmok V, Tompa P, Simon I. Bioinformatics. 2005, 21(16):3433-4.
4. Dosztányi Z, Mészáros B, Simon I. Bioinformatics. 2009, 25(20):2745-6.
5. Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L. Nat Biotechnol. 2004, 22(10):1302-6.
6. Estrada J, Bernadó P, Blackledge M, Sancho J. BMC Bioinformatics. 2009, 10:104.
7. Ozenne V, Bauer F, Salmon L, Huang JR, Jensen MR, Segard S, Bernadó P, Charavay C, Blackledge M.
Bioinformatics. 2012, 28(11):1463-70.
8. Schaefer C, Schlessinger A, Rost B. Bioinformatics. 2010, 26(5):625-31.
1D model of the pulse wave along the systemic arteries
Saavedra Fresia Cecilia E.1 , Menzaque Fernando E.2
1 Faculty of Exact Sciences and Technology, National University of Tucuman, Tucuman, Argentina
1 Faculty of Biochemistry, Chemistry and Pharmacy, National University of Tucuman, Tucuman, Argentina
2 Faculty of Astronomy, Mathematics and Physics, National University of Cordoba, Cordoba, Argentina
Materials and methods
Blood is the fluid that circulates throughout the body via the circulatory system, which consists of the heart
and the blood vessels. The blood path describes two complementary circuits, the pulmonary circulation and
the systemic circulation.
In the systemic circulation, arteries are responsible for carrying oxygenated blood and nutrients to the other
parts of the body: organs, tissues and muscles.
The heart pumps blood throughout the body in consecutive stages. First the atria fill; then they
contract, the valves open and blood enters the ventricles. When full, the ventricles contract and push
blood into the arteries, which are thick, elastic vessels.
Each ventricular contraction causes a distension of the initial portion of the aorta, which propagates
downstream as a wave along the systemic arteries.
The aim of this work is to propose a one-dimensional simplification of the three-dimensional model describing
the behavior of the pulse wave. The one-dimensional model applies the Navier-Stokes equations for a Newtonian
fluid to an elastic artery; the resulting momentum and continuity equations describe the motion of the fluid in a
given artery, the movement of its walls and the interaction between the fluid and the walls.
It is further assumed that the arteries have a circular cross-section, the flow is axisymmetric, the velocity
profile is flat and the larger arteries form a binary tree of vessels containing an incompressible and frictionless
Newtonian fluid.
The proposed nonlinear model was solved with the two-step Lax-Wendroff finite difference method.
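The two-step (Richtmyer) form of the Lax-Wendroff scheme can be sketched for a generic
conservation law u_t + f(u)_x = 0. The example below is only a stand-in for the arterial model:
it uses a scalar linear flux and periodic boundaries, which are assumptions made for this
sketch, whereas the model above involves coupled nonlinear equations and physiological
boundary conditions.

```python
def lax_wendroff_step(u, flux, dt, dx):
    """Advance u by one time step with the two-step Lax-Wendroff scheme.

    Step 1: provisional values at the half time level and at the cell
            interfaces i + 1/2.
    Step 2: conservative update using the fluxes of the half-step values.
    Periodic boundary conditions are assumed.
    """
    n = len(u)
    ratio = dt / dx
    half = []
    for i in range(n):
        right = u[(i + 1) % n]
        half.append(0.5 * (u[i] + right) - 0.5 * ratio * (flux(right) - flux(u[i])))
    # half[i] sits at interface i + 1/2, so half[i - 1] is interface i - 1/2
    # (index -1 wraps around, which matches the periodic boundary).
    return [u[i] - ratio * (flux(half[i]) - flux(half[i - 1])) for i in range(n)]


def advect(u, speed, dt, dx, steps):
    """Linear advection, f(u) = speed * u, as a simple test problem."""
    for _ in range(steps):
        u = lax_wendroff_step(u, lambda value: speed * value, dt, dx)
    return u


# A small pulse travelling to the right (Courant number 0.5).
pulse = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
print(advect(pulse, speed=1.0, dt=0.5, dx=1.0, steps=4))
```

For the linear test problem the scheme is stable when the Courant number speed*dt/dx does not
exceed 1. In a nonlinear system the same bound would have to hold for the fastest wave speed.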
Conclusion
The computational results show that the 1D model is a feasible way to determine the flow in large
arteries.
References
1. John, LK; Li, J., Dynamics Of The Vascular System, World Scientific Publishing Co. Pte. Ltd.,
2004.
2. Keener, J.; Sneyd, J., Mathematical Physiology, Springer-Verlag, New York 1998.
3. Ottensen, J.; Olufsen, M.; Larsen, J., Applied Mathematical Models in Human Physiology,
Society for Industrial and Applied Mathematics 2004.
One vs One Artificial Neural Network strategy for gene expression multiclass classification
Remón L1, Juárez L1, Arab Cohen D1, Fresno C12, Prato LB3, Villoria LN3, Fernandez EA12
1 BioScience Data Mining Group, Facultad de Ingeniería, Universidad Católica de Córdoba
2 CONICET.
3 Instituto A.P. de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa Maria
Molecular signatures are sets of genes that can be used to diagnose or classify the disease status of subjects.
Because a great number of samples is needed and/or the classes overlap in the feature space, building a
successful diagnostic tool remains an open goal [1]. Artificial Neural Networks (ANN) have not been used
extensively in gene expression signature classification, basically because of the “curse of dimensionality”
problem, where the number of variables (genes) is greater than the number of samples (subjects). ANNs usually solve
multiclass problems by setting up a large structure with at most as many output neurons as classes exist in the domain.
This implies adjusting a great number of weights, which in essence requires many samples for the algorithm to
converge [2]. With a “divide and conquer” strategy one can split a complex problem into several “easier”
problems. One such strategy is One vs One classification through binary classifiers, which for K>2
classes implies solving K(K-1)/2 binary classification problems. Here we present some preliminary results on solving
multiclass gene expression signature classification through K(K-1)/2 binary ANNs with a voting scheme for class
prediction, called OVONN. The proposed methodology was tested on 3 gene expression databases preprocessed as
in [6]. For each database, those genes showing a standard deviation greater than 95 percent were selected as predictor
variables. Table 1 shows the performance of OVONN compared to the traditional ANN approach
with as many output nodes as classes. The models were cross-validated by a Leave One Out by Class strategy and the
number of hidden units was optimized in each case.
Table 1. Percentage prediction error statistics

          NCI60 [4]        9 Tumors [5]     11 Tumors [5]
          ANN    OVONN     ANN    OVONN     ANN    OVONN
Min       12.5    0.0      25.0    0.0       0.0    0.0
1Q        25.0    0.0      25.0   15.6      13.6    2.3
Median    25.0   12.5      25.0   25        27.3    9.1
Mean      42.5   10.0      27.1   22.9      22.4    7.6
3Q        75.0   12.5      25.0   34.38     34.1    9.1
Max       75.0   25.0      37.5   37.5      36.4   18.2
1Q and 3Q: first and third quartile.
Table 1 shows that the traditional approach suffers strongly from the curse of dimensionality.
Our approach, in contrast, outperformed it, requiring fewer samples to reach a stable solution with
good performance. The solutions reached by OVONN were very stable across training data sets. The proposed
approach could bring ANNs back into the classification arena, providing a new competitive classification tool. An R
library will be made available soon.
Keywords: multi-class classification, ANN, OVO.
References
1- Parker, Joel S. et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes, Journal
of Clinical Oncology, 27(8):1160–1167
2- Ou G, Murphey L. Multi-class pattern classification using neural networks, Pattern Recognition,
doi:10.1016/j.patcog.2006.04.041
3- Khan J, Wei J, Ringner M, Saal L, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C,
Peterson C, Meltzer P: Classification and diagnostic prediction of cancers using gene expression
profiling and artificial neural networks. Nat Med 2001, 7(6):673–679.
4- Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, Scherf U, Lee JK, Reinhold WO,
Weinstein JN, Mesirov JP, Lander ES, Golub TR: Chemosensitivity prediction by transcriptional
profiling. Proc. Natl. Acad. Sci. U.S.A. 2001, 98:10787–10792.
5- Dudoit S, Fridlyand J, Speed TP: Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data. Journal of the American Statistical Association 2002, 97(457):77–87.
6- Tapia E, Ornella L, Bulacio P, Angelone L, Multiclass classification of microarray data samples with a
reduced number of genes. BMC Bioinformatics, http://dx.doi.org/10.1186/1471-2105-12-59
SVM Tree with Optimal Multiclass Partition applied to Gene expression signature classification
Pallarol M1, Arab Cohen D1, Fresno C1,2, Prato LB3, Fernandez EA1,2
1 BioScience Data Mining Group, Facultad de Ingeniería, Universidad Católica de Córdoba
2 CONICET.
3 Instituto A.P. de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa Maria.
Abstract. Gene expression signatures are currently used to guide cancer therapy [1]. In many situations, they are
expected to successfully diagnose several disease types. However, this is not usually possible, either because a
large number of samples is needed or because the classes overlap in the feature space. One of the main
tools used for multiclass classification problems is the Support Vector Machine (SVM) under the well-known OVO and
OVA strategies and, more recently, the tree-based approach. Most tree-based SVM classifiers split the
multi-class space into several binary partitions, mostly by means of clustering-like algorithms. One of the main
drawbacks of this approach is that the natural class structure is not taken into account. Furthermore, the same SVM
parameterization is used for all partitions in the above-mentioned strategies. Here, we applied SVMTOCP (SVM
tree optimal classification partition) [2], a new splitting methodology for K>2 multi-class problems. It builds a
two-class problem for each node in the tree by looking for the input class combinations that produce the best SVM
performance in that tree node. This implies solving, for node “i”,
$$L_i = \eta \cdot \frac{K_i!}{r!\,(K_i - r)!} \qquad (1)$$
binary problems, where η=1 (0.5) for K_i odd (even) and r=[K_i/2] (the integer part of K_i/2). Once the best solution is found at node “i”, r classes
are passed to the child nodes and the process is repeated until a leaf is reached. Although the training phase is time-consuming and
computationally expensive, the proposed approach always produces a balanced tree and the original class structure is
preserved. The latter property is very important from a Data Mining point of view, because the solution reached makes it possible
to identify which class combinations yield soft- or hard-margin solutions (tree nodes may have different
kernel parameters) and automatically identifies which input classes are the most difficult to split. These are very
important properties for data analysts who need to extract hidden knowledge from a multivariate database.
SVMTOCP and the SVM OVO strategy were compared on three gene expression databases to classify tumor
samples. In all cases SVMTOCP achieves many more “Hard Margin” (HM) solutions and fewer
support vectors (SV), with no statistically significant difference in performance compared with the usual OVO approach. Reaching solutions
with fewer SVs and more HM solutions suggests a more robust classification strategy that needs fewer samples to achieve
efficient solutions. These are very desirable properties for genomic applications, where the number of samples is
scarce.
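To make Eq. (1) concrete, the following sketch counts and enumerates the candidate two-class partitions at a node. It is only an illustration of the combinatorics: the function names are hypothetical, and the SVM training and selection of the best partition performed by SVMTOCP are not reproduced here.

```python
from itertools import combinations
from math import comb


def n_partitions(k):
    """Eq. (1): eta * K! / (r! (K - r)!), with r = K // 2 and eta = 1 (odd K) or 0.5 (even K)."""
    r = k // 2
    eta = 1.0 if k % 2 else 0.5
    return int(eta * comb(k, r))


def enumerate_partitions(classes):
    """List every split of the classes into groups of size r and K - r, without mirror duplicates."""
    classes = list(classes)
    r = len(classes) // 2
    seen = set()
    partitions = []
    for left in combinations(classes, r):
        right = tuple(c for c in classes if c not in left)
        key = frozenset([frozenset(left), frozenset(right)])
        if key in seen:
            continue  # for even K, (left, right) and (right, left) are the same split
        seen.add(key)
        partitions.append((left, right))
    return partitions


for k in (4, 5, 8):
    print(k, n_partitions(k), len(enumerate_partitions(range(k))))
```

The factor η=0.5 for even K simply removes mirror-image splits, which the enumeration handles with the `seen` set. For the 8-class data sets this gives 35 candidate binary problems at the root node, which illustrates why the training phase is expensive.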
Table 1: Characteristics of the data sets used and classification performance of both strategies
                                        SVMTOCP                 SVM OVO
DB             Instances  #Classes    %PE    %HMs   %SV      %PE   %HMs   %SV
NCI60 [4]      61         8 (5,9)     25**   43**   78**     17    0      95
9 Tumors [5]   58         8 (6,9)     0**    36**   87**     15    0      96
SCBR [6]       63         4 (8,23)    0      33**   58**     0     0      67
** p<0.01. #Classes: number of classes (minimum, maximum samples per class); %HMs: % of Hard Margin solutions; %SV: % of support vectors used
to build the solution; %PE: % prediction error. The data sets were preprocessed as in [6], and the genes showing higher variance
were selected.
Keywords: multi-class classification, SVM, Binary Tree.
References
1- Parker, Joel S. et al. Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes, Journal of
Clinical Oncology, 27(8):1160–1167
2- Arab Cohen, D, Fernandez EA. SVMTOCP: A binary tree base SVM approach through optimal multi-class
binarization In 17th Iberoamerican Congress on Pattern Recognition CIARP 2012 Eds: León LA, Déniz, Mejail ME,
Jacobo J.
3- Khan J, Wei J, Ringner M, Saal L, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C, Peterson C,
Meltzer P: Classification and diagnostic prediction of cancers using gene expression profiling and artificial
neural networks. Nat Med 2001, 7(6):673–679.
4- Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, Scherf U, Lee JK, Reinhold WO,
Weinstein JN, Mesirov JP, Lander ES, Golub TR: Chemosensitivity prediction by transcriptional
profiling. Proc. Natl. Acad. Sci. U.S.A. 2001, 98:10787–10792.
5- Dudoit S, Fridlyand J, Speed TP: Comparison of Discrimination Methods for the Classification of
Tumors Using Gene Expression Data. Journal of the American Statistical Association 2002,
97(457):77–87.
6- Tapia E, Ornella L, Bulacio P, Angelone L, Multiclass classification of microarray data samples with a
reduced number of genes. BMC Bioinformatics, http://dx.doi.org/10.1186/1471-2105-12-59
Web-based gene-expression analysis using the plant biology analysis tools: GENEVESTIGATOR
María Gabriela Acosta2; Miguel Ángel Ahumada1; Sergio Luis Lassaga3; Víctor Hugo Casco1,2
1 Cátedra de Biología, FCA-UNER, Ruta 11 km 10, Oro Verde, Argentina.
2 LAMAE - FI-UNER, Ruta 11 km 10½, Oro Verde, Argentina.
3 Cátedra de Genética y Mejoramiento Vegetal, FCA-UNER, Ruta 11 km 10, Oro Verde, Argentina.
Background
The GENEVESTIGATOR microarray database and expression meta-analysis engine was developed to
perform gene-expression analysis. Its results are derived from thousands of systematically annotated
and normalized microarray experiments [1]. With the aim of finding genes that encode receptor kinases
capable of activating AT5G01830, an armadillo-repeat (ARM-repeat) protein, we used this
bioinformatics tool to look for co-expression of protein kinases and AT5G01830. Additionally, we analyzed
the expression of AT5G01830 in floral tissue under different abiotic stresses using the GENEVESTIGATOR
perturbations tool, which displays the response of genes to a wide variety of conditions.
Results
In the present report, we used the ATH1 array (22k array, ss7176) to visualize gene expression
across arrays from a pre-selected set of experiments. The main goal was to show the expression intensity of a
gene list across the selected arrays. The anatomy tool, in the condition search toolset, quickly showed
how strongly AT5G01830 is expressed in different tissues and under different stress
conditions (hormonal and saline). We also used the co-expression tool to find
genes whose expression profiles are closest to that of our target gene, AT5G01830 (black spot in Figure 1). With
this tool, we detected one protein kinase (white spot 1 in Figure 1), out of five candidates present in the
Arabidopsis genome, as a possible candidate to activate putative E3-ubiquitin ligases such as AT5G01830.
Using GENEVESTIGATOR, we did not detect gene expression at the floral stage of development under
normal growth conditions, in agreement with our sqRT-PCR results.
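The idea behind a co-expression search can be sketched as ranking candidate genes by the similarity of their expression profiles to that of the target gene. The example below assumes Pearson correlation as the similarity measure; the measure actually used by GENEVESTIGATOR is not stated here, and the gene labels and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_coexpressed(target, candidates):
    """Sort candidate genes from most to least correlated with the target profile."""
    scores = [(name, pearson(target, profile)) for name, profile in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression values across four arrays.
target_profile = [1.0, 2.0, 3.0, 4.0]
kinases = {
    "kinase_1": [2.0, 4.0, 6.0, 8.0],
    "kinase_2": [4.0, 3.0, 2.0, 1.0],
    "kinase_3": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_coexpressed(target_profile, kinases)
print(ranking)
```

The top-ranked candidate is the one whose profile follows the target most closely across arrays, which is how a list of five candidates can be narrowed to one.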
Figure 1
Conclusions
GENEVESTIGATOR is a high-performance search engine for gene expression. Its tools
are highly efficient at detecting the expression of genes, helping to confirm or simplify molecular
biology experiments. In this study, it reduced the number of candidate gene targets encoding protein
kinases from five to one, lowering reagent costs and reducing working time.
Reference
1. Zimmermann P, Hirsch-Hoffmann M, Hennig L and Gruissem W: GENEVESTIGATOR. Arabidopsis
microarray database and analysis toolbox. Plant Physiol 2004, 136: 2621–2632.
Agi4x44.2c: a two-colour Agilent 4x44 Quality Control R library for large
microarray projects
González GA1, Fresno C1,2, Merino G4, Llera A2,3, Podhajcer O2,3, Fernández EA1,2
1 Grupo de Minería de Datos en Biociencias, Facultad de Ingeniería, Univ. Cat. de Córdoba
2 CONICET
3 Laboratorio de Terapia Celular y Molecular, Instituto Leloir
4 Facultad de Ingeniería, UNER
Microarrays remain the most widely accepted tool in transcriptome studies that require the analysis of a large number of
samples [1,2] and in several prospective international efforts currently under way, particularly in cancer. One
of the biggest challenges of these large-scale studies lies in the simultaneous evaluation of hundreds of arrays for
quality control (QC) in order to discard, semi-automatically, those that do not meet the minimum quality requirements.
Currently, the most widely used tools for quality control of two-colour Agilent 4x44 microarrays are Feature Extraction (FE)
[3] and QC Chart Tool [5], both developed by Agilent. FE creates a PDF report for each array containing several
quality measures, and users must then manually review each report to find problematic arrays, which is a very
time-consuming task. QC Chart Tool, on the other hand, only shows line graphs of the FE quality metrics and does not allow users to
explore intensity distributions, spatial patterns, etc. Here we present Agi4x44.2c, a new QC R library [4] that overcomes
all of these limitations. It facilitates global quality control, allowing users to quickly compare all arrays at a glance.
Furthermore, unlike other Bioconductor packages (such as Agi4x44PreProcess [6] and arrayQualityMetrics [7]),
Agi4x44.2c includes QC tools specific to the two-colour Agilent 4x44 platform, for a more complete and comprehensive
analysis.
Figure 1. Some of the plots created by Agi4x44.2c. (a) False color image of raw intensities in the green channel.
Abnormal patterns can be seen in the first 3 chips. Boxplot (b) and M vs A plot (d) show that the fourth array is
different from the other three. Metrics plot (c) shows a summary of Agilent quality metrics from PDF reports. In this
case we can see that the first three chips (F1-F3) are problematic ones.
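For readers unfamiliar with the M vs A plot mentioned in the caption, the quantities plotted for a two-colour array are conventionally the log-ratio M and the average log-intensity A of the red and green channels. The sketch below uses that standard definition; it is not taken from the Agi4x44.2c code, and the function name and intensities are hypothetical.

```python
from math import log2


def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities, all assumed > 0.

    M = log2(R) - log2(G)         (log-ratio between channels)
    A = (log2(R) + log2(G)) / 2   (average log-intensity)
    """
    m = [log2(r) - log2(g) for r, g in zip(red, green)]
    a = [0.5 * (log2(r) + log2(g)) for r, g in zip(red, green)]
    return m, a


# Hypothetical spot intensities for one array.
m, a = ma_values([8.0, 4.0], [2.0, 4.0])
print(m, a)
```

An array whose M values drift away from zero across the range of A, compared with the other arrays, is a candidate for closer inspection.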
References
[1]Yu J, Yu J, Cordero KE, et al. A transcriptional fingerprint of estrogen in human breast cancer predicts patient
survival. Neoplasia. 2008;10(1):79–88.
[2]Curtis C, Shah OP, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel
subgroups. Nature 2012. DOI: 10.1038.
[3] Agilent. Agilent Feature Extraction Reference Guide, 2007.
[4]R Development Core Team. R: A language and environment for statistical computing. R Foundation for
Statistical Computing. Vienna, Austria, 2005. ISBN 3-900051-07-0.
[5] Agilent QC Chart Tool. http://www.genomics.agilent.com/files/Manual/G4460-90022_QC_Chart_User.pdf
[6]Lopez-Romero P. Agi4x44PreProcess: PreProcessing of Agilent 4x44 Array Data. 2011.
http://www.bioconductor.org/packages/release/bioc/html/Agi4x44PreProcess.html
[7] Kauffmann A, Huber W. Quality assessment with arrayQualityMetrics. 2009.
http://bioc.ism.ac.jp/2.4/bioc/vignettes/arrayQualityMetrics/inst/doc/arrayQualityMetrics.pdf
DIGESuite: a Cytoscape plug-in for 2D-DIGE analysis
Talesnik T1, Mishima JM1, Fresno C1,2, Semrik M1, Ribero G3, Merino G5, Laura B. Prato3, Llera AS2,4, Fernández EA1,2
1 Grupo de Minería de Datos en Biociencias, Facultad de Ingeniería, Univ. Católica de Córdoba
2 CONICET, Argentina
3 Instituto Académico Pedagógico de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa María
4 Laboratorio de Terapia Molecular y Celular, Fundación Instituto Leloir, Buenos Aires.
5 Facultad de Ingeniería, UNER
Background: biomedical companies usually offer proprietary black-box software associated with their machinery. In this context,
the user cannot check whether the data meet the model assumptions or apply alternative approaches. Furthermore,
user interfaces are very restricted, not allowing the user to extend the analysis. GE Healthcare's
Decyder® software for two-dimensional difference gel electrophoresis (2D-DIGE) [1] is no exception. In 2D-DIGE, a global view of the state of a
proteome can be obtained by examining up to three labeled samples on a two-dimensional gel. The aim of this
technology is to detect spots (with a priori unknown proteins) that show a statistically significant difference in expression under different
experimental conditions. In this context, it is crucial to include proper visualization during pre-processing steps such as spot
filtering and normalization, prior to differential expression analysis. To overcome these limitations we propose DIGESuite, a
plug-in for 2D-DIGE protein expression analysis that extends the well-known, flexible bioinformatics visualization tool Cytoscape
[2].
Figure 1: DIGESuite screenshots. a) plug-in control panel, b) gels, c) spot boxplots, d) linear-mixed model specification, e) differentially
expressed spot selection, f) R console
Methods: the plug-in uses a client-server topology, where Cytoscape provides the graphical front end and the statistical engine R
[3] works as the back end. Decyder® raw/normalized volume image data files are displayed for each gel image using Cytoscape
capabilities (see Figure 1). The user can easily filter problematic spots (saturated or dusty ones), check protein spot
distributions using boxplots and use normalization alternatives such as two-stage linear mixed models [4]. If necessary, the
user can even open an R terminal to adjust the data at will. Once the data are normalized, a user-friendly interface lets the user specify the
linear or mixed model for differential expression analysis instead of the Decyder® one/two-way ANOVA. Differentially
expressed spots are automatically highlighted in the gel according to the user's significance threshold (raw/adjusted p-values and/or fold-change).
The plug-in also provides a complete report of the processing steps applied, as well as the location of the spots to pick for MS protein
identification. Furthermore, additional Cytoscape plug-ins can be included in the analysis according to the user’s needs.
Conclusion: as far as we know, no free tool is available that allows the analysis of protein data in a consistent and flexible
manner. DIGESuite can be used after Decyder® image analysis has been carried out, allowing flexible filtering, normalization,
differential expression analysis and spot information for MS protein identification. This tool can also be used jointly with other
Cytoscape plug-ins to further extend protein expression analysis.
References
[1] Viswanathan S, Unlü M, Minden JS: Two-dimensional difference gel electrophoresis. Nat Protoc. 2006, 3:1351-8.
[2] Shannon P, et. al: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks.
Genome Res. 2003, 13: 2498-2504
[3] R Development Core Team: R: A language and environment for statistical computing, R Foundation for Statistical
Computing, Vienna, Austria, 2009, ISBN 3-900051-07-0
[4] Fernández EA, et al.: Improving 2D-DIGE protein expression analysis by two-stage linear mixed models: assessing
experimental effects in a melanoma cell study. Bioinformatics 2008, (23):2706-2712
Characterization of long interspersed non-LTR elements in section Arachis
Sebastián Samoluk1, Diego Carisimo1, Germán Robledo1,2, Guillermo Seijo1,2
1 Instituto de Botánica del Nordeste, Corrientes, Corrientes, CP 3400, Argentina
2 Facultad de Ciencias Exactas y Naturales y Agrimensura (Universidad Nacional del Nordeste),
Corrientes, Corrientes, CP 3400, Argentina
Abstract
Section Arachis (genus Arachis, Leguminosae) is composed of 29 wild diploid species
belonging to five different genomes (A, B, D, F and K) and two allotetraploid species (AABB).
Experiments based on molecular mapping and genome in situ hybridization suggested that
changes in the repetitive fractions may have been a main force leading to genomic
differentiation in Arachis. To test this hypothesis, degenerate primers were designed to isolate
and characterize a conserved region of the reverse transcriptase gene from long interspersed
non-LTR elements (LINEs) from eight species representing five different genomes of Arachis.
The 37 isolated clones showed the conserved amino acid motifs characteristic of the reverse
transcriptase of LINEs. These sequences were compared by the pairwise method and a
Neighbour-Joining tree was constructed using the program MEGA, version 5 [1]. Even though
the alignment of nucleotides showed a high interspecific nucleotide divergence, the deduced
amino acid sequences evidenced high percentages of similarity. Nineteen sequences had stop
codons and the introduction of frameshifts in the reading frame of some sequences was
necessary to optimize the alignment. Amino acid sequences from other angiosperms and
gymnosperms with homology to the reverse transcriptase of the LINEs isolated from Arachis
were recovered from public databases using the BLASTx tool [2] and incorporated into the tree. The
topology of the tree showed that the sequences isolated from Arachis were grouped into a
unique cluster without species-specific subclusters. On the other hand, all the recovered
sequences from public databases (which included some from legume species) grouped into
another well separated cluster. The sequences grouped in the latter cluster had much deeper
branches than those observed in the cluster of Arachis sequences.
Conclusion
From these results, we concluded that the diversification of LINEs is relatively recent in Arachis
and that it may have occurred before the differentiation of the genomic groups present in
section Arachis.
References
1.Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5:
Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary
Distance, and Maximum Parsimony Methods. Mol. Biol. Evol. 28: 2731-2739
2.Altschul, S.F., W Gish, W Miller, E Myers & D J Lipman (1990) "Basic local alignment
search tool" J. Mol. Biol. 215:403-410
MSA2MI: A server to calculate and visualize mutual information in multiple sequence alignments
Franco Simonetti1, Morten Nielsen2 and Cristina Marino Buslje1.
1 Fundación Instituto Leloir. CABA. Argentina.
2 Center for Biological Sequence Analysis. DTU. Denmark.
Background
Multiple Sequence Alignments (MSAs) of homologous proteins carry at least two levels of information. One is given by
the amino acid frequencies observed at each position of the MSA, and the other is given by the relationship between two
or more positions. The first is known as conservation and the second can be studied in terms of co-variation between
positions. The extent of the mutual co-evolutionary relationship between two positions in a protein family can be
estimated using Mutual Information (MI) [1]. An algorithm was developed for this task by Marino Buslje et al [2] and is
here made publicly available as a web tool.
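As a minimal illustration of the quantity involved, the sketch below computes the plain mutual information between two MSA columns from observed frequencies. It deliberately omits the corrections of the published algorithm [2] (phylogeny, low counts, sequence redundancy), and the function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column(msa, j):
    """Extract column j from a list of equal-length aligned sequences."""
    return [seq[j] for seq in msa]


def mutual_information(col_a, col_b):
    """MI = sum over (a, b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) ), in bits."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 covary perfectly, columns 0 and 2 are independent.
msa = ["ACC", "ACT", "GTC", "GTT"]
print(mutual_information(column(msa, 0), column(msa, 1)))
print(mutual_information(column(msa, 0), column(msa, 2)))
```

Computing this value for all pairs of columns yields the matrix from which an MI network can be drawn, with one node per column and edges for the pairs that pass a significance threshold.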
Results
We present a web toolkit that allows users to calculate and visualize the MI between residues in an MSA. The web
service was developed using PHP on the server side with Javascript and Flash on the client-side. The pipeline was
implemented as modules, making addition of new features easy. The main task is to calculate the MI between all pairs of
columns in the MSA. The output is displayed as a MI network using Cytoscape Web [3], where each node corresponds
to a column in the MSA and edges between nodes represent significant MI values [2] (Figure 1). Several parameters can
be set in order to calculate and present the data. For example, if the structure of the protein is known, structural data can
be displayed by adding the PDB numbering schema to the nodes and distance information for edges (Figure 1). Also,
node coloring can be set to match different attributes, such as conservation value (Figure 1). Additionally, clicking
a node shows the relative frequency of the different amino acids at that position. Results can be downloaded for
further manipulation; they include MI and conservation data in raw format and network files that can be loaded into
Cytoscape's desktop version.
Conclusions
This web toolkit allows the study of protein families through a simple and interactive interface, utilizing sequence based
data such as conservation, coevolution and amino acid composition and capable of mapping structural data when
available.
Available at: www.leloir.org.ar/MSA2MI
Figure 1. Mutual Information network rendered using Cytoscape Web. Node color represents the conservation
value (red to blue, higher to lower score). Mutual Information edges are shown as solid lines while distance
edges are shown as dashed lines. The right panel displays information about the nodes and edges selected.
Filters can be applied to desired value. The network layout can be modified changing distance and MI threshold
values and its output exported in different file formats.
[1] L. C. Martin, G. B. Gloor, S. D. Dunn, and L. M. Wahl, “Using information theory to search for co-evolving residues in proteins.,” Bioinformatics (Oxford, England), vol. 21,
no. 22, pp. 4116-24, Nov. 2005.
[2] C. M. Buslje, J. Santos, J. M. Delfino, and M. Nielsen, “Correction for phylogeny, small number of observations and data redundancy improves the identification of
coevolving amino acid pairs using mutual information.,” Bioinformatics (Oxford, England), vol. 25, no. 9, pp. 1125-31, May 2009.
[3] C. T. Lopes, M. Franz, F. Kazi, S. L. Donaldson, Q. Morris, and G. D. Bader, “Cytoscape Web: an interactive web-based network browser.,” Bioinformatics (Oxford,
England), vol. 26, no. 18, pp. 2347-8, Sep. 2010.
14-3-3 isoforms subfunctionalization revealed by systems biology
analysis of cross-talk between phosphorylation and lysine
acetylation
Marina Uhart, Diego M Bustos
Laboratorio de Biología Estructural y Celular de Modificaciones post-traducción. INTECH, Int. Marino
Km 8.2 - Chascomus - Argentina
Advances in quantitative mass spectrometry-based proteomics now enable the system-wide characterization of signaling events at the level of post-translational modifications, protein-protein interactions and
changes in protein expression. The 14-3-3 proteins interact with more than 800 different proteins, in part as
the result of their specific phospho-serine/phospho-threonine binding activity (RSXpS/TXP, RXXXpS/TXP
and pS/T-X(1-2)-COOH). The family is composed of 2 paralogs in yeast, 7 in mammals, and up to 15 in
plants. Upon binding to 14-3-3, the stability, subcellular localization and/or catalytic activity of the ligands
are modified. 14-3-3 can hide intrinsic localization motifs, prevent molecular interactions and/or modulate
the accessibility of a target protein to modifying enzymes such as kinases, phosphatases or proteases. The
extraordinarily high sequence conservation between 14-3-3 protein isoforms poses a significant technological
challenge to researchers working with this family. A systems-level approach is necessary to map the components of the 14-3-3 network and to understand their functions. We used different databases to create a PPI (protein-protein interaction) network for 14-3-3 signaling in human cells. We also added kinases and their substrates
published in the HPRD database for human cells, including information about phosphorylation and
Lys acetylation sites. Finally, we transformed this undirected network of ~5000 nodes into a directed one,
obtaining a complete, high-resolution representation of the 14-3-3 binding partners and their modifications.
Using a computational systems approach, we found that the networks of each isoform are statistically different
(Jaccard index < 0.25) and are built from different sets of 3-node motifs (p < 0.005), with different structural
stability. A feed-forward loop motif (# 7, SSS=1) is present in the gamma, zeta and eta networks. This motif
has been detected within the transcription-regulation networks of E. coli and S. cerevisiae. At the level of
signal transduction networks, this motif could represent the scaffold function, where a protein (in this case
14-3-3) facilitates the interaction between two other proteins (one of which regulates the other). Another
feature that differs between the isoform-specific networks is the intrinsic disorder content (p = 2.044e-09,
Kruskal-Wallis test), promoting distinct levels of wired interactome. This difference in the percentage
of disorder is reflected in the size, number and co-appearance of domains and domain clubs in each
partner of the 14-3-3 isoform networks, suggesting their participation in different signaling pathways.
It was remarkable to find that Tyr was the most phosphorylatable amino acid in domains of 14-3-3 epsilon partners.
This, together with the over-representation of SH3 and Tyr_Kinase domains, suggests that epsilon could be
involved in growth factor receptor signaling pathways. Finally, we found that within zeta's network the
number of acetylated partners is significantly higher (Fisher exact test) than in each of the other
isoforms, with p values from 1.65e-10 for sigma, the least similar, to 0.0024 for gamma, the most similar to the
zeta isoform. The number of acetylated Lys is not proportional to the domain number (or number of amino
acids in domains). In the case of the zeta isoform, the domains of its partners contain more modified Lys than
those of all other 14-3-3 paralogs. Also, an analysis of the subcellular localization of those zeta partners that are acetylated
(48%) shows that 42% are mainly nuclear, containing 60% of all Nuclear Localization Signals present in
partners of this isoform (p = 1.288e-06, Fisher exact test). Lys acetylation correlates with pTyr but not
with pSer or pThr, suggesting crosstalk between these two kinds of PTM. Our analysis also shows a clear
subfunctionalization among members of the 14-3-3 family through differential PTMs.
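The Jaccard index used above to compare isoform networks is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to two sets of binding partners; the partner names are hypothetical placeholders, and treating two empty sets as identical is a convention chosen here, not something stated in the abstract.

```python
def jaccard_index(partners_a, partners_b):
    """|A ∩ B| / |A ∪ B| for two collections of interaction partners."""
    a, b = set(partners_a), set(partners_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


# Hypothetical partner lists for two isoforms.
zeta_partners = {"P1", "P2", "P3"}
sigma_partners = {"P2", "P3", "P4", "P5"}
print(jaccard_index(zeta_partners, sigma_partners))
```

A value below 0.25, as reported for the isoform networks, means that fewer than a quarter of the partners in the combined set are shared.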
Theoretical studies of membranes at different thermotropic phases in salt solutions by molecular dynamics
Fernando E. Herrera1, M. de los Milagros Sales1, Daniel E. Rodrigues1,2
1 Área de Modelado Molecular, Laboratorio de Biomembranas, Departamento de Física, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Santa Fe, Argentina.
2 INTEC (UNL+CONICET), Argentina.
Biological membranes are very complex systems, since their structure and dynamic characteristics are affected by conditions such as temperature and the ionic composition of the aqueous buffers around them. Temperature determines the thermotropic phase and ordering of the lipids, such as the ordered Gel (G) or the Liquid Crystalline (LC) phase. The ionic concentration, on the other hand, affects membrane fluidity. Therefore, theoretical studies of the interplay between these two factors are necessary to understand the molecular mechanisms of their interactions. Molecular dynamics has proven to be a reliable tool to study biomembranes in detail. In this context, we performed molecular dynamics simulations of hydrated DPPC (dipalmitoylphosphatidylcholine) bilayers in the Gel (T=22°C) and Liquid Crystalline (T=50°C) phases at different NaCl concentrations, in order to rationalize the effect of the ionic strength on different thermotropic phases of the same system. In this work, we developed several tools to analyze in detail the structural and dynamical properties affected by the ionic concentration (area per lipid, atomic density profiles, 2D maps of thickness fluctuations, ion depth profiles, ion solvation depth profiles, diffusion coefficients, etc.).
The results, which are in agreement with previous reports, show that ionic absorption and the effects of the ions on many membrane properties depend primarily on the phase of the membrane. The area per lipid and the diffusion coefficients in the LC phase are reduced when the ionic concentration increases, while they remain unchanged in the G phase. Additionally, the bilayer thickness in the LC phase was found to increase with salt concentration. Furthermore, the absorbed Na ions interact mainly with the carbonyl oxygens in both phases (see Figure 1). This work emphasizes that salt concentration and temperature are important factors to take into account in the design of any kind of experiment.
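Of the properties listed above, the area per lipid is the simplest to compute: for a planar bilayer it is commonly taken as the lateral area of the simulation box divided by the number of lipids in one leaflet. The sketch below uses that common definition, which is an assumption and not necessarily the authors' tool; the function name and box dimensions are hypothetical.

```python
def area_per_lipid(box_x_nm, box_y_nm, lipids_per_leaflet):
    """Lateral box area (nm^2) divided by the number of lipids in one leaflet."""
    if lipids_per_leaflet <= 0:
        raise ValueError("lipids_per_leaflet must be positive")
    return box_x_nm * box_y_nm / lipids_per_leaflet


# Hypothetical frame: a 6.4 nm x 6.4 nm box with 64 lipids per leaflet.
apl = area_per_lipid(6.4, 6.4, 64)
print(round(apl, 3))
```

Averaging this value over the frames of a trajectory, for each salt concentration and phase, gives the kind of comparison reported in the results.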
Figure 1: Lipid interactions in both thermotropic phases.
Comparing Sequences against the NCBI Nucleotide Database with the BLAST
tool: what information can we obtain?
Victoria E. Firmenich, M. Eugenia Fernández Feijóo and María B. Espinosa. Fundación
PROSAMA, CONICET. Paysandú 752, C1405ANH. Ciudad Autónoma de Buenos Aires,
Argentina.
The National Center for Biotechnology Information (NCBI, USA) provides an online
database that comprises nucleotide sequences of genes from microbes, plants, humans and
several species used as references. We conducted sequence analyses using the Basic Local
Alignment Search Tool (BLAST), which allows the analysis of nucleotide sequences obtained
by sequencing PCR products from specific amplifications. We performed studies of
AMELX and Vkorc1 using this tool. The genomic DNA used as template was prepared from
tissue samples of small mammal species (Akodon azarae, Lagostomus maximus and
Mus musculus). In this work we describe this methodology, which is useful for assessing
nucleotide sequences. After a PCR amplification product is sequenced, the sequence should
be stored in a .doc file: …TTAGGTTAGGGCTAAG…. This file of letters representing the four
DNA bases is the nucleotide "query" used to find matches in the official database. We then
choose an organism against which to perform the sequence alignment. So far, the organisms
whose genomes are known and have been assembled in the NCBI databases for BLAST are
human, rodents, flowering plants (Arabidopsis thaliana), rice (Oryza sativa) and some others
(Pan troglodytes, Danio rerio, Gallus gallus, Drosophila melanogaster, Apis mellifera and
Bos taurus). The sequence of PCR products for the amelogenin gene was obtained from DNA
templates of A. azarae and L. maximus. The Amel sequences from Mus musculus and human
were used for the sequence analysis because these are the closest organisms in which the
Amel gene has been described. The BLAST algorithm allowed us to determine that the
females of both species shared an intronic sequence with the human amelogenin gene
(M55418); the minimum identity was 73% in sequences 200 base pairs long. The Vkorc1
sequences obtained for the three exons from wild-type mice were compared with the NCBI
Reference Sequence NT_039433.8, corresponding to Mus musculus strain C57BL/6J
chromosome 7. Sequence lengths ranged from 168 to 311 base pairs. A maximum of 6 gaps
was found in 3% of the sequences. The identity between the vitamin K epoxide reductase
complex sequence from Mus musculus strain C57BL/6J and the wild type ranged from 87 to
100%. No mutation in the Vkorc1 sequence was found in any of the 20 wild rodents analysed.
These analyses show that the Vkorc1 exon sequence is well conserved in wild mice from
the Buenos Aires population under study. BLAST allowed us to study the amelogenin gene
in species such as Akodon azarae and Lagostomus maximus, in which Amel was unknown,
and also Vkorc1 in local wild populations of Mus musculus.
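The identity and gap figures quoted above are the kind of summary BLAST reports for each alignment. As an illustration only, the following minimal Python sketch computes them for a pair of already aligned sequences; the function name `alignment_stats` and the toy sequences are hypothetical and are not part of the original study.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Identity and gap statistics for two aligned sequences of equal length.

    Gaps are written as '-'. Percent identity is computed over the full
    alignment length (gap columns included), as in BLAST reports.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query.upper(), subject.upper()))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length,
    }


# Toy example: one gap in the query relative to the subject.
stats = alignment_stats("TTAGGTTAGG-CTAAG", "TTAGGTTAGGGCTAAG")
print(stats)
```

In this toy case 15 of the 16 alignment columns are identical and one column contains a gap.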
GOboot: towards a robust SEA analysis
Cristóbal Fresno (1), Andrea S Llera (2,3), María R Girotti (2,3,a), María P Valacco (2,3), Juan A López (4), Laura Zingaretti (5), Laura Prato (5), Osvaldo Podhajcer (2,3), Mónica G Balzarini (2,6), Federico Prada (7) and Elmer A Fernández (1,2)
(1) BioScience Data Mining Group, Catholic University of Córdoba; (2) CONICET; (3) Laboratory of Molecular and Cellular Therapy, Leloir Institute, Buenos Aires; (4) National Center for Cardiovascular Research, Madrid, Spain; (5) Instituto A.P. de Ciencias Básicas y Aplicadas, Universidad Nacional de Villa Maria; (6) Biometry Laboratory, National University of Córdoba; (7) Institute of Technology, School of Engineering and Sciences, UADE, Buenos Aires; (a) Present address: The Institute of Cancer Research, London, UK
Keywords: Gene Ontology, background selection, Genomics, Proteomics.
Background
Set enrichment analysis (SEA) is the traditional approach to Gene Ontology (GO) analysis, owing to its long track record and its
availability in commercial and public tools and websites [1-2]. In the GO structure, each term is statistically evaluated one at a time
and is declared enriched if the observed proportion of differentially expressed proteins/genes differs from the proportion expected
from a background reference (BR). The appropriate BR is difficult to devise, and GO results tend to depend on it: a term
may or may not appear enriched depending on the BR used. Here, a new method is presented to evaluate the
robustness of node enrichment by means of bootstrap perturbations of the BR used. Each node thus receives a “power
score”; high-stability nodes are candidates to be explored, and spuriously enriched terms are left out of the analysis.
Methods
A resampling technique was implemented to provide a stability (power) measure of SEA, in order to evaluate the effectiveness of a
given BR in identifying truly enriched terms. Simulated BRs were generated by bootstrapping a BR, keeping each simulated BR as
close as possible to the length of the original BR (so as to introduce only small perturbations in the length of both the GO members
and the BR). The power value was calculated as the percentage of times a term was found enriched over a large number of simulated
BRs. Higher power therefore implies greater stability of the term.
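The power calculation can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the GOboot implementation: enrichment is tested here with a one-sided hypergeometric test (the abstract itself relies on DAVID), the differentially expressed genes are always kept in the simulated BR, and only the remaining background genes are resampled with replacement, so the BR length is preserved while the number of term members fluctuates. The names `hypergeom_pvalue` and `term_power` are hypothetical.

```python
import math
import random


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    total = math.comb(N, n)
    upper = min(n, K)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def term_power(de_genes, term_genes, background, n_boot=200, alpha=0.05, seed=0):
    """Percentage of bootstrapped BRs in which the term is found enriched."""
    rng = random.Random(seed)
    de = list(dict.fromkeys(de_genes))           # unique DE genes, order kept
    de_set, term = set(de), set(term_genes)
    non_de = [g for g in background if g not in de_set]
    n = len(de)                                  # size of the DE list
    k = sum(1 for g in de if g in term)          # DE genes annotated to the term
    hits = 0
    for _ in range(n_boot):
        # Assumption: resample only the non-DE part of the BR, with replacement.
        boot = de + rng.choices(non_de, k=len(non_de))
        K = sum(1 for g in boot if g in term)    # term members in this simulated BR
        if hypergeom_pvalue(k, n, K, len(boot)) < alpha:
            hits += 1
    return 100.0 * hits / n_boot


genes = [f"g{i}" for i in range(100)]
de_list = genes[:8] + ["g50", "g51"]
strong_term = genes[:10]      # contains 8 of the 10 DE genes
weak_term = genes[40:60]      # contains 2 of the 10 DE genes
print(term_power(de_list, strong_term, genes), term_power(de_list, weak_term, genes))
```

A term that concentrates most of the differentially expressed genes stays enriched in essentially every simulated BR, whereas a term hit at roughly the background rate does not.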
DAVID [3] was the tool chosen to test SEA in one proteomic experiment (Girotti et al., unpublished) and three microarray experiments
freely available at Gene Expression Omnibus [4-6] under different BRs: the genome of the species (BR-I), the chip gene list (BR-II,
where available) and a user-defined reference (BR-III [7]). BR-III was the reference used for the power calculation, although the
method is not restricted to it, as it is considered the one that fulfills the statistical assumptions. Boxplots of the power of the
enriched terms of the main GO category (Biological Process) were plotted, using a Venn-diagram color pattern to contrast the
enrichment with that of typical BR selections (BR-I or BR-II).
Results
In Figure 1, the power boxplots of all enriched nodes (in white) are above 40% for most datasets. Almost all nodes found
with BR-III reached power values above 50%. Meanwhile, nodes that appeared enriched when bootstrapping BR-III and had
previously been found with BR-I, or were shared by BR-I and BR-II, showed power values below 40% in all cases. This suggests
that the enriched nodes found with BR-III were highly consistent and potentially meaningful. These enriched terms were
validated against the literature.
Figure 1: Biological process power boxplots of bootstrapped enriched nodes, coded by the overlapping source of the full-length BR
(BR-I to BR-III). Note that the “Joint” boxplot (in white) corresponds to the boxplot of all bootstrapped enriched nodes.
Discussion
The stability analysis showed that non-consensus nodes identified only with BR-I and/or BR-II are unstable,
suggesting spurious enrichment. In contrast, enriched terms found with BR-III showed high power, suggesting greater
“confidence” (robustness) and making these terms good candidates for further exploration. We found that the “robust” terms were
biologically relevant to the experimental setting [7]. In this context, the proposed tool provides additional information (power
values) that guides ontology exploration and reveals terms blurred by the traditional approaches, assisting researchers in
ontology analysis.
References
[1] P. Khatri, S. Drăghici, Ontological analysis of gene expression data: current tools, limitations, and open problems, Bioinformatics, 21, 3587-3595 (2005)
[2] D. Wei Huang et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res., 37:1-13
(2009)
[3] I. Rivals, L. Personnaz, L. Taing, M-C. Potier, Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 23, 401-407 (2007)
[4] L. M. Packer et al. Gene expression profiling in melanoma identifies novel downstream effectors of p14ARF, Int. J. Cancer, 121, 784-790 (2007)
[5] A. Spira et al. Effects of cigarette smoke on the human airway epithelial cell transcriptome. Proc. Natl. Acad. Sci. U. S. A., 101, 10143-10148 (2004)
[6] S. McGrath-Morrow et al. Impaired lung homeostasis in neonatal mice exposed to cigarette smoke. Am. J. Respir. Cell. Mol. Biol., 38, 393-400 (2008)
[7] C. Fresno, A. S. Llera, M. R. Girotti, M. P. Valacco, J. A. López, O. L. Podhajcer, M. G. Balzarini, F. Prada, E. A. Fernández, The Multi-Reference Contrast
Method: facilitating set enrichment analysis, Comput. Biol. Med. 42, 188-194 (2012)
Honeybees colony virtual simulation, step 2
Mario Migueles 1, Liesel Gende 2,3,4, Leonardo Defeudis 2, Pablo Macri 1,4, María Churio 3,4,
Martín Eguaras 2,4, Lidia Braunstein 1
1 Instituto de Investigaciones Físicas de Mar del Plata (IFIMAR). Departamento de Física.
FCEyN. Universidad Nacional de Mar del Plata, Mar del Plata, Buenos Aires, Argentina, 7600.
2 Laboratorio de Artrópodos. Departamento de Biología. FCEyN. Universidad Nacional de Mar
del Plata, Mar del Plata, Buenos Aires, Argentina, 7600.
3 Departamento de Química. FCEyN. Universidad Nacional de Mar del Plata, Mar del Plata,
Buenos Aires, Argentina, 7600.
4 CONICET, Buenos Aires, Argentina, C1033AAJ
In colonies of eusocial insects such as honeybees, many tens of thousands of workers can live
together as a regulated superorganism. These colonies are characterized by division of
labour: the specialization of individual workers for particular tasks. Honeybees have developed
collective food acquisition methods to provide themselves with nutrients. They split the food
gathering task into a variety of subtasks performed by different individuals. Foragers search
for food sources, collect food and transport it to the nest, where it is processed and stored
by other groups of workers. The aim of this work was to develop a software package, called BeEp,
which establishes a causal relationship between honeybee age, task performance, population
and food balance. We describe a novel multi-agent model (MAMS) that focuses on the
dynamic task selection of honeybees. The behaviour of the complete system was reproduced
directly by simulating the actions of the individuals. Our simulation was intended to model
all the important aspects of a bee's life inside the hive. We assume differentiation among
castes (workers, drones, queen). The queen is the only individual in charge of laying eggs
in empty cells. Worker bees are physiologically and morphologically identical; we emphasize
the differentiation among them according to age and, therefore, their activity in the colony.
This includes individual development from egg to adult, and adults performing tasks such as
brood tending (nursing), nectar and pollen storage (storing) and nectar and pollen collection
(foraging). The software also considers the transformation of nectar into honey. Adult
bees of all ages satisfy their energy demands by consuming stored nectar (honey) or by being
fed by other adults. The larvae (brood) must be fed by nurse bees. We have simulated a
honeybee colony of 2000 individuals in 4 frames over a period of 365 days, maintaining
nutrition and population balance. The software consists of multiple parallel programs that run
synchronized: GoReporter details the statistics of the simulation, and BeTV allows a
simulation running on another, more powerful computer to be followed from a different
computer. BeEp also generates an event monitor, which shows the progress of the simulation
step by step. The software supports colony simulations performed by multiple
computers in parallel (clusters), which dramatically reduces simulation times. In parallel, we
carried out experiments with real honeybees in mini colonies to validate the simulation. We plan to
use our model for additional studies such as beekeeping epidemics, pollination and honey
production.
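The age-based division of labour described above can be illustrated with a toy agent-based step in Python. This is a minimal sketch, not BeEp: the age thresholds, the nectar intake per forager and the consumption per bee are hypothetical values chosen only for the example, and brood, drones and the queen are left out.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical age thresholds (days as an adult); not taken from BeEp.
NURSE_UNTIL = 10
STORE_UNTIL = 20


@dataclass
class Worker:
    age: int  # days since emergence as an adult

    def task(self) -> str:
        """Task selected purely by age in this simplified model."""
        if self.age < NURSE_UNTIL:
            return "nursing"
        if self.age < STORE_UNTIL:
            return "storing"
        return "foraging"


def step(colony, nectar, intake_per_forager=2.0, use_per_bee=0.5):
    """Advance one day: age every worker, then update the nectar balance."""
    for bee in colony:
        bee.age += 1
    tasks = Counter(bee.task() for bee in colony)
    nectar += tasks["foraging"] * intake_per_forager - len(colony) * use_per_bee
    return tasks, max(nectar, 0.0)


colony = [Worker(age) for age in (0, 9, 15, 25)]
tasks, nectar = step(colony, nectar=10.0)
print(dict(tasks), nectar)
```

Each agent carries only its own state (here, age), and the colony-level quantities (task counts, food balance) emerge from iterating over the individuals, which is the general idea behind a multi-agent simulation.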
Reference
1. Schmickl T, Crailsheim K: TaskSelSim: a model of the self-organization of the division of
labour in honeybees. Mathematical and Computer Modelling of Dynamical Systems 2008, 14
(2):101–125.
HMMerCTTer: Tailor-made Decision Making for the Semi-automatic Clustering of large Protein
Superfamilies
Hernán Gabriel Bondino (1,3), Inti Anabela Pagnuco (2), María Victoria Revuelta (1), Marcel Brun (2) and Arjen ten Have (1)
1: Laboratorio de Biología Comparativa en Solanáceas, IIB-CONICET-UNMdP, Mar del Plata (7600); 2:
Laboratorio de Procesamiento Digital de Imagenes, FI-UNMdP, Mar del Plata (7600); 3: Advanta Semillas
SAIC Centro de Investigación en Biotecnología, Balcarce (7620)
Keywords
Expert System, Structure-Function Prediction, Function Assignation, Protein Family, Protein Superfamily
Background
The sheer amount of protein sequences derived from public genome sequences provides biologists with many
opportunities but also with challenges. Many protein superfamilies appear to consist of various, sometimes unknown,
subfamilies that are often difficult to distinguish. Computational analyses play an important role in what is
referred to as function assignation, but they typically require specific biological knowledge, insight into the available
biocomputational tools and heavy computation of large phylogenies. We set out to develop a tool for the
bioinformatics layman that, based on a training set of high-quality expert annotation, automatically clusters
superfamily protein sequences into subfamilies. We developed an automatic but user-supervised procedure that
results in a high-quality clustering, cluster-specific HMMer profiles and corresponding cut-off threshold values
for reliable sequence identification and clustering. Hence, we refer to this new tool as the HMMer Cut-off
Threshold Tool, or HMMerCTTer.
Results
HMMerCTTer depends on an expert-provided training set that consists of a phylogeny and the underlying
Multiple Sequence Alignment (MSA). First, HMMerCTTer assigns monophyletic clusters using a ranking
algorithm based on the Silhouette Index with weight correction. The Silhouette Index measures the
compactness and separation of clusters based on the distances provided by the tree. Then, a HMMer profile is
built for each Silhouette-qualified cluster using the user-provided MSA. Each cluster-specific HMMer profile
will, theoretically, identify sequences belonging to the same cluster with a high alignment score, whereas
sequences from other clusters will have significantly lower scores. Sudden drops in alignment scores are thus
indicative of cut-off thresholds. This does, however, depend on the quality of the tree and the corresponding MSA,
but also on the variation and conservation observed within and among the different subfamilies of the
superfamily. Hence, the procedure is supervised by the biological expert in order to optimize both sensitivity
and specificity.
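The idea that a sudden drop in alignment scores marks a cut-off threshold can be sketched in a few lines of Python. This is a simplified illustration, not the HMMerCTTer procedure: it assumes the scores of all sequences against one cluster profile are already available as plain numbers, and it simply places the threshold halfway across the largest gap between consecutive ranked scores. The function name `cutoff_from_scores` and the example scores are hypothetical.

```python
def cutoff_from_scores(scores):
    """Return (threshold, n_above) at the largest drop between ranked scores."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        raise ValueError("at least two scores are needed to find a drop")
    # Drop between each score and the next lower one.
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    i = max(range(len(drops)), key=drops.__getitem__)
    # Place the threshold halfway across the largest drop.
    threshold = (ranked[i] + ranked[i + 1]) / 2
    return threshold, i + 1


# Made-up profile scores: three cluster members and three outsiders.
threshold, n_members = cutoff_from_scores([310.5, 298.0, 305.2, 120.4, 95.1, 88.0])
print(threshold, n_members)
```

When the drop is not clear-cut, for instance when several gaps of similar size exist, an automatic choice of this kind becomes unreliable, which is consistent with the expert supervision described above.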
In a second step, the sensitivity and the specificity of the HMMer profiles are tested using either the ungapped
sequences from the training set or the corresponding complete proteomes. Based on graphically represented
data, the user either accepts clusters or asks for an iterative refinement. For instance, large clusters with a high
Silhouette Index but a nevertheless non-discriminative HMMer profile can be re-analysed by passing
only the cluster's subtree through the ranking algorithm again. This results in smaller and more specific
profiles. Another refinement deals with clusters that are considered too small, using an iteration
through the HMMer profiling loop. A third refinement is a manual override of the clustering provided by the
ranking algorithm, in order to enable paraphyletic clustering.
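For reference, the plain Silhouette Index on which the ranking algorithm is based can be computed from a pairwise distance matrix as in the Python sketch below. It is an illustration only: the weight correction mentioned above is not reproduced, the distances would in practice come from the tree, and the function name `silhouette_by_cluster` is hypothetical.

```python
def silhouette_by_cluster(dist, labels):
    """Mean silhouette width per cluster from a symmetric distance matrix.

    dist[i][j] is the distance between items i and j; labels[i] is the
    cluster of item i. At least two clusters are required.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = {}
    for lab, members in clusters.items():
        values = []
        for i in members:
            if len(members) == 1:
                values.append(0.0)  # convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            values.append((b - a) / denom if denom > 0 else 0.0)
        widths[lab] = sum(values) / len(values)
    return widths


# Two compact, well separated clusters.
D = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
print(silhouette_by_cluster(D, ["A", "A", "B", "B"]))
```

Values close to 1 indicate compact, well separated clusters, while values near 0 or below indicate clusters that overlap with their neighbours.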
The idea of HMMerCTTer was first applied in our recently published plant-ACD superfamily study. In this
study, 29 custom-defined HMMer profiles were constructed and manually selected based on a phylogeny of 406
sequences derived from seven complete plant proteomes. The generated profiles were used to screen 17
complete plant proteomes and yielded a single false positive (829 sequences, 17 complete proteomes), whereas
all real positives were detected (training set, 7 complete proteomes). The automated HMMerCTTer identified a
slightly higher number of clusters than the manual procedure, but the HMMer profiles generated reliable
cut-offs. Hence, the same collection and clustering of 829 sequences would have been achieved if HMMerCTTer
had been applied instead of a time-consuming expert analysis.
Conclusions
HMMerCTTer provides biologists with an easy and powerful tool for the reliable classification of subfamilies
of superfamilies. Since Nature provides us with infinite scenarios of superfamilies, more benchmarking will be
required in order to further improve HMMerCTTer and to test its general applicability and limits. We are currently
analysing HMMerCTTer using the highly complex superfamilies of aspartic proteases, polygalacturonases
and phospholipases C. In the future we foresee the development of a HMMerCTTer-based tool for the
supervised annotation of complete proteomes.
Structure-Function Prediction of Highly Variable Sub-sequences of Protein Subfamilies
María Victoria Revuelta, Arjen ten Have
Laboratorio de Biología Comparativa en Solanáceas, IIB-CONICET-UNMdP, Mar del Plata (7600)
Keywords: Structure-Function Prediction, Aspartic Proteinase, Protein Family, Protein Superfamily,
Subfamily or Specificity Determining Sub-sequence
Background
Protein families consist of homologous, often functionally related, proteins that have a similar 3D structure.
A key aspect of protein families is that they contain paralogues, which allows for functional diversification and
the evolution of subfamilies. One of the aims of Structure-Function Prediction studies is the identification of
Subfamily or Specificity Determining Positions (SDPs): sites or residues specific for certain functional
aspects or subfamily classification. The identification of SDPs is a hot topic in Bioinformatics and can be
achieved by various methods based on either evolutionary tracing (ET) or mutual information (MI), both of
which depend on multiple sequence alignments (MSAs) and homology. Interestingly, MSAs also identify
sub-sequences that are not conserved throughout the complete superfamily and, hence, are not truly
homologous. Current ET or MI SDP identification methods do not identify these Subfamily or Specificity
Determining Sub-sequences (SDSs), some of which could be very important for protein function. We set out
to develop a methodology for the identification and subsequent analysis of SDSs, using A1 Aspartic
Proteinases (APs) as a case study. APs form a well-studied protein family with a number of well-described,
functionally important loops such as the Nepenthesin-specific loop and the Plant Specific Insert. The analysis
will be used for functional prediction but also as the foundation of a more general SDS identification and
analysis procedure.
Results
A multiple sequence alignment of 710 AP sequences from 107 completely sequenced eukaryotic genomes
was constructed based on known hallmarks and available structural information. Non-homologous or
otherwise poorly aligned sub-sequences were removed and a phylogenetic tree was constructed. The tree
shows the existence of eleven different AP subfamilies, whereas the MSA trimming identified 12 stretches
with high variability. Six of these were described by Metcalf & Fusek (1993) as variable loops that
cover the binding cleft, are rather mobile or distorted in structures and are supposedly involved in
substrate specificity. The other six SDSs are more remote from the binding cleft but also appear to be solvent
exposed.
Once identified, the SDSs require bio-computational analysis. The sub-sequences were analyzed for
length, subfamily conservation and sequence characteristics. The length of each of the 12 highly variable
sub-sequences was determined using a PERL script and analyzed in R in order to find significant differences
between subfamilies. Subfamily conservation was analysed by realignment of the 12 SDS regions for the 11
identified subfamilies. Reliable alignments were obtained for some but not all 131 datasets. Comparison of
reliable cluster-specific SDS alignments was hampered by a low information content. All sequences were
analysed using a number of bio-computational methods in order to detect putative physicochemical and/or
biological fingerprints.
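The length analysis of a variable sub-sequence can be sketched as follows. The authors used a PERL script and R; the Python below is only an illustrative equivalent under simple assumptions: the alignment is given as a dictionary of equally long gapped strings, the SDS is defined by alignment column coordinates, and the subfamily of each sequence is known. All names and the toy data are hypothetical.

```python
from statistics import mean


def sds_lengths(alignment, region, subfamily):
    """Ungapped length of one alignment region, grouped by subfamily.

    alignment: {sequence id: aligned sequence with '-' or '.' as gaps}
    region:    (start, end) alignment columns, 0-based, end exclusive
    subfamily: {sequence id: subfamily name}
    """
    start, end = region
    by_family = {}
    for seq_id, seq in alignment.items():
        # Count residues only; gap characters do not contribute to length.
        length = sum(1 for c in seq[start:end] if c not in "-.")
        by_family.setdefault(subfamily[seq_id], []).append(length)
    return by_family


aln = {"a1": "MK--AGLT", "a2": "MKG-AGLT", "b1": "MKGSAGLT", "b2": "MKGSSGLT"}
fam = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
lengths = sds_lengths(aln, (2, 4), fam)
print({name: mean(vals) for name, vals in lengths.items()})
```

The per-subfamily length lists produced this way are the input for a test of differences between subfamilies, which the abstract reports as having been done in R.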
Conclusion
MSA trimming software can be used for the identification of SDSs. Ten out of the 12 SDSs identified in the AP
superfamily show statistically significant differences throughout the superfamily classification. A number of
SDS-cluster alignments are reliable, which suggests that these SDSs are functionally constrained within certain
subfamilies. Other SDS-cluster alignments are not reliable and require a tree-guided iterative alignment
optimization, which is currently being developed. Comparison of SDSs is hampered by the lack of clear
homology, and alternative strategies are being developed for comparative analysis. Most SDSs are relatively
hydrophilic, confirming that SDSs are solvent exposed. A number of Prosite patterns with a high probability
of occurrence were identified and will be statistically analysed.
Reference Metcalf P, Fusek M: T EMBO 1993, 12(4):1293-1302
Molecular Dynamics and Circular Dichroism Study of VBT:VBA Polymers (1:1
and 1:4). Structure and Dynamics comparison.
Sergio A. Garay1, Antonela Fuselli1, Debora Martino1,2, Daniel E. Rodrigues1,2
1 Facultad de Bioquímica y Ciencias Biológicas - Universidad Nacional del Litoral
2 INTEC (UNL-CONICET)
Santa Fe, Argentina
A novel class of environmentally benign, non-toxic and recyclable materials based on vinylbenzyl thymine (VBT) and the
ionically charged vinylbenzyl triethylammonium chloride (VBA) monomers was studied. These compounds were
bioinspired by the interactions between nitrogenous bases that occur in DNA degenerative processes and by the possibility
of reverting them using specific enzymes from living organisms. The hydrophilic nature of VBA allows us to work without
organic solvents in the polymerization process. The technological applications of these polymers have arrived earlier than
the basic studies needed to understand their behaviour and improve their performance.
We present a study of the influence of the VBT:VBA co-polymerization molar ratio on the short-range structure
adopted by the polymer chains. We carried out several Molecular Dynamics simulations of these polymers as well as
Circular Dichroism (CD) experiments. We ran simulations of 32 monomers of VBT:VBA (1:1) and 35 monomers of VBT:VBA
(1:4) in explicit water (SPC model). The 1:1 polymer showed 75 % of its monomers in helix conformation, while the 1:4
polymer showed only 54 %. We detected thymine stacking between residues (i, i+4) and (i, i+5) in the former and latter
polymer, respectively. The number of residues per helix turn was 3.6 and 4.0 for the 1:1 and 1:4 stoichiometries, respectively.
The helix structure of the 1:4 polymer was interrupted by longer unstructured segments than that of the 1:1 polymer, and its
backbone also showed more undulations. The 1:4 polymer also showed a higher number of nearby water molecules, a larger
solvent accessible surface (SAS) and more H bonds close to it than the 1:1 polymer, indicating that its unstructured segments
let more water come close to it. In all cases we found at least one pair of stacked thymines inside the helix segments, which
could explain in part the helix stability. In summary, we conclude that thymine stacking would be responsible (at least in part)
for the helix structure of the polymer, helping to lower the SAS of the hydrophobic moiety. The lack of CD signal was in
agreement with the simulation results.
LATERAL PRESSURE EFFECTS ON STRUCTURAL PROPERTIES OF DPPC LIPID
BILAYERS IN GEL AND LC PHASES: A MOLECULAR DYNAMICS STUDY
A. Sergio Garay1, Juan F. Quaranta1, Daniel E. Rodrigues1,2
1 Área de Modelado Molecular, Lab. de Biomembranas, Dpto. de Física, Fac. de Bioquímica y Cs.
Biológicas. 2 INTEC (UNL+CONICET). ARGENTINA.
Cell membranes contain hundreds of lipid species and proteins arranged in heterogeneous domains.
It is now known that this compositional and morphological heterogeneity is central to their
functions of substance trafficking and protein interactions. It is therefore necessary to rationalize
how the lateral pressure boundary conditions affect the structure, ordering and dynamics of the lipid
domains. We performed Molecular Dynamics (MD) simulations of hydrated DPPC lipid bilayers
in the gel (G, T=22°C) and liquid-crystalline (LC, T=50°C) phases at several lateral pressure values
(constant surface tension ensemble, ST) to evaluate their influence on the structural and ordering
properties. For both phases the MD simulations were performed on a bilayer of 480 lipids, at ST
values of 14 and 28 dyn/cm, the former being the value that reproduces the experimental deuterium
NMR order parameter profiles of the LC phase.
One of the relevant structural properties is the area per lipid: Area[G,ST=14dyn/cm]=(44.7±0.4)Å^2;
Area[G,ST=28dyn/cm]=(48.3±0.2)Å^2; Area[LC,ST=14dyn/cm]=(61.5±0.7)Å^2;
Area[LC,ST=28dyn/cm]=(75.0±0.3)Å^2. The results show that the LC phase is much more
sensitive to the change in lateral pressure than the ordered G phase. The hydration of the lipid polar
groups is known to be a relevant contribution to the interface energetics. The numbers of H-bonds
from water to the carbonyl O per lipid are: HB[G,ST=14dyn/cm]=5.14; HB[G,ST=28dyn/cm]=5.19;
HB[LC,ST=14dyn/cm]=5.69; HB[LC,ST=28dyn/cm]=6.09. The larger change is again found for
the LC phase. The change in the LC-phase area explains this behaviour, and also why the number of
water molecules whose orientational potential is perturbed by the interface is more sensitive in this
case. We have also analyzed the changes in the order parameter profiles, the number of water
molecules bridging lipids, and the lateral pressure profiles across the bilayer to untangle the
contributions from different lipid regions.
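The different sensitivity of the two phases can be made explicit by computing the relative change of each reported quantity when the surface tension goes from 14 to 28 dyn/cm. The short Python sketch below only reuses the mean values quoted above (uncertainties are ignored); the helper name `relative_change` is hypothetical.

```python
# Mean values reported in the abstract, keyed by (phase, surface tension in dyn/cm).
area = {("G", 14): 44.7, ("G", 28): 48.3, ("LC", 14): 61.5, ("LC", 28): 75.0}
hbonds = {("G", 14): 5.14, ("G", 28): 5.19, ("LC", 14): 5.69, ("LC", 28): 6.09}


def relative_change(values, phase):
    """Percentage change of a property when ST goes from 14 to 28 dyn/cm."""
    low, high = values[(phase, 14)], values[(phase, 28)]
    return 100.0 * (high - low) / low


for phase in ("G", "LC"):
    print(
        f"{phase}: area {relative_change(area, phase):+.1f} %, "
        f"H-bonds {relative_change(hbonds, phase):+.1f} %"
    )
```

With these numbers the area per lipid grows by roughly 8 % in the gel phase and roughly 22 % in the LC phase, and the H-bond count by about 1 % and 7 % respectively.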
Assessing protein-disease association significance from candidate
ranking lists
Ariel Berenstein1,2, Irene Ibañez1,2 , Ariel Chernomoretz1,2
1 Departamento de Física, Universidad de Buenos Aires, Bs. As. Argentina.
2 Laboratorio de Bioinformática, Instituto Leloir, Bs. As. Argentina
Background
There has been a lot of recent interest in the application of complex network theory to human health
research in order to predict new disease/gene-product associations. Most research programs of this type
assume that proteins associated with the same disease have an increased tendency to interact with
each other. Accordingly, most gene prioritization methods involve the use of already known gene-disease
associations, a complex network of interacting proteins that encodes physical or functional
relationships between them, and some kind of information propagation technique used to rank candidate
proteins in terms of their degree of association with disease-related seeds. Usually the top-ranked proteins
are considered new candidates, but this procedure takes into account neither the statistical
significance of the proposed gene-disease association nor the effect of the topological structure of the
implemented P2P network.
Materials and methods
We considered genes and proteins associated with Alzheimer's disease as reported by the DisGeNET
database [1]. Protein-protein interactions were inferred from the Human Interaction Network (HIN) [2]. Three
different protein candidate prioritization methods were analyzed (Functional Flow, Random Walk with
Restart and Net Rank algorithms) [3-6]. We implemented a bootstrapping technique that takes the
topological network structure into account to assign statistical significance to the observed scores, and
corrected the corresponding p-values with a multiple hypothesis testing technique (FDR).
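The abstract does not state which FDR procedure was applied. As an illustration, the widely used Benjamini-Hochberg adjustment can be written in a few lines of Python; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four candidate proteins.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Candidates whose adjusted p-value falls below the chosen FDR level would then be retained as significant associations.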
Results
We show that predictions based on candidate ranking lists obtained with this type of algorithm can be
highly biased by the underlying network topology. We show how and when a bootstrapping technique that
takes into account the local connectivity pattern of each node should be used to alleviate this issue.
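One way to picture such a topology-aware bootstrap is sketched below in Python for the Random Walk with Restart case. This is an assumption-laden illustration, not the authors' method: the "local connectivity pattern" is reduced here to node degree, so each real seed is replaced by a random node of the same degree; the restart probability, the toy network and the function names `rwr` and `empirical_pvalue` are all hypothetical.

```python
import random


def rwr(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart scores on an undirected graph.

    adj maps each node to the list of its neighbours; seeds is a set of nodes.
    Nodes without neighbours leak probability mass (acceptable for a sketch).
    """
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in adj}
        for u, neighbours in adj.items():
            if neighbours:
                share = (1.0 - restart) * p[u] / len(neighbours)
                for v in neighbours:
                    nxt[v] += share
        converged = max(abs(nxt[v] - p[v]) for v in adj) < tol
        p = nxt
        if converged:
            break
    return p


def empirical_pvalue(adj, seeds, candidate, n_boot=500, seed=0):
    """Fraction of degree-matched random seed sets scoring the candidate as high.

    The candidate is assumed not to be one of the seeds.
    """
    rng = random.Random(seed)
    observed = rwr(adj, seeds)[candidate]
    by_degree = {}
    for v, neighbours in adj.items():
        if v != candidate:
            by_degree.setdefault(len(neighbours), []).append(v)
    hits = 0
    for _ in range(n_boot):
        # Replace every real seed by a random node with the same degree.
        fake = {rng.choice(by_degree[len(adj[s])]) for s in seeds}
        if rwr(adj, fake)[candidate] >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)


network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = rwr(network, {"a"})
print(scores, empirical_pvalue(network, {"a"}, "b", n_boot=100))
```

A highly connected candidate tends to collect a large score from almost any seed set, so comparing its observed score with this null distribution separates genuine proximity to the disease seeds from a purely topological effect.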
Conclusions
In this work we highlight the importance of adequately taking the network connectivity
pattern into consideration in gene prioritization procedures. Looking at several topological quantities, we
analysed the induced topological bias and quantified the performance of bootstrapping techniques that aim
to alleviate it. We found that different algorithms are affected differently by this bias. This observation can be
explained in terms of the information propagation scheme implemented in each algorithm.
1. Bauer-Mehren et al. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene–disease
networks. Bioinformatics 26: p2924, 2010.
2. Ceriani et al. Automated Network Analysis Identifies Core Pathways in Glioblastoma. PloS One, 5 (2):
e8918, 2010.
3. Guney et al. Toward PWAS: discovering pathways associated with human disorders. BMC
Bioinformatics 12(Suppl 11):A12, 2011.
4. Nabieva et al. Whole-proteome prediction of protein function via graph-theoretic analysis of
interaction maps. Bioinformatics 21, p302, 2005.
5. Kohler et al. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet,
82, p949, 2008.
6. Chen et al. Disease candidate gene identification and prioritization using protein interaction networks.
BMC Bioinformatics, 10, p73, 2009.
Estimation of Species Richness in Microbial Communities
2
Introduction
Data mining concepts combined with statistical
estimation can be applied in metagenomics to
infer species richness in microbial communities.
There are several statistical estimators that infer
species richness in the community from a
sample [1]. In spite of the usually large volume
of data, a paradox occurs because richness
estimators have a poor performance and
underestimate species richness. The reason has
its roots in the low frequency of statistically rare
species, that is, species with one or just a few
members in the population. These rare species
sometimes can constitute the major part of the
community. So that the sample of sequences
drawn from the population can contain
thousands of reads and millions of bases, but
still be insufficient for an adequate estimate. To
improve richness estimation we introduce here
an algorithm for species counting, called ARE,
based on an intelligent-data-analysis approach
combining simulation and machine learning. We
test ARE on a real-world sample of 16S rRNA
sequences.
Material and Methods
The analyses shown in this work were
performed with the gene coding for the 16S
rRNA, which enables a phylogenetic evaluation
of the similitudes and differences between
microorganisms [2]. The sequences were
aligned against a reference and filtered by size
and relative position in the overall alignment.
The remaining sequences were clustered in
OTUs by similarity using the Jukes-Cantor
distance. The similarity threshold was chosen so
that every cluster corresponds approximately to
a different species. The ARE algorithm starts
with a population model based on the richness
and distribution of an initial sample, then it
improves estimation by successively adding
individuals selected by a simulation process.
This simulation takes into account the species
abundance in the initial sample to estimate what
is the probability of the next individual to be a
member of a species already recorded in the
sample or a member of a new one. This
probability is calculated as the quotient between
the number of species with only one member
and the total number of individuals in the
sample, as suggested by Alan Turing and
demonstrated later by Good [3]. The quotient at
any given iteration is:
Tˆi
distribution along the simulation process. In this
context, the probability of finding a new species
tends to zero as the number of simulated
individuals increases, and as a consequence,
the number of recorded species also increases.
To evaluate ARE we analyzed eight samples
from a metagenomic survey of the coastal line of
a hypersaline lake (NCBI Short Read Archive
accession number SRX008158)
Resuls
The simulations were halted after reaching a
threshold of singleton frequency or a given
number of simulated individuals. The number of
resulting species is the richness estimator of the
population. The ARE estimates were higher than
those of other current estimators. Figure 1
compare ARE predictions to those of Chao and
ACE, two common non-parametric estimators.
[Figure 1. Comparison of Estimates. Y-axis: Estimated Richness (0 to 11000). X-axis: Samples (S85 to S92). Series: CHAO, ACE, ARE.]
Cristóbal Santa María1, Marcelo Soria
1 UNLAM, San Justo, Argentina
2 FAUBA, Buenos Aires, Argentina
To test the goodness of fit of the estimator we
created a simulated population that follows a
Fisher's log-series distribution [4]. A sample was
drawn from this population and used to estimate
population richness using ARE, which confirmed
the improvement in performance obtained with
this new estimator.
Conclusions
The results obtained with the metagenomic data
indicate that ARE yields better estimates of
richness, while the results from the simulated
population confirm, at least from a statistical
point of view, these improvements.
References
1. Chao, A. and Lee, S. Estimating the Number of
Classes via Sample Coverage. Journal of the American
Statistical Association, 1992, 87(417):210-217.
2. Schloss, P. and Handelsman, J. Toward a census
of bacteria in soil. PLoS Computational Biology,
2006, 2(7):e92.
3. Good, I. The Population Frequencies of Species
and the Estimation of Population Parameters.
Biometrika, 1953, 40(3/4):237-264.
4. Fisher, R., Corbet, S. and Williams, C. The Relation
Between the Number of Species and the Number
of Individuals in a Random Sample of an Animal
Population. The Journal of Animal Ecology, 1943,
12(1):42-58.
n sgletones
i 1
At every step i the algorithm updates the number
of species present in the sample depending
whether the simulated individual belongs to a
new species or not. In this way, ARE “learns” the
Dissecting relationships between sequence, structure and functions in the Ankyrin
Repeat Protein Family
R. Gonzalo Parra*, Rocío Espada and Diego U. Ferreiro
Protein Physiology Lab, Dpto de Química Biológica, FCEyN-UBA and CONICET. Buenos Aires,
Argentina.
Repeat proteins are made up of tandem arrays of similar stretches of 20 to 40 amino acids
that usually fold into elongated structures stabilized mainly by local interactions. Due to their
apparently simple architecture, these proteins constitute useful models to dissect relationships
between sequences, structures and functions. The Ankyrin Repeat Protein (ARP) family is
widely distributed in nature. A canonical ankyrin repeat consists of a 33-amino-acid
motif that usually folds into a beta-hairpin-helix-loop-helix upon interaction with its nearest
neighbours. Their biological function is attributed to mediating specific protein-protein interactions
with a versatility of recognition comparable to that of antibodies. Thus, their function (or lack thereof)
plays crucial roles in the development of various pathological processes and in bacterial or viral
infections.
We have built a relational database to statistically characterize ARP architecture
at various levels of description. We have collected, curated and catalogued all available
ARP sequences, structures and functional data to delineate the general properties of
this protein family. Hidden Markov Models (HMMs) derived from Multiple Sequence
Alignments (MSAs) are usually used to detect repeats in protein sequences. This methodology has
many disadvantages, as it fails to detect those repeats (or parts of them) with a high degree
of divergence. We developed a robust scheme to perform structural alignments and detect
symmetries to define the repeating units within a repeat array and between natural protein pairs.
The derived metrics were compared in terms of sequence and structural similarity. We detected
subgroups within the ARP family that appear to correspond to known functional classes.
We found that the most common methods used to characterize globular protein domains
are insufficient to capture essential characteristics of the ARP family. We hypothesize that this is
due to strong evolutionary divergence in sequences that tolerate insertions and rearrangements
within a repeating array. We show that the divergent regions can usually be mapped to binding
sites. We postulate that the functional constraints imposed by specific binding conflict with
robust folding of these proteins, and that these signals could be used to inform energetic terms
in folding dynamics models.
Design of novel DNA-binding specificity in proteins from the “zinc finger” family
Benjamin Basanta1, Andreu Alibes2, Luis Serrano2, Alejandro Nadra1
1 Structural Biochemistry Group, Biological Chemistry Department, Facultad de Ciencias Exactas y
Naturales, Universidad de Buenos Aires, C1428EGA Buenos Aires, Argentina
2 Biological Systems Design Group, Systems Biology Program, Centre for Genomic Regulation,
08003 Barcelona, Spain
Protein-DNA interactions play a central role in cellular development, modulating essential
processes such as gene expression, the cell cycle and chromatin structure. The “zinc finger”
structural motif is the most abundant DNA-binding protein domain in mammalian genomes. Each
of these domains binds a zinc ion that provides high structural stability, making it highly tolerant to
mutations and easily evolvable [1].
Developing novel DNA-binding proteins with new sequence specificities is a great technological
challenge with potential applications in many fields, from basic research to synthetic biology
and gene therapy. Currently, there is only one example of a protein successfully redesigned to
bind a DNA sequence different from the wild-type [2] [3]. On the other hand, the naturally
occurring repertoire of zinc fingers that bind a specific sequence is limited [4] [5] [6].
With the aim of developing new protein-DNA-binding-site pairs, we propose the use of the FoldX
software [7], which allows modeling and prediction of protein-DNA interactions based on energy
landscape calculations [8]. In this work we present a strategy for the computational redesign of
binding interfaces and its experimental validation in a yeast one-hybrid system.
1. Tokuriki N, Tawfik DS: Stability effects of mutations and protein evolvability. Curr
Opin Struct Biol. 2009 5:596-604.
2. Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ Jr, Stoddard BL, Baker
D: Computational redesign of endonuclease DNA binding and cleavage specificity.
Nature 2006, 7093:656-9.
3. Ulge UY, Baker DA, Monnat RJ Jr: Comprehensive computational design of mCreI
homing endonuclease cleavage specificity for genome engineering. Nucleic Acids
Res. 2011, 1:1-10.
4. Maeder ML, Thibodeau-Beganny S, Osiak A, Wright DA, Anthony RM, Eichtinger M,
Jiang T, Foley JE, Winfrey RJ, Townsend JA, Unger-Wallace E, Sander JD, Müller-Lerch
F, Fu F, Pearlberg J, Göbel C, Dassie JP, Pruett-Miller SM, Porteus MH, Sgroi DC,
Iafrate AJ, Dobbs D, McCray PB Jr, Cathomen T, Voytas DF, Joung JK: Rapid open-source
engineering of customized zinc-finger nucleases for highly efficient gene
modification. Mol. Cell. 2008 2:294-301.
5. Bhakta MS, Segal DJ: The generation of zinc finger proteins by modular assembly.
Methods Mol. Biol. 2010 649:3-30.
6. Sander JD, Maeder ML, Reyon D, Voytas DF, Joung JK, Dobbs D: ZiFiT (Zinc Finger
Targeter): an updated zinc finger engineering tool. Nucleic Acids Res. 2010 (Web
Server issue):W462-8.
7. Schymkowitz, J: The FoldX web server: an online force field. Nucleic Acids Res,
2005. 33(Web Server issue): p. W382-8.
8. Alibés A, Nadra AD, De Masi F, Bulyk ML, Serrano L, Stricher F: Using protein design
algorithms to understand the molecular basis of disease caused by protein-DNA
interactions: the Pax6 example. Nucleic Acids Res. 2010 21:7422-31.
Simulation of pesticide effect on thermo-dependent arthropod populations: fixed
point iteration method
Carlos A. Bartó 1, Julio D. Edelstein 1,2, Eduardo V. Trumper 1,2
1
Informatics Dept., Exact, Physics and Natural Sciences, National University of
Cordoba, Argentina
2
Entomology, Agricultural Experimental Station (INTA) Manfredi, Cordoba, Argentina
Agricultural crops are often damaged by arthropods, which triggers the application of
pesticides for plant protection with the aim of reducing pest populations. Pest
resurgence can occur, inducing frequent pesticide applications with the consequent
environmental risks of pest resistance, killing of non-target populations, air, water and
soil pollution, epidemiological consequences and increasing costs of crop production,
among other effects. Few pesticide models oriented to the effect on pest populations can be
found in the specialized literature. In the present work, the effect of pesticides on the number
of individuals (for example larvae, the state variable) in a population is defined.
The simulated organisms develop as a function of environmental temperature, assuming
a normal distribution of developmental rates and an instantaneous response of the
metabolism to the environment. The extended von Foerster (eVF) equation is used to solve
numerically the partial differential equations of change in population abundance over time
and physiological age. The software ARTROPOB ® (2012), designed to implement
simulation models of stage-structured population dynamics, was used, and the pesticide
module was included in its system. The evolution of multiple larval stages under the
application of a pesticide is computed by fixed-point iteration.
The state variable is calculated by integrating the eVF fluxes and is then
reduced proportionally by a survival coefficient. The convergence procedure was
controlled by tolerance parameters, limiting the number of iterations and the distance
from the estimate (Fig. 1). A norm distance per generation and larval instar was
calculated as the maximum absolute value of the mortality at a given time minus
the mortality one discrete time step before. Beyond a simple chemical pesticide effect, other
kinds of control tactics, such as biological ones based on density-dependent and frequency-dependent
pathogens, predators with a satiation function or parasitoids with a learning
behaviour, could also be simulated.
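The convergence control described above (a tolerance on the norm and a cap on the number of iterations) can be sketched generically as follows. This is only an illustration under stated assumptions: the update function is a hypothetical toy contraction, not the eVF integration performed by ARTROPOB, and the names fixed_point and toy_update are introduced here for the example.

```python
def fixed_point(update, x0, tol=1e-6, max_iter=50):
    """Iterate x <- update(x) until the max-abs change falls below tol.

    Returns the last estimate, the number of iterations performed and
    the last norm (maximum absolute difference between iterations).
    """
    x = list(x0)
    norm = float("inf")
    iteration = 0
    for iteration in range(1, max_iter + 1):
        x_new = update(x)
        # Norm: largest absolute change between n-present and n-previous.
        norm = max(abs(new - old) for new, old in zip(x_new, x))
        x = x_new
        if norm < tol:
            break
    return x, iteration, norm


def toy_update(mortality):
    # Hypothetical stand-in for one model pass: a contraction whose
    # fixed point is 200 for every component (m = 0.5 * m + 100).
    return [0.5 * m + 100.0 for m in mortality]


estimate, n_iter, last_norm = fixed_point(toy_update, [0.0, 50.0])
```

In the toy case the norm halves at each pass, so the loop stops well before the iteration cap; in a real model the cap guards against maps that do not contract.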
[Figure 1 plot. X-axis: Iteration (0 to 9). Axis labels: Total mortality, Instantaneous distance, Integral Output. Axis scales: 0 to 400 and 0 to 4000. Series: n-previous, n-present, Norm.]
Figure 1: Iterative calculation of the integrated mortality and the norms in the fixed-point method applied to
pesticide effect estimation in population models.
Although a real pesticide effect cannot be predicted on the basis of real circumstances,
this theoretical tool allows the analysis of the application of control tactics. Modeling the pest management
process allows researchers to estimate the effects on virtual pest populations and
to evaluate the optimal timing of pesticide application.
Strategies for gap-closure of Thermus sp. 2.9 genome.
Laura Navas1 , Ariel Amadío2, Rubén Zandomeni3
1,3
Instituto de Microbiología y Zoología Agrícola (IMyZA), Instituto Nacional de
Tecnología Agropecuaria (INTA), Las Cabañas y de Los Reseros, Buenos Aires,
Argentina
2
CONICET – EEA Rafaela, Instituto Nacional de Tecnología Agropecuaria (INTA)
Extremophile organisms are of great interest due to their potential as sources of proteins
for biotechnological application. A thermophilic bacterium was isolated from a hot
water spring in Salta, Argentina. Phylogenetic analysis indicated that it belongs to the
Thermus genus. DNA sequencing was performed using Roche 454 technology to obtain
the complete genome sequence. Two hundred and fifteen thousand unpaired reads
were obtained, totaling 81,238,046 bp and providing approximately 35-40 fold coverage
of the genome (estimated size 2 Mbp). Reads were assembled de novo using Newbler
(v2.3), which generated 137 contigs larger than 500 nucleotides and an N50 of 39,906
bp. The G+C content of the genome was 66.7%.
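The assembly figures quoted above rest on two simple calculations, sketched here for clarity. The N50 definition used is the common one (the contig length at which the cumulative sum of lengths, sorted from longest to shortest, reaches half of the assembly), which is assumed rather than taken from the Newbler documentation. The contig list is a hypothetical toy example; only the read total, read count and estimated genome size come from the abstract.

```python
def n50(lengths):
    """Contig length at which the sorted cumulative sum reaches 50% of the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Figures from the abstract: 81,238,046 bp in 215,000 reads, genome of about 2 Mbp.
coverage = fold_coverage(81_238_046, 2_000_000)   # about 40.6-fold
mean_read_length = 81_238_046 / 215_000           # about 378 bp

# Hypothetical toy assembly, for illustration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

With the 2 Mbp estimate the raw depth is close to 40-fold, at the upper end of the reported 35-40 fold range.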
Different bioinformatics strategies were used to predict the collinearity between contigs
in order to finish the genome. First, synteny with two species of the Thermus genus was
analyzed and compared to the contigs from isolate 2.9. A second strategy consisted of
the generation of an optical map of the Thermus sp. 2.9 genome (OpGen, Sanger
Institute) using the restriction enzyme NheI. This allowed us to compare the restriction
pattern of the whole genome with those of each contig generated in silico. Finally, a
fosmid library (Epicentre) was generated with an insert size of 30-40 kb, and the ends of
150 clones were sequenced. Together, these approaches allowed the generation of scaffolds to
order the contigs.
As a result of these strategies, 95 joins were predicted. Thirty-two of them were
confirmed by PCR and sequencing of the amplified products. The average size of the
sequenced gaps was ~1200 bp. Currently, we have 10 scaffolds, which cover 98% of the
genome.
Following this strategy we were able to join several contigs and order many of them.
However, it is clear that obtaining one scaffold (or ideally one contig) is particularly
complex for genomes with high GC content. To increase the information and get a
finished genome, we are currently planning a mate-pair run with an insert size of
~8 kb, aiming not only to join the 10 scaffolds but also to resolve the repetitive sequences of
the remaining contigs.
Key words: Genome finishing, Gap closure, Scaffolding, Thermophilic bacteria
Acknowledgments
We thank Matthew Dunn and all Team 63 from Wellcome Trust Sanger Institute for the
generation of the optical map.
COMPUTATIONAL PREDICTION OF THE BIOLOGICAL EFFECTS OF MUTATIONS IN OTC GENE
IN ARGENTINIAN PATIENTS
Silene Silvera Ruiz1, Antonio Arranz Amo2, Laura Laróvere1, Raquel Dodelson de Kremer1
1 Centro de Estudio de las Metabolopatías Congénitas, Hospital de Niños de Córdoba, Fac. de Cs. Médicas,
UNC, Córdoba, Argentina
2 Unitat de Metabolopaties, Hospital Universitari Materno-Infantil Vall d'Hebron, 08035 Barcelona, Spain
Summary
Ornithine transcarbamylase deficiency (OTCD) is the most common inherited disorder of the urea cycle and is
transmitted as an X-linked trait. Defects in the OTC gene cause a block in ureagenesis. Males with mutations
leading to complete OTCD develop hyperammonemic coma in the first week of life, which carries an
approximately 50% mortality and universal morbidity among survivors [1–3]. In males with mutations resulting in
partial OTC deficiency and in approximately 15% of female heterozygotes, hyperammonemic crisis occurs later in
childhood and carries a 10% mortality and significant morbidity [4]. OTCD results from mutations in the OTC
gene, encoding a 354-residue polypeptide. The complete repertoire of OTCD-causing mutations is estimated at
560 mutations, including 290 mSNCs. Since disease-causing mSNCs represent <20% of the 2064 possible OTC
mSNCs, simple approaches are essential for discriminating between causative and trivial mSNCs [5]. Observation
of the OTC structure appears to be a simple approach for such discrimination, comparing favourably in our sample with
four formalized structure-based and/or sequence-based in silico assessment methods, and supporting the
causation of deficiency by the given mutations. The aim of this work was to validate five mSNCs (c.386G>A,
c.452T>G, c.533C>T, c.622G>A and c.829C>T) found in Argentinian patients of our centre, and to correlate the
degree of pathogenicity of each one with the phenotype and clinical data. The five patients were diagnosed biochemically
and molecularly; this series includes affected males and symptomatic female carriers with mild and severe forms.
A multiple sequence alignment was made with CLUSTALW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The
OTC mSNCs were evaluated using bioinformatics tools from public databases and web-based software programs.
The conservation score of the affected residue was calculated with two tools: PolyPhen
(http://genetics.bwh.harvard.edu/pph), in which scores range from 0.000 (most probably benign) to 0.999
(most probably damaging); and SIFT (http://blocks.fhcrc.org/sift/SIFT.html), in which scores below 0.05
indicate substitutions predicted to be intolerant. We also used PoPMusic
(http://babylone.ulb.ac.be/popmusic/), which evaluates the changes in stability of a given protein under single-site
mutations on the basis of the protein's structure (Table 1).
Table 1
In silico prediction of cross-reactive epitopes of the major soybean allergen Gly m Bd 30K
with bovine caseins and their analysis by immunochemical methods.
Candreva, Ángela1,2, Parisi, Gustavo3, Docena Guillermo2 and Petruccelli Silvana1.
1 CIDCA UNLP, La Plata 47 y 116.
2 LISIN, FCE, UNLP, La Plata, 47 y 115.
3 Departamento de Ciencia y Tecnología, UNQ, Roque Saenz Pena 182, Bernal.
Background
Cow’s milk allergy (CMA) is the main food allergy in Argentina. The nutritional substitutes
most commonly used are soy-based formulas; however, 40% of patients do not tolerate soybean milk.
The molecular bases of these reactions are not fully understood. Our group has shown that the
major soybean proteins, the 11S and 7S storage proteins, share cross-reactive epitopes with bovine
caseins. The aim of this work was to predict potential cross-reactive epitopes in the major soy
allergen P34. Although P34 is a minor seed component, it is considered a major soybean allergen.
Servers currently available on the internet are not able to predict cross-reactivity between
soybean proteins and cow’s milk proteins (CMP). Since the relevance of cross-reactive epitopes between
soybean and CMP has been confirmed by our group through in vitro immunochemical studies and in vivo
using a mouse model of CMA, the performance of in silico prediction methods needs to be
improved. Our objective was to develop a computational strategy to predict epitopes common to P34 and bovine
caseins, and then to compare the in silico results with immunochemical analysis.
Material and Methods
Analysis of P34: bovine casein allergenic epitopes were obtained from the IEDB database and
aligned with the P34 protein; the results were plotted in a graph based on the
accumulation of consensus amino acids.
P34 homology modeling, solvent accessibility and discontinuous epitopes: to build 3D models we
used homology modeling with the sequence of the P34 protein (gi 195957142) as target. With this
sequence we searched the Protein Data Bank (PDB) for putative templates. Using the resulting model we
obtained the solvent-exposed positions with the program DSSP. In addition, the modeled P34
structure was used to analyze whether the cross-reactive epitopes identified by the sequence analysis
were predicted as B-cell epitopes by the Discotope server.
Immunochemical analysis using overlapping synthetic peptides: to test our predictions, the entire
protein sequence of P34 was synthesized as linear 15-mer overlapping peptides with five-residue
shifts, immobilized on paper. The recognition of the synthetic peptides by different primary
antibodies was assayed: pools of IgE-reactive sera from allergic patients and mouse monoclonal antibodies (mAbs)
specific for α-, β- and κ-casein.
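The peptide scan described above (15-mers shifted by five residues along the whole sequence) can be sketched as below. The handling of the C-terminal tail, where one extra peptide is added so that the last residues are covered, is an assumption of this sketch and is not stated in the abstract. The toy sequence is hypothetical and is not the P34 sequence.

```python
def overlapping_peptides(sequence, length=15, shift=5):
    """Return (1-based start, peptide) pairs tiling the sequence.

    Assumption: if the regular tiling leaves C-terminal residues
    uncovered, one extra peptide ending at the last residue is added.
    """
    if len(sequence) <= length:
        return [(1, sequence)]
    starts = list(range(0, len(sequence) - length + 1, shift))
    if starts[-1] + length < len(sequence):
        starts.append(len(sequence) - length)
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Hypothetical 40-residue toy sequence, for illustration only.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 2
peptides = overlapping_peptides(toy_sequence)
```

With a five-residue shift each residue away from the termini is present in three consecutive peptides, which helps to narrow a reactive region down from the set of spots that are recognized.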
Results
The sequence analyses detected two main regions with potential cross-reactive epitopes: regions A
and B. Six 15-mer peptides in region A with the highest scores were recognized by the two different
pools of IgE patient sera and by the 3 casein-specific mAbs. Only one region B peptide was
recognized by the 3 mAbs. The predicted peptides were on the surface of the molecule, exposed to
the solvent.
Conclusion
In conclusion, the in silico methods used in this work allowed us to predict cross-reactive epitopes
between P34 and bovine caseins, which were confirmed by experimental analysis.
Coevolution and Contact Networks within Superfolds
Martin Banchero1, Elin Teppa1 and Cristina Marino Buslje1
1Fundación Instituto Leloir
Amino acid changes due to mutations do not occur randomly; function and structure impose constraints on
different positions. There are compensatory mutations, such that a mutation at a certain position induces a coordinated
mutation at one or more other positions elsewhere in the protein. These coevolving mutations are of key interest as they
identify residues that interact within the protein and are engaged in a particular function, for example catalysis,
structure stabilization, protein-protein and substrate interactions, and allosteric regulation.
It has long been suggested that correlated mutations can be exploited to infer spatial contacts within the tertiary protein
structure [1]. On the other hand, folds are not equally adopted by proteins; instead, 40% of the proteins in the PDB
adopt 0.1% of the possible folds. These highly populated groups of folds are called superfolds and adopt very regular
architectures (e.g., TIM barrel fold, αβ-barrel, Rossmann fold, three-layer αβ-sandwich, αβ-plait, two-layer αβ-sandwich) [2].
In this work we first analyzed the relationship between Mutual Information (MI), interpreted as a measure of
coevolution [3], and contact distance in different families within a superfamily (defined as a Pfam clan [4]) belonging to
a superfold. Secondly, we tried to uncover which MI relationships are common between families of the same clan due to
the common fold and superfamily function, and also to identify which ones are specific to each family (see
Figure 1).
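To make the quantities concrete, the sketch below computes plain mutual information between two alignment columns and then sorts residue pairs into the three classes used in the MI and distance maps (top 20% MI, distance below 8 Å, or both). It is only an illustration: the method cited as [3] adds corrections for phylogeny, low counts and redundancy that are not reproduced here, and the function names, toy columns and toy distances are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Raw MI (in bits) between two alignment columns, skipping gapped rows."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def classify_pairs(mi_by_pair, dist_by_pair, top_fraction=0.2, cutoff=8.0):
    """Split pairs into (high MI and contact, high MI only, contact only)."""
    ranked = sorted(mi_by_pair, key=mi_by_pair.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top = set(ranked[:n_top])
    contacts = {pair for pair, d in dist_by_pair.items() if d < cutoff}
    return top & contacts, top - contacts, contacts - top


# Hypothetical toy data: column pairs and inter-residue distances in Å.
covarying = mutual_information("AAVV", "LLII")     # perfectly coupled columns
independent = mutual_information("AAVV", "LILI")   # no coupling
toy_mi = {(1, 2): 0.9, (1, 3): 0.1, (2, 3): 0.2, (1, 4): 0.05, (2, 4): 0.3}
toy_dist = {(1, 2): 5.0, (1, 3): 6.5, (2, 3): 12.0, (1, 4): 20.0, (2, 4): 9.0}
both, mi_only, contact_only = classify_pairs(toy_mi, toy_dist)
```

Comparing the "both" sets across families of one clan is one simple way to separate relationships shared at the superfamily level from those specific to a single family.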
With this approach we aim to identify superfamily MI (i.e., those MI relationships due to common superfamily
function and/or fold) and family-specific MI. Understanding the similarities and differences between families of the
same kind could be the foundation of more precise annotation methods.
Figure 1: MI and distance map of three families of the same clan (TIM barrel fold CL0160: PF01280, PF01717 and PF8267) and a
family of a different clan and fold (globin-like fold CL0090: PF00042) for comparison. Blue dots: top 20% MI; green dots:
distance <8Å; red dots: top 20% MI and distance <8Å.
References:
1. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, et al. (2011) Direct-coupling analysis of residue coevolution captures
native contacts across many protein families. Proceedings of the National Academy of Sciences 108: E1293-E1301.
2. Orengo CA, Thornton JM (2005) Protein families and their evolution: a structural perspective. Annu Rev Biochem
74: 867-900.
3. Buslje CM, Santos J, Delfino JM, Nielsen M (2009) Correction for phylogeny, small number of observations and data
redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics 25: 1125-1131.
4. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, et al. (2012) The Pfam protein families database. Nucleic Acids Research
40: D290-D301.
Evolutionary and structural analysis of procirsin, a typical plant aspartic proteinase zymogen
Daniela Lufrano1, Sandra E. Vairo Cavalli1, Gustavo Parisi2
1
Laboratorio de Investigación de Proteínas Vegetales (LIPROVE), Departamento de Ciencias
Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, C.C. 711, 1900 La
Plata, Argentina.
2
Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña
352, Bernal, Buenos Aires, B1876BXD, Argentina
In plants, aspartic proteases (APs, EC 3.4.23) appear to be the second-largest class of
proteases, with the A1 family being the best studied and largest group, classified in these organisms
into typical, nucellin-like and atypical proteases [1]. Typical plant APs are synthesized as single-chain
preproenzymes characterized by the presence of the plant-specific insert (PSI) domain of
approximately 100 amino acids, absent in APs from other sources (viruses, bacteria, yeast,
fungi and animals). The preproenzymes are subsequently processed into single- or two-chain
mature forms in which the PSI domain is removed. The prosegment and the first residues of the N-terminal
portion of the AP precursors have been described to play a critical role in blocking
the catalytic aspartates and thus preventing autoactivation. In particular, residue Arg 7 of the
propeptide in barley's typical AP precursor (prophytepsin) is reported to form an ionic
interaction with Glu 171 and Asp 178 in the mature protein and, together with other hydrophobic
interactions and hydrogen bonds, to anchor the propeptide in such a way that Lys 11 and Tyr 13 of the N-terminal
region interact with the active site, inhibiting the activity of prophytepsin. However, the
precursor of a typical AP from flowers of Cirsium vulgare (Savi) Ten. (Asteraceae), called
procirsin and obtained by heterologous expression, was shown to be active at acidic pH [2].
In order to find possible differences that explain the activity of recombinant procirsin, we performed
a phylogenetic analysis of procirsin and built a structural model using the Modeller program, further
evaluated with the DOPE potential and the Prosa II server (score -8.5). We also estimated the
variation of the net charge of the propeptides of procirsin and prophytepsin as a function of pH.
Our analysis shows that procirsin shares a cluster with APs from diverse organisms, in which the
closest homolog is cyprosin from C. cardunculus (98% sequence similarity). According to
the structural model and the evolutionary analysis, all the residues described as important for
biological function, as well as Arg 7p, Lys 11 and Tyr 13, are conserved in procirsin in
comparison with all the sequences of the cluster. The large positive charge at acidic pH
predicted for the prosegment of procirsin, compared with that of prophytepsin, could alter the
correct localization of the propeptide, preventing the interaction of Lys/Tyr with the catalytic
residues and rendering procirsin active at low pH. We propose that as pH increases, the
charge on the prosegment decreases, allowing the correct conformation to inhibit the
proenzyme.
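A net charge profile of the kind used in this analysis can be approximated with the Henderson-Hasselbalch relation, as in the sketch below. The abstract does not say which tool or pKa set was used; the pKa values here are approximate textbook-style values chosen for illustration, and the peptide is a hypothetical toy sequence rather than either propeptide.

```python
# Approximate side-chain and terminal pKa values (illustrative assumption).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(sequence, ph):
    """Estimated net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Free termini: protonated amino group (+) and deprotonated carboxyl (-).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


# Hypothetical basic peptide, for illustration only.
toy_peptide = "KKRR"
profile = [(ph, net_charge(toy_peptide, ph)) for ph in (3.0, 5.0, 7.0, 9.0)]
```

For a basic segment the computed charge is largest at acidic pH and falls as pH rises, which is the qualitative trend invoked for the procirsin prosegment.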
Acknowledgements
We would like to acknowledge the financial support of ANPCyT, Argentina (PICT 02224), and the University of
La Plata (Project X-576). G. Parisi and S.E. Vairo Cavalli are members of the CONICET Research Career
Program; D. Lufrano holds a CONICET fellowship.
References
1. Faro C, Gal S: Aspartic proteinase content of the Arabidopsis genome. Current protein &
peptide science 2005, 6:493-500.
2. Lufrano D, Faro R, Castanheira P, et al: Molecular cloning and characterization of procirsin,
an active aspartic protease precursor from Cirsium vulgare (Asteraceae). Phytochemistry
2012, in press.
BiFe: a national EMBNet node hosting Argentine bioinformatics applications
EMBNet node Argentina1
1
Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias
Exactas y Naturales, Universidad de Buenos Aires (Argentina).
Background
EMBNet [http://www.embnet.org] is a science-based group of collaborating nodes that
provides bioinformatics services to the molecular biology community [1]. BiFe
(Bioinformática Federal [http://www.embnet.qb.fcen.uba.ar]) is the Argentine node of
EMBNet. Our goal is to help the Argentine bioinformatics community [2] in bringing their
newly developed applications online. BiFe is located at the Protein Physiology
Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales,
Universidad de Buenos Aires, Argentina. The EMBNet node Manager is Dr. Ignacio E.
Sánchez, and the EMBNet staff includes Dr. Adrián G. Turjanski and MSc. Leandro G.
Radusky (Departamento de Química Inorgánica, Facultad de Ciencias Exactas y Naturales,
Universidad de Buenos Aires, Argentina).
Materials and methods
We have built our application servers using an open source general-purpose toolkit, which is
publicly available for download together with a tutorial. Its features include a fully
customizable input form, including file uploading and automatic ftp file retrieval; dynamic
application loading; and a fully customizable output form, including text, tables, graphics and
protein structure representation using Jmol [3]. Argentine academic researchers interested
in having their bioinformatics applications hosted by BiFe may contact us for further
details. We will put our resources at your service, free of charge. We do not expect any
retribution in terms of authorship or any other form of scientific credit.
Results and conclusions
The site groups bioinformatics applications developed in Argentina. Some of them are
hosted by BiFe, such as the Frustratometer [4], an application to localize frustration in
proteins, and BeEP, a tool to validate protein models through evolutionary information. The
site also displays links to applications and databases that have been developed in
Argentina and are hosted elsewhere.
Acknowledgements
We acknowledge funding from the Ministerio Argentino de Ciencia, Tecnología e Innovación
Productiva. Ignacio E. Sánchez and Adrián G. Turjanski are researchers from Consejo
Nacional de Investigaciones Científicas y Técnicas.
References
1. D'Elia D, Gisel A, Eriksson NE, Kossida S, Mattila K, Klucar L, Bongcam-Rudloff E:
The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life
Sciences community. BMC Bioinformatics 2009, 10 Suppl 6:S1.
2. Bassi S, Gonzalez V, Parisi G: Computational biology in Argentina. PLoS Comput
Biol 2007, 3(12):e257.
3. Herraez A: Biomolecules in the computer: Jmol to the rescue. Biochem Mol Biol
Educ 2006, 34(4):255-261.
4. Jenik M, Parra RG, Radusky LG, Turjanski A, Wolynes PG, Ferreiro DU: Protein
frustratometer: a tool to localize energetic frustration in protein molecules.
Nucleic Acids Research 2012, 40(W1):W348-W351.
Design of a pipeline for de novo identification of cis-regulatory
elements involved in transcriptional re-programming during tomato
fruit development and ripening
Tomas Duffy1, Fernando Carrari1
1 Instituto Nacional de Tecnología Agropecuaria, Hurlingham, Argentina.
Background. During the development and ripening of tomato (Solanum
lycopersicum) fruits, extensive reprogramming of the gene transcriptional
network occurs via the interaction of transcription factors with cis-regulatory
elements (CREs). One of the mechanisms explaining the coordinated
expression of genes involved in a common biological process is the CRE
composition of the promoters of those genes. Accordingly,
computational methods that identify over-represented DNA motifs in the
promoters of co-expressed genes are time- and cost-effective complements for
the large-scale discovery of putative cis-regulatory elements (pCREs).
In this work we applied a combination of software tools and in-house scripts to
analyze microarray experiments available in public databases and to produce
clusters of co-expressed genes in which to search for over-represented pCREs.
Similarity to previously described CREs, positional preference and
co-occurrence were analyzed.
Materials and Methods. Fifty-four two-color TOM1 tomato microarray
experiments, covering 9 time points during tomato fruit development and ripening,
were downloaded from the Tomato Functional Genomics Database and
normalized with the Limma R package; probe summarization was carried
out with the WGCNA R package. Clusters of co-expressed genes were
generated using the *omeSOM software, which is based on self-organizing maps.
Promoters (1500 bp upstream of translation start sites) of co-expressed genes
were fetched using in-house Perl/BioPerl scripts. On these sequences pCREs
were searched using three different tools, namely MEME, Weeder and
MotifSampler. All statistically over-represented pCREs were clustered to
eliminate redundant motifs using the GimmeMotifs software. Non-redundant pCREs
(nr-pCREs) were compared to previously described CREs present in the PLACE
database using STAMP. In order to evaluate positional preference, all nr-pCREs
were mapped using FIMO on all analyzed promoters. The statistical significance
of motif co-occurrence was calculated by the cumulative hypergeometric
distribution function using a combination of R and Perl scripts. A network of
co-occurring pCREs was built using Cytoscape.
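The co-occurrence test can be illustrated with a minimal Python sketch. It is not the authors' R/Perl code: the function name and argument names are hypothetical, and it assumes the usual formulation in which, out of a total number of promoters, some contain motif A, some contain motif B, and the upper tail of the hypergeometric distribution gives the probability of observing at least the recorded number of promoters containing both.

```python
from math import comb

def cooccurrence_pvalue(total, with_a, with_b, with_both):
    """Upper-tail hypergeometric probability P(X >= with_both).

    total     -- number of promoters analyzed (hypothetical argument names)
    with_a    -- promoters containing motif A
    with_b    -- promoters containing motif B
    with_both -- promoters containing both motifs
    """
    upper = min(with_a, with_b)
    denominator = comb(total, with_b)
    # Sum the probability mass from the observed overlap up to the maximum possible.
    tail = sum(
        comb(with_a, x) * comb(total - with_a, with_b - x)
        for x in range(with_both, upper + 1)
    )
    return tail / denominator
```

A small p-value for a motif pair would suggest that the two motifs share promoters more often than expected by chance; in practice a multiple-testing correction would probably be applied across all pairs.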
Results and Discussion. We identified de novo 410 nr-pCREs, which showed
strong positional preference for the first 400 bp upstream of the translation
start site. Two hundred and fifty-three of them showed high similarity to
previously described plant CREs. We generated a network of statistically co-
Diversity and evolution of retinoblastoma protein-binding LxCxE motifs in human
proteins
Lucía B. Chemes1, Juliana Glavina2, Gonzalo de Prat-Gay1 and Ignacio E. Sánchez2
1 Protein Structure-Function and Engineering Laboratory. Fundación Instituto Leloir and
IIBBA-CONICET.
2 Protein Physiology Laboratory, Departamento de Química Biológica, Facultad de Ciencias
Exactas y Naturales, Universidad de Buenos Aires.
Introduction
The retinoblastoma tumor suppressor protein (Rb) plays a central role in eukaryotic cell
cycle control, differentiation and chromatin structure regulation. Rb is the hub of a large
protein interaction network. The retinoblastoma-binding LxCxE linear motif mediates a high
affinity interaction between a conserved surface patch in Rb and one third (approximately
30) of human cellular Rb targets [1] and also between Rb and several oncoproteins from
human viruses. In the present work we study the occurrence and evolution of LxCxE motifs
present in human proteins using bioinformatics tools to identify this linear motif and to
analyze its variability in homologous non-human proteins.
Methods
We use available linear motif databases and bibliographic search to compile a database of
human Rb target proteins harboring the LxCxE motif. We annotate the structural context of
the motif using the Protein Data Bank structure database [2] and the IUPRED predictor for
intrinsic disorder [3]. We also characterize the sequence context of the motif using sequence
logos [4] and searching for known associated motifs [5]. For a subset of targets, we search
for the LxCxE motif in homologous proteins from eukaryotic and prokaryotic organisms and
analyze evolution of the motif.
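As an illustration of the motif search step, the following Python sketch scans a protein sequence for the LxCxE pattern, where x stands for any residue. It is only a minimal regular-expression scan, not the authors' pipeline; the function name and the toy sequence in the usage note are hypothetical.

```python
import re

# Lookahead so that overlapping instances of the motif are all reported.
LXCXE = re.compile(r"(?=(L.C.E))")

def find_lxcxe(sequence):
    """Return (1-based start position, matched pentapeptide) for each LxCxE instance."""
    return [(m.start() + 1, m.group(1)) for m in LXCXE.finditer(sequence.upper())]
```

For the toy sequence "MSDLYCYEQLN", the function reports a single instance, LYCYE, starting at position 4. Running the same scan over aligned homologs would be one simple way to tabulate where the motif is retained or lost.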
Results and conclusions
We report that the LxCxE motif from human Rb protein targets can be found both within
disordered and within globular domains. When present in a globular domain, the motif can
occur in various secondary structure elements, suggesting that conformational transitions
must take place to allow for Rb binding. We find variability in the linear motifs associated with
the LxCxE motif and different conservation patterns when compared to the known instances
of viral proteins. We discuss the results found for LxCxE motifs in the human proteome in
the light of the information available on the Rb-LxCxE interaction and on the known features
of viral LxCxE motifs. Based on these data, we suggest that host and viral LxCxE motifs may
differ in their evolution and functional properties.
References
[1] Dick FA. Cell Div. 2007 Sep 13;2:26., [2] http://www.rcsb.org/pdb/home/home.do, [3]
Dosztányi Z, Csizmók V, Tompa P and Simon I. Bioinformatics (2005) 21, 3433-3434., [4]
Schneider TD, Stephens RM. 1990. Nucleic Acids Res. 18:6097-6100, [5] Chemes LB,
Glavina J, Faivovich J, de Prat-Gay G, Sánchez IE. J Mol Biol. 2012. In press. DOI:
http://dx.doi.org/10.1016/j.jmb.2012.05.036.
Acknowledgements
We acknowledge funding from Agencia Nacional de Promoción Científica y Tecnológica
(PICT 2010-1052 to I.E.S), Consejo Nacional de Investigaciones Científicas y Técnicas
(postdoctoral fellowship to L.B.C; G.d.P.G., and I.E.S. are CONICET career investigators)
and Instituto Nacional del Cáncer (graduate fellowship to J.G.).
Identification of putative subtelomeric regions in the genome of Toxoplasma gondii
Santiago J. Carmona1, Maria C. Dalmasso2, Sergio O. Angel2, Fernán Agüero1
1 Laboratorio de Genómica y Bioinformática, 2 Laboratorio de Parasitología Molecular
IIB-INTECH, Universidad de San Martín-CONICET, Buenos Aires, Argentina
Background
Most eukaryotic chromosome ends are formed by telomeric repeats and subtelomeric regions, also called Telomeric
Associated Sequences (TAS). TAS are patchworks of genes interspersed with repeated elements, and while these
domains present similar arrangements in different species, their sequences are highly divergent. In addition, these
regions present a particular nucleosomal composition and bind specific factors therefore producing a special kind of
heterochromatin. In the currently available draft of the T. gondii genome, telomeres are not completely assembled, and
chromosome ends have not been analyzed yet. Here we discuss some findings regarding T. gondii chromosome ends.
Results
All-vs-all pairwise sequence comparison of T. gondii chromosomes revealed the presence of a conserved region of
approximately 25 to 30 Kb at the ends of 9 of the 14 chromosomes in the parasite strain ME49, defined here as
TgTAS-like. Sequence similarity among these regions is on average ~70%; they are highly conserved in other strains
but are unique to Toxoplasma, with no detectable similarity in other Apicomplexan parasites. The internal structure of
these TgTAS-like sequences consists of 3 repetitive regions separated by high-complexity sequences that are depleted
of genes, with the exception of one gene at their 3' end. To analyze potential compositional bias along the chromosomes
we performed a correspondence analysis (CA) of the trinucleotide composition observed in sliding windows of lengths
1 to 100 Kb. The analysis showed a strong bias, with only 2 dimensions (the first and second principal coordinates of
the CA) largely explaining the trinucleotide bias (>60%). TgTAS-like regions showed the highest trinucleotide
compositional bias on the first principal coordinate (1PC) when using a window size of 30 Kb. This compositional bias is
similar to that observed in other genomic fragments such as those containing centromeric sequences (Figure 1). We also
found that 1PC is negatively correlated with gene density in the genome (Pearson's correlation coefficient -0.445,
p-value < 10^-16), i.e. genomic fragments with low 1PC values are gene-rich, while high 1PC is associated with
gene-depleted regions, such as TgTAS-like regions and centromeres. Finally, ChIP-qPCR experiments showed that
nucleosomes associated with TgTAS-like sequences are enriched in silencing epigenetic markers such as histone H4
monomethylated at K20 and the histone variant H2AX.
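The input to such a correspondence analysis is a table of trinucleotide counts per window. The Python sketch below shows one way this table could be built; it is an assumption-laden illustration (function name, overlapping trinucleotide counting, and the window/step parameters are hypothetical), and the correspondence analysis itself is not shown.

```python
from collections import Counter
from itertools import product

# The 64 possible trinucleotides, in a fixed column order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]

def trinucleotide_table(sequence, window, step):
    """Count overlapping trinucleotides in each window; returns one row per window."""
    sequence = sequence.upper()
    rows = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        counts = Counter(chunk[i:i + 3] for i in range(len(chunk) - 2))
        # Trinucleotides containing ambiguous bases (e.g. N) are simply ignored.
        rows.append([counts.get(t, 0) for t in TRINUCLEOTIDES])
    return rows
```

Each row could then be passed to a correspondence analysis routine, and the coordinate of each window on the first axis compared with its gene density.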
Conclusions
We identified a region encompassing ~30 Kb present in most of the Toxoplasma chromosomes, denominated TgTAS-like.
These regions form a specialized heterochromatin, characterized by: i) a particular trinucleotide composition, ii) a special
arrangement containing three satellite families, iii) depletion of coding sequences, and iv) enrichment in nucleosomes
containing heterochromatin-like histone markers. Interestingly, these features allowed us to identify similar regions, not
necessarily sub-telomeric, that might be functionally similar.
Prediction of blood to liver coefficients for volatile organic compounds: a cheminformatics approach
Damián Palomba1,2, María Jimena Martinez1, Ignacio Ponzoni1,2, Mónica Díaz1,2, Gustavo Vazquez1, Axel J. Soto3
1 Laboratory for Research and Development in Scientific Computing (LIDeCC), DCIC, UNS, Av. Alem 1250,
Bahía Blanca, Argentina
2 Planta Piloto de Ingeniería Química (PLAPIQUI) CONICET-UNS, La Carrindanga km.7, Bahía Blanca, Argentina
3 Faculty of Computer Science, Dalhousie University, Halifax, Canada
Background
Volatile organic compounds (VOCs) are organic chemical compounds whose composition makes it
possible for them to evaporate under normal indoor atmospheric conditions of temperature and
pressure. VOCs are of concern as both indoor and outdoor air pollutants because many of them are
known or suspected to cause chronic adverse health effects in exposed population. In this sense,
partition coefficients from blood to tissue are of importance in environmental, toxicological and
pharmacokinetic modeling. Although some prediction models were developed in the past [1], their
prediction capacity still remains to be improved. In this work we propose a new prediction model based
on a QSPR (Quantitative Structure-Activity/Property Relationship) modeling technique.
Materials and methods
The data set is composed of 122 volatile organic compounds; 438 descriptors were calculated using
Dragon software. The compounds and their respective blood-liver partition coefficients (logPLiver) were
extracted from reference [1]. We employed the interface and routines provided by the machine
learning tool Weka.
To generate the model, we divided the data set into a training set and an external validation test set
with 83 % and 17 % of the compounds, respectively. In order to select the most relevant descriptors we
employed 5-fold cross-validation with in-fold feature selection (M5P implementation) over the
training set. Table 1 shows the performance metrics for each fold. Since a different set of attributes
may be selected in each fold, a consensus scheme was employed. As a result, the
following relevant descriptors were selected: SIC2, S1K, X5A, SIC1, AAC, X4Av, H-046, nN, MSD, X2sol.
Table 1: R2 and RMSE values for each fold of the feature selection process
        Fold 1   Fold 2   Fold 3   Fold 4   Fold 5
R2      0.821    0.74     0.71     0.70     0.74
RMSE    0.182    0.212    0.241    0.268    0.214
Results and conclusions
The final model was developed using a decision tree (WREP implementation) and validated using the
external test set. We obtained a model that uses 5 descriptors: H-046, S1K, X2sol, MSD and SIC1, with
performance metrics of R2 = 0.83 and RMSE = 0.18. Compared to the results reported in [1] we observe
a significant improvement of the prediction performance.
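For readers unfamiliar with the two metrics, the Python sketch below computes R2 and RMSE from observed and predicted values. It assumes that R2 is the coefficient of determination; the abstract does not state which definition was used (a squared correlation coefficient is another common choice), and the function name is hypothetical.

```python
from math import sqrt

def r2_and_rmse(observed, predicted):
    """Return (coefficient of determination, root mean squared error)."""
    n = len(observed)
    mean_observed = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)
```

Applied to the external test set, these two numbers correspond to the kind of figures reported for the final model.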
Acknowledgements
This work is kindly supported by grants PGI 24/ZN15 and PGI 24/ZN16 (Universidad Nacional del Sur)
and PIP112-2009-0100322 (CONICET - National Research Council of Argentina).
References
1. Abraham M H, Ibrahim A, Acree W E Jr: Air to liver partition coefficients for volatile organic
compounds and blood to liver partition coefficients for volatile organic compounds and drugs. Eur J
Med Chem 2007, 42: 743-751.
How much information keeps the solvation structure of a
Crystal Protein?
Carlos Modenutti1,2, Diego F. Gauto1, Leandro Radusky1, Silvia Hajos2 and Marcelo A. Marti1
1 Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires,
Ciudad Universitaria, Pabellón II, C1428EHA Ciudad Autónoma de Buenos Aires, Argentina.
2 Departamento de Microbiología, Inmunología y Biotecnología, Facultad de Farmacia y Bioquímica, Universidad
de Buenos Aires, Junin 954, C1113AAD Ciudad Autónoma de Buenos Aires, Argentina.
Background
Interactions between carbohydrates and proteins mediate numerous important biological functions,
such as signal transduction, cell adhesion, host−pathogen recognition, and the immune response (1).
In our previous work (1,2), combining MD simulations with statistical analysis, we showed that the
properties of the water molecules close to the surface of the carbohydrate recognition domains (CRD)
of various lectins resemble the structure of the lectin-carbohydrate complex. Specifically, we defined
the so-called water sites (WS) as regions of space close to the protein surface where the probability of
finding a water molecule is higher than in bulk solvent, and computed several thermodynamic and structural
properties. Saraboji et al. found a correlation between the position of the WS and crystallographic
waters, suggesting that the CRD in Galectin-3 is preorganized to recognize a sugar-like framework of
oxygens (3). In order to check whether this is an exclusive property of lectins or a pattern common to
proteins capable of binding carbohydrates, we created a database (DB) from a set of proteins obtained
from the Protein Data Bank.
Results
We analyzed crystallographic structures of apo-proteins (AP) vs protein-carbohydrate complexes (PCC) in
order to check the ability of crystallographic waters to predict the position of OH groups. We found a
direct correlation between the positions of crystallographic water molecules in the AP and the OH groups of the
ligand in the complex structure.
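One simple way to quantify such a correspondence is to measure, for each ligand hydroxyl oxygen, the distance to the nearest crystallographic water of the apo structure. The Python sketch below does this under explicit assumptions that are not taken from the abstract: both structures are already superimposed in the same coordinate frame, coordinates are given in angstroms, and the 1.5 Å cutoff and the function name are hypothetical.

```python
from math import dist

def matched_hydroxyls(water_xyz, hydroxyl_xyz, cutoff=1.5):
    """For each ligand OH oxygen, return the distance to the nearest apo water
    if it lies within `cutoff` angstroms, otherwise None."""
    result = []
    for oh in hydroxyl_xyz:
        nearest = min((dist(oh, w) for w in water_xyz), default=None)
        matched = nearest is not None and nearest <= cutoff
        result.append(nearest if matched else None)
    return result
```

The fraction of hydroxyls with a matched water would then give a rough per-complex measure of how well the apo solvent structure anticipates the ligand position.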
Conclusion
This study shows that the water molecule positions obtained from crystallographic structures have a
strong correlation with the OH group positions in the complex and appear to be a powerful tool for
glycomimetic drug design.
References
1. Gauto, D. F., Di Lella, S., Guardia, C. M., Estrin, D. A. & Marti, M. A. J Phys Chem B. 2009 Jun
25;113(25):8717-24.
2. Di Lella, S., Marti, M. A., Alvarez, R. M. S., Estrin, D. A. & Díaz Ricci, J. C. J Phys Chem B. 2007
Jun 28;111(25):7360-6. Epub 2007 May 25.
3. Saraboji K., Håkansson M., Genheden, Diehl C., Qvist J., Weininger U., Nilsson U., Leffler H., Ryde
U., Akke M. and Logan D. Biochemistry. 2012 Jan 10;51(1):296-306. Epub 2011 Dec 7.
Online modeling of Endoglucanases from Aspergillus genus using PHYRE2
Manuel Cossio1, Gastón Sioli1, Griselda Perona1, Lorena Castrillo1 and Pedro Zapata1
INBIOMIS, Posadas, Misiones, Argentina, 3300
Background
Cellulolytic enzymes are generally induced as multienzyme systems and have been divided,
according to the region of the cellulose fiber that they cleave, into three classes: endoglucanases,
cellobiohydrolases and β-glucosidases. Endoglucanases are extracellular enzymes that degrade cellulose to lower
molecular weight sugars and can be found in wood-decomposing fungi [1]. Researchers are now
focusing on this field due to the potential application of these enzymes in bioethanol
production. Protein models provide useful information about structure, global interactions with
membrane complexes and other proteins, number and location of hydrophobic and hydrophilic
residues and molecular weight, among other things. All of this would allow us to
make inferences about the kinetic and enzymatic properties of endoglucanases with only their
amino acid sequences as real data.
Materials and Methods
To construct 3D models of endoglucanases, the amino acid sequences from three species belonging to
the genus Aspergillus were obtained from the NCBI database. They were processed with the “protein
homology/analogy recognition engine v 2.0” (PHYRE 2) online software, obtaining specific models
for each protein sequence [2]. These models were analyzed and compared in order to identify
structural differences among the enzymes.
Results
Species        Total residues   Residues modeled*   % α helix   % β strand   % disordered
A. fumigatus   373              61 %                20.21       32.95        46.84
A. niger       356              71 %                7.12        39.15        53.73
A. oryzae      333              92 %                38.83       16.96        44.21
* >90% confidence
Conclusions
3D modeling with PHYRE 2 is a good strategy for making inferences about protein structure and
composition from amino acid sequences in FASTA format.
References
1. Knowles: Cellulase families and their genes. Tibtech 1987, 5: 255-261.
2. Kelley, Sternberg: Protein structure prediction on the web: a case study using the Phyre
server. Nature Protocols 2009, 4: 363-371.
INTA bioinformatic platform: An approach using ontology driven database and web interface to
integrate and explore genomic data.
Sergio Gonzalez1,3*, Bernardo Clavijo1*, Máximo Rivarola1,2, Paula Fernandez1,2, Marisa Farber1,2 and
Norma B Paniego1,2
1 Instituto Nacional de Tecnología Agropecuaria/Instituto de Biotecnología, Hurlingham, Argentina,
2 CONICET, Argentina.
3 Facultad de Ingeniería/UBA, Buenos Aires, Argentina.
*Contributed equally
Background
During the last few years, as the availability, affordability and magnitude of genomics and genetics
research have increased, so has the need to provide accurate and reliable access to the resulting data and
to combined analyses of genomes. One approach is to combine the outputs from different software tools
and merge the results so as to check the reliability of the merged output after visual analysis. Today,
more than 1,000 genomes have been completely sequenced; moreover, high-throughput sequencing
(Next-Generation Sequencing, NGS) technologies underscore the importance of computational
methods in annotating and mining the vast amount of genomic data. In summary, no off-the-shelf
solution exists for the assembly, gene prediction, genome annotation and merged-data presentation
necessary to interpret and take full advantage of all genomic features. The need to invest
large resources in custom bioinformatics support for any genome sequencing project remains a major
obstacle to fully understanding an organism's genome.
Results
In this work, we present an approach that uses an ontology-driven database to store, visualize, analyze and share
this information, including information associated with each feature represented in the database, for
example SNPs associated with a gene feature or information from transcriptomic assays. To accomplish
this, we first developed ATGC, which uses Chado (Generic Model Organism
Database, http://gmod.org), an ontology-driven relational database schema implemented in
PostgreSQL. One of the main goals of ATGC is to facilitate the exploration and visualization of the
data. The main development effort went into exploiting GO annotation and analyzing the annotated
genes, allowing users to move through the GO-DAG structure. This approach navigates between
different classes of available genes in different projects. We then combined functional
annotation with other genomic analyses (such as transcriptomics) using the genome browser
GBrowse (gmod.org/wiki/GBrowse) to facilitate the visualization of all data in its genomic context.
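Moving through a GO-DAG amounts to traversing a directed acyclic graph in which a term may have several parents. The Python sketch below collects all ancestors of a term; it is a generic illustration and does not use the Chado schema or the ATGC code. The `parents` dictionary and the term labels in the usage note are hypothetical placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """Return the set of all ancestors of `term` in a DAG.

    `parents` maps each term to an iterable of its direct parent terms
    (a hypothetical in-memory structure, not a Chado table).
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a term reachable by several paths is visited once
                seen.add(parent)
                stack.append(parent)
    return seen
```

With `parents = {"c": ["a", "b"], "b": ["a"], "a": []}`, the ancestors of "c" are "a" and "b". Grouping genes by any ancestor of their annotated terms is one way a browser could let users move up and down the hierarchy.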
Conclusion
We developed a user-friendly, flexible platform to organize genomic data. The integration of functional
annotation, information associated with each feature and a genomic coordinate-based system enables easy
exploration for generating user hypotheses. Finally, we plan to optimize the collection management
features, allowing users to create and manipulate lists of features by different criteria, and even to connect
the database to complementary platforms for data processing and analysis such as Galaxy, providing a
means to perform data manipulation and storage.
Phylogeny of fungal species of genus Aspergillus using ITS sequences
Manuel Cossio1, Gastón Sioli1, Griselda Perona1, Pedro Zapata1
INBIOMIS, Miguel Lanús, Posadas, Misiones, Argentina, 3304
Background
A method to identify Aspergillus at the species level and differentiate it from other true
pathogenic and opportunistic molds was developed using the 18S and 28S rRNA genes for primer
binding sites [1]. The “Internal Transcribed Spacer” (ITS) region has been widely used in molecular
characterization because of its relatively high variability and ease of PCR amplification, and its
sequence is of significant value for classification in fungi because of its appropriate
evolutionary rate.
Materials and methods
The ITS sequences of six species of the genus Aspergillus were obtained from the NCBI nucleotide
database. They were aligned with CLUSTAL X v 2.0 and then edited with Bioedit v 7.1. The
edited sequences were processed with MEGA 5.1 to build the phylogenetic tree using the maximum
likelihood method [2].
Results
[Figure: phylogenetic tree. ITS fungal species: 1. A. versicolor; 2. A. unguis; 3. A. flavipes; 4. A. fumigatus; 5. A. oryzae; 6. A. flavus]
Conclusions
ITS sequences can be used to construct maximum likelihood phylogenetic trees.
References
1. Henry, Iwen : Identification of Aspergillus species using Internal Transcribed Spacer regions 1
and 2. J Clin Microbiol. 2000, 38:1510–1515.
2. Kishino H, Hasegawa M: Evaluation of the maximum likelihood estimate of the evolutionary
tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of
Molecular Evolution 1989, 29:170- 179.
Comparison of two homology based protein structure online software.
María Perona1, María Molina1, Manuel Cossio1, Pedro Zapata1
INBIOMIS, Posadas, Misiones, Argentina, 3300
Background
Laccases are oxidative enzymes which have received special attention from researchers in recent decades
due to their ability to oxidize both phenolic and nonphenolic lignin-related compounds as well as highly
recalcitrant environmental pollutants. Those properties make them very useful for application in
biotechnological processes such as detoxification of industrial effluents, mostly from the paper and pulp,
textile and petrochemical industries. Structure analysis constitutes a very important step in the
comprehension of enzyme kinetics, and for that reason online software tools capable of building
3D structures are considered very useful for understanding enzymatic behavior.
Materials and Methods
The amino acid sequence of the laccase of Trametes sanguinea was obtained from the NCBI online database.
To generate the protein 3D structure, the sequence was run with the “protein homology/analogy recognition
engine v 2.0” (PHYRE 2) [2] and SWISS-MODEL [1] online software. The protein structure models
generated by both programs were analyzed and compared with each other in order to determine similarities
and differences between them.
Results
Software      Interactive 3D model   Residues modelled   Confidence     % α helix/β strand   Quaternary structure   Ligand information
Phyre 2       YES                    100%                >90%           YES                  Not reported           Not reported
SWISS-MODEL   YES                    100%                Not reported   Not reported         YES                    YES
Conclusions
Both programs efficiently produced 3D models from the laccase amino acid sequence.
References
1. Arnold, Bordoli: The SWISS-MODEL Workspace: A web-based environment for protein structure
homology modelling. Bioinformatics 2006, 22:195-201.
2. Kelley, Sternberg: Protein structure prediction on the web: a case study using the Phyre server.
Nature Protocols 2009, 4: 363-371.
Construction of phylogenetic trees from Trichoderma sp using the program MEGA 5.10
Gastón Sioli, Lorena Castrillo, Manuel Cossio, Natalia Amerio and Pedro Zapata
INBIOMIS, Posadas, Misiones, 3300, Argentina
Background
MEGA 5.10 [1] is used to create phylogenetic trees from protein or nucleic acid sequence data, in order to
analyze similarities and the degree of approximation between the sequences.
This work describes, step by step, the alignment of the sequences, the estimation of the tree by maximum
likelihood and the drawing of the tree.
The analysis of these sequences allows a comparison between different genetic strains and is a tool to supplement
the molecular identification of isolates to species level.
Materials and Methods
Six sequences were obtained from different strains belonging to the Trichoderma genus through the NCBI database, and the
alignment was performed with the Clustal X 2.1 program. The aligned sequences were edited using BioEdit 7.1.3.0, and
dendrograms based on the Maximum Likelihood method were obtained using the MEGA 5.10 program. From the five
sequences, 1 corresponds to T. koningiopsis, 1 to T. pleuroticola, 1 to T. hamatum, 1 to T. harzianum and 1 to T.
brevicompactum.
Results
ITS fungal species 1. T. koningiopsis 2. T. pleuroticola 3. T. harzianum 4. T. hamatum 5. T. brevicompactum
Conclusions
Analysis by bioinformatics programs, such as MEGA 5.10, is useful for inferring phylogenetic relationships among
different strains of Trichoderma sp, demonstrating both the presence of homologies as well as the evolutionary
distances.
References
1 Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S: MEGA 5 Molecular Evolutionary Genetics
Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology
and Evolution 2011, 28: 2731-2739.
On line comparison of sequences alignment and phylogenetic analysis of native Trichoderma sp from Misiones
province
Gastón Sioli, Lorena Castrillo, Manuel Cossio, Natalia Amerio, María Isabel Fonseca and Pedro Zapata
INBIOMIS, Posadas, Misiones, 3300, Argentina
Background
Species of the Trichoderma genus [1] are of great biotechnological interest because they offer good qualities as biological
control agents, soil bioremediators, growth promoters and enzyme producers. The diverse biotechnological
applications of Trichoderma make accurate strain identification essential, as well as knowledge of their phylogenetic
relationships.
Molecular species classification is carried out with online tools that perform an analysis based on
barcode or nucleotide sequences, using bioinformatics software available online. These can be used to construct
evolutionary trees or dendrograms that reflect homologies and genetic relationships on the principle of minimum
evolution or maximum parsimony.
Materials and Methods
Fifteen native Trichoderma sp. isolates from Misiones province were characterized molecularly by
amplification of the internal transcribed spacer regions ITS1 and ITS2 of ribosomal DNA. For identification to
species level, the sequences were analyzed using three widely recognized databases: Fungal barcoding, TrichOKEY and
NCBI.
To construct phylogenetic trees, the sequences were aligned using the Clustal X 2.1 program. The
aligned sequences were edited using BioEdit 7.1.3.0, and with the MEGA 5.10 program [2] dendrograms were obtained
based on the Maximum Likelihood, Neighbor-Joining, Minimum Evolution and Maximum Parsimony methods.
Moreover, with the T.N.T. program [3], dendrograms were produced by the Bootstrap and Jackknife methods.
Results
According to the bioinformatic analysis, of the 15 native Trichoderma sp. isolates studied, 6 correspond to T.
harzianum, 6 to T. koningiopsis, 1 to T. hamatum, 1 to T. brevicompactum and 1 to T. pleuroticola.
In addition, similarity relations between strains could be established from the dendrograms constructed with the
MEGA 5.10 and TNT programs.
Conclusions
The dendrograms of the native Trichoderma sp. strains show similar topologies, with minimal differences among the
parsimony analyses performed.
References
1 Druzhinina IS, Kopchinskiy AG, Kubicek CP: The first 100 Trichoderma species characterized by molecular data.
Mycoscience 2006, 47:55–64.
2 Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S: MEGA 5 Molecular Evolutionary Genetics
Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology
and Evolution 2011, 28: 2731-2739.
3 Goloboff P, Farris JS, Nixon K: Review of TNT – Tree Analysis Using New Technology Version 1.0. Cladistics 2004, 20:
378-383.
Variations of ligand binding affinity upon protein conformational diversity
Ezequiel Juritz and Alexander Monzón
Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bs. As. Argentina
Background
As our previous studies show (Juritz, Maria Silvina Fornasari, et al. 2012; Juritz, Palopoli, et al.
2012), protein conformational diversity should be taken into account when performing computational
biology methods that require as input a protein structure. We support the idea that computational
biology methods should consider protein native state as a population of structural conformers in
equilibrium and not as a rigid arrangement of atoms in order to accomplish more accurate results. A
protein crystallographic structure should be considered as an instance of the protein structural
dynamism, and different crystallographic structures of the same protein can be considered as different
structural conformers.
Description
In the present work we evaluate how different conformers of the same protein behave
differently when docking methods are applied to a given ligand. The docking method used was
AutoDock Vina (Trott and Olson 2010), and the protein structures were obtained from the CoDNaS database
(from Conformational Diversity of the Native State) (Monzón, Juritz and Parisi, in preparation). The CoDNaS
database contains the redundant collection of crystallographic structures of 9,474 proteins, accounting for a
total of 40,565 structures, which represent putative conformers of each corresponding protein. We
performed docking runs evaluating both natural and non-natural ligands for each protein.
Conclusion
Our preliminary results indicate that the difference in binding affinity between conformers is
significant, reaching values greater than 5.0 kcal/mol. Interestingly, binding affinity variation does not
correlate with the RMSD value between the structural conformers.
Juritz, Ezequiel Iván, Maria Silvina Fornasari, et al. 2012. “On the effect of protein conformation diversity in
discriminating among neutral and disease related single amino acid substitutions.” BMC Genomics 13(Suppl
4): S5. http://www.biomedcentral.com/1471-2164/13/S4/S5 (Accessed June 28, 2012).
Juritz, Ezequiel Iván, Nicolás Palopoli, et al. 2012. “Protein conformational diversity modulates protein divergence.”
Mol Biol Evol.
Trott, O., and A. J. Olson. 2010. “AutoDock Vina: improving the speed and accuracy of docking with a new scoring
function, efficient optimization, and multithreading.” J Comput Chem. 31(2): 455-61.
Metabolic pathfinding based on genetic algorithms
Matias Gerard1,2, Georgina Stegmayer1 and Diego Milone2
1 CIDISI-UTN-FRSF, CONICET, Lavaise 610 - Santa Fe (Argentina)
2 SINC(I)-FICH-UNL, CONICET, Ciudad Universitaria - Santa Fe (Argentina)
Background
Metabolic pathway searching consists of finding a set of reactions that transforms one compound into another.
Several search methods based on classical algorithms such as breadth-first search (BFS) [1] and depth-first search
(DFS) [2] perform this task. However, in some problems a very large number of solutions must be explored,
which makes classical methods practically inapplicable. Genetic algorithms use stochastic search to explore multiple
points of the search space at the same time.
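To make the classical baseline concrete, here is a minimal sketch of a breadth-first pathway search. It assumes a toy network in which every reaction converts a single substrate into a single product; the reaction and compound identifiers are hypothetical and are not real KEGG entries.

```python
from collections import deque

# Hypothetical toy network: reaction id -> (substrate, product).
REACTIONS = {
    "R1": ("A", "B"),
    "R2": ("B", "C"),
    "R3": ("A", "D"),
    "R4": ("D", "E"),
    "R5": ("E", "C"),
}

def bfs_pathway(reactions, source, target):
    """Return the shortest list of reaction ids turning source into target, or None."""
    # Index reactions by substrate so the neighbours of a compound are found quickly.
    by_substrate = {}
    for rid, (substrate, product) in reactions.items():
        by_substrate.setdefault(substrate, []).append((rid, product))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        for rid, product in by_substrate.get(compound, []):
            if product not in visited:
                visited.add(product)
                queue.append((product, path + [rid]))
    return None

print(bfs_pathway(REACTIONS, "A", "C"))
```

Because BFS expands compounds level by level, the first pathway it returns uses the fewest reactions, at the cost of visiting every compound closer to the source first.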
Material and methods
We propose a genetic algorithm (EAMP) that performs a two-end metabolic pathway search, and we compare its performance
with two classical search algorithms. Chromosomes were built by attaching a reaction to each gene, so that
the left-to-right sequence of genes encodes a metabolic pathway. An initialization strategy was proposed to build
variable-size chromosomes with a partially conserved sequentiality of reactions. The crossover and mutation operators
were designed to promote the building of a reaction chain. The fitness function considers the validity of the reaction
sequence, the presence of the compounds to be related and the occurrence of repeated reactions.
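The three fitness criteria listed above can be sketched as follows. This is only an illustrative scoring scheme, not the EAMP fitness function: the equal weighting of the three terms, the single-substrate reactions and all identifiers are assumptions made for the example.

```python
# Hypothetical toy network: reaction id -> (substrate, product).
REACTIONS = {
    "R1": ("A", "B"),
    "R2": ("B", "C"),
    "R3": ("A", "D"),
}

def fitness(chromosome, reactions, source, target):
    """Score a chromosome (list of reaction ids) in [0, 1]; 1.0 means a valid, non-redundant pathway."""
    if not chromosome:
        return 0.0
    # 1) Validity: fraction of adjacent genes where the product feeds the next substrate.
    links = len(chromosome) - 1
    chained = sum(
        1 for a, b in zip(chromosome, chromosome[1:])
        if reactions[a][1] == reactions[b][0]
    )
    validity = chained / links if links else 1.0
    # 2) Presence of the two compounds to relate at the ends of the chain.
    starts = reactions[chromosome[0]][0] == source
    ends = reactions[chromosome[-1]][1] == target
    endpoints = (starts + ends) / 2
    # 3) Penalty for repeated reactions.
    uniqueness = len(set(chromosome)) / len(chromosome)
    # Equal weights are an arbitrary choice for this sketch.
    return (validity + endpoints + uniqueness) / 3

print(fitness(["R1", "R2"], REACTIONS, "A", "C"))
```

A chromosome that repeats a reaction or breaks the chain scores below 1.0, so selection would favour candidates that are closer to a complete pathway.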
Results
EAMP was studied for several mutation rates and different initialization strategies. Results indicate that the minimum search time was reached with a mutation rate of 0.04 and the initialization strategy with variable initial chromosome size.
A comparison of EAMP with BFS and DFS is shown in Figure 1. Boxplots correspond to the search time and number of
reactions of 120 pathways found with each algorithm. DFS performs the search in the shortest time but produces
solutions with the maximum number of reactions allowed. BFS finds the shortest pathways but takes longer than EAMP.
The genetic algorithm takes a search time intermediate between those of BFS and DFS, and it finds not only the shortest
pathways but also solutions with a greater number of reactions linking the two compounds.
[Figure 1, panels A and B. Panel B is a pathway diagram spanning Fructose and Mannose Metabolism, Pyruvate Metabolism, and Glycine, Serine and Threonine Metabolism; the individual compound (C) and reaction (R) node labels are omitted here.]
Figure 1. (A) Boxplots of search time and number of reactions for the EAMP, BFS and DFS algorithms. Search
times and numbers of reactions are shown in white and gray, respectively. Time is plotted on a logarithmic scale. (B)
Metabolic pathway linking C01019 and C00037. The pathway found is shown as a bold line.
Conclusions
The proposed genetic algorithm found metabolic pathways in times intermediate between those required by BFS and DFS.
Moreover, it builds metabolic pathways of variable size, including both shortest pathways and longer solutions. This is
interesting from a biological viewpoint because pathways longer than the shortest path could provide relevant information
about alternative metabolic pathways.
References
1. Ogata, H., Goto, S., Fujibuchi, W., Kanehisa, M.: Computation with the KEGG pathway database. BioSystems 47 (1998)
119–128
2. Faust, K., Croes, D., van Helden, J.: Metabolic Pathfinding Using RPAIR Annotation. Journal of Molecular Biology 388(2)
(2009) 390–414
Conformational diversity and evolutionary rates in proteins
Diego Javier Zea1, Maria Silvina Fornasari1, Cristina Marino Buslje2 & Gustavo Parisi1
1 Structural Bioinformatics Group, Universidad Nacional de Quilmes, Bernal, Buenos Aires, Argentina
2 Structural Bioinformatics Unit, Fundación Instituto Leloir, Capital Federal, Buenos Aires, Argentina
Introduction
Several factors have been associated with the modulation of the evolutionary rate. Gene expression
level shows one of the strongest and most consistent correlations between genomic data and evolutionary rate [1].
Recent findings indicate that structural-functional features and translation rates could make comparable
contributions to explaining evolutionary rates [2]. Most of these studies have described the native
state of a protein with a single structure. However, it is well established that the native state of a protein is
better represented by an ensemble of different conformers in dynamic equilibrium [3]. In this work we
study how the presence of conformational diversity could influence the rate of evolution.
Methods
To study this relationship we used a major update of the PCDB database (Protein Conformational Data
Base) [4]. This database contains almost 8000 proteins with different degrees of structural diversity,
measured as the maximum root-mean-square deviation (RMSD) found between the different conformers
of each protein. The RMSD was normalized by structural alignment length [5]. Each of these proteins
was linked to OMA [6] to estimate dN (number of non-synonymous substitutions per non-synonymous
site) as a measure of evolutionary rate using PAML 4 [7]. We used the CATH database [8] to analyze the
domains in a given protein. Domains were selected, for further dN estimation, using Clustal Omega [9]
protein alignments.
Results and Conclusions
We found a negative correlation between dN (for orthologs between mouse and human) and the
maximum alpha-carbon RMSD between protein conformers (Spearman rank correlation with a rho of
-0.34 and a p-value below 0.05) for single-domain proteins in humans. This rho was tested with
a bootstrap; the 95% confidence interval goes from -0.5 to -0.1. Our results indicate that
conformational diversity has an important role in modulating protein evolutionary rates. We think that our
findings could have important implications for the understanding of the protein evolution process.
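The statistical test reported above can be sketched with the standard library alone. The sketch assumes a percentile bootstrap over resampled protein pairs; the resampling scheme, the number of replicates and the function names are assumptions, since the abstract does not give these details.

```python
import random

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Spearman rho."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate resamples
            stats.append(rho)
    stats.sort()
    low = stats[int((alpha / 2) * len(stats))]
    high = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return low, high
```

Here `x` would hold the maximum RMSD per protein and `y` the corresponding dN values. An interval that excludes zero, like the one reported above, indicates that the sign of the correlation is stable under resampling.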
References
1. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proceedings of the
National Academy of Sciences of the United States of America 2005, 102:14338-43.
2. Wolf MY, Wolf YI, Koonin EV: Comparable contributions of structural-functional constraints and expression level to the
rate of protein sequence evolution. Biology direct 2008, 3:40.
3. Tsai C-jung, Ma B, Nussinov R: Commentary Folding and binding cascades: Shifts in energy landscapes. 1999, 96:9970-9972.
4. Juritz EI, Alberti SF, Parisi GD: PCDB: a database of protein conformational diversity. Nucleic acids research 2011, 39:D475-9.
5. Carugo O: A normalized root-mean-square distance for comparing protein three-dimensional structures. 2001:1470-1473.
6. Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C: OMA 2011: orthology inference among 1000 complete genomes.
Nucleic acids research 2011, 39:D289-94.
7. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution 2007, 24:1586-91.
8. Orengo C a, Pearl FM, Bray JE, Todd a E, Martin a C, Lo Conte L, Thornton JM: The CATH Database provides insights into
protein structure/function relationships. Nucleic acids research 1999, 27:275-9.
9. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins
DG: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems
biology 2011, 7:539.
Effect of O-glycosylation on the binding of Extensins to
Peroxidases.
A. Aptekmann(a,b), JS Salter(b), J Estevez(b), A Nadra(a)
(a) Departamento de Química Biológica FCEN, UBA
(b) IFIBYNE CONICET-UBA
The classical plant peroxidases (PERs), which contain a heme group, are involved in
a wide range of processes such as lignification, hormonal signaling, development and
ROS protection. In Arabidopsis thaliana, 73 apoplastic type III PERs have been
described (1). We recently discovered that mutants for some PERs, such as PER73,
have a phenotype similar to that of mutants for certain cell wall glycoproteins, the
extensins (EXTs), suggesting some degree of substrate specificity (2). We hypothesize
that some PERs, including PER73, catalyze the crosslinking of O-glycoproteins (in
particular EXTs) and that this process is influenced by the O-glycosylation status of
those same EXTs. To test this hypothesis, we set out to model PERs and their
possible EXT ligands. In the present work we describe how we obtained
structures of PER73 bound to different EXTs with distinct O-glycosylations.
The structures were built by homology modelling, using as templates
PER2 and horseradish peroxidase (PDB IDs: 1PA2 and 1H57) and a
collagen structure (of the polyproline kind), followed by energy minimization, after
which the resulting structures were used for docking.
As ligands for this docking we used EXTs with O-glycosylations like those found in
wild-type Arabidopsis, as well as under-O-glycosylated variants similar to those found in
experiments with mutants of the O-glycosylation pathway (3).
The EXTs identified by this analysis as the most likely putative substrates will be
evaluated in vivo and in vitro.
(1). Computational analyses and annotations of the Arabidopsis peroxidase gene family
(2) “Modelado de la peroxidasa 73 y su especificidad por extensinas” 2011 SAB. A.A.Aptekmann J.S. Salter J.M. Velazquez.
J.M. Estevez. A.D. Nadra
(3) “Essential role of O-glycosylated plant cell wall extensins for polarized root hair growth”. 2011. Science 332, 1401-1403.
S.M. Velasquez et al. Nadra & J. M. Estevez.
Plant small heat shock proteins during different stress conditions other than heat. Comparative
analysis between Arabidopsis thaliana and Solanum lycopersicum
Débora Pamela Arce1, Martin Damián Ré2, Silvana Beatriz Boggio2 and Estela Marta Valle2
1 Facultad Regional San Nicolás, Universidad Tecnológica Nacional, San Nicolás, 2900, Buenos Aires,
Argentina
2 Instituto de Biología Molecular y Celular de Rosario (IBR-CONICET), Universidad Nacional de Rosario,
S2002LRK, Rosario, Argentina
Background
Small heat shock proteins (sHSPs) are chaperones that play an important role in abiotic and biotic stress
tolerance. The special importance of sHSPs in plants is suggested by their unusual abundance and by their
presence in different cellular compartments [1]. Furthermore, some sHSPs are also expressed during certain
stages of development [2]. In this study Arabidopsis thaliana and Solanum lycopersicum were used as model
plants. Previous findings allowed us to identify heat shock protein (HSP), heat shock factor (Hsf) and
sHSP genes up-regulated under oxidative stress mediated by methyl viologen (MV) in Arabidopsis [3]. In
addition, tomato sHSPs were induced in red fruit compared to green fruit [4]. These results led us to
design an in silico strategy for analyzing the regulation of sHSP gene expression. Three sHSP
promoter sequences were analyzed: mitochondrial LeHsp23.8-M, cytosolic LeHsp17.7-CI and cytosolic LeHsp17.4-CII.
Materials and Methods
In the present work we screened the following databases: Sol Genomics Network
[http://www.sgn.cornell.edu/], Tomato EST Database [http://ted.bti.cornell.edu/], NCBI Database
[http://www.ncbi.nlm.nih.gov/] and The Arabidopsis Information Resource (TAIR) [http://arabidopsis.org/].
Sequence analyses were performed using the bioinformatics tools BLASTn and T-Coffee [5]. For promoter
analysis, 1.9 kb upstream regions of sHsp genes were identified by BLASTn in the NCBI database and in the Sol
Genomics Network (SGN Combined WGS-BAC-unigenes set). Heat shock elements (HSEs) were
searched in the PLACE database [http://www.dna.affrc.go.jp/PLACE/index.html]. Further analysis to
identify conserved motifs was performed using the expectation maximization method MEME
[http://meme.sdsc.edu/meme4/intro.html], the PlantCARE database
[http://bioinformatics.psb.ugent.be/webtools/plantcare/html/] and the MatInspector program [6].
Results
The analysis of the 1.9 kb promoter regions of sHsp genes using bioinformatics tools (see Materials and
Methods) showed that heat shock elements or HSEs (CCAAT Box) were present in the three sequences
analyzed. Other motifs detected were a sequence related to the abscisic acid response element (ABRE)
and ethylene responsive elements (ERE).
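A motif scan of this kind can be sketched as a plain string search. The two patterns below are simplified stand-ins chosen for the example (a CCAAT box and an ACGTG core used as an ABRE-like sequence); they are assumptions, not the matrices or entries used by PLACE, PlantCARE or MatInspector, and the test sequence is invented.

```python
import re

# Simplified, hypothetical motif patterns for illustration only.
MOTIFS = {
    "CCAAT box": "CCAAT",
    "ABRE-like core": "ACGTG",
}

def scan_promoter(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} on the given strand."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead lets overlapping occurrences be reported too.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits

print(scan_promoter("ttCCAATggACGTGcc"))
```

A fuller analysis would also scan the reverse complement and use position weight matrices rather than exact strings.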
Conclusions
These results indicate that LeHsp23.8-M, LeHsp17.7-CI and LeHsp17.4-CII could be involved in different
processes mediated by some plant hormones (abscisic acid or ethylene) other than heat stress. These
promoter sequences could be used in the generation of tomato transgenic plants to evaluate the
LeHsp23.8-M, LeHsp17.7-CI and LeHsp17.4-CII gene expression pattern under environmental stress
conditions or different developmental stages.
References
1. Wang W, Vinocur B, Shoseyov O, Altman A: Role of plant heat-shock proteins and molecular
chaperones in the abiotic stress response. Trends Plant Sci 2004, 9:244-252.
2. Prasinos C, Kampis K, Samakovli D, Hatzopoulos P: Tight regulation of expression of two
Arabidopsis cytosolic Hsp90 genes during embryo development. J Exp Bot 2005, 56:633-644.
3. Scarpeci TE, Zanor MI, Carrillo N, Mueller-Roeber B, Valle EM: Generation of superoxide anion in
chloroplasts of Arabidopsis thaliana during active photosynthesis: a focus on rapidly induced
genes. Plant Mol Biol 2008, 66(4):361-378
4. Re MD, Arce DP, Boggio SB: Expresión de sHsps luego de la conservación en frío de tomates (cv.
MICRO-TOM). V Jornadas argentinas de Biología y Tecnología Postcosecha, 2009.
5. Notredame C, Higgins D, Heringa J: T-Coffee: A novel method for multiple sequence alignments.
J Mol Biol 2000, 302: 205-217.
6. Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, Frisch M, Bayerlein M, Werner T:
MatInspector and beyond: promoter analysis based on transcription factor binding sites.
Bioinformatics 2005, 21(13):2933-42
GO Function predictions by True Path Rule
Pilar Bulacio∗1,2, Flavio Spetale1, Laura Angelone1,2 and Elizabeth Tapia1,2
1 Cifasis-Conicet Institute, Bv. 27 de Febrero 210 Bis, Rosario, Argentina
2 Facultad de Cs. Exactas e Ingeniería, Universidad Nacional de Rosario, Riobamba 245 Bis, Rosario, Argentina
Email: Pilar Bulacio∗ - [email protected];
∗ Corresponding author
Hierarchical classification background
Protein function prediction is an important problem in bioinformatics research. Useful tools have been
developed to identify similar sequences in the corresponding annotation databases. But when no
similar sequences can be found, carefully designed data mining techniques may provide an important clue to
protein function. In particular, hierarchical classification methods like the True Path Rule (TPR) [1]
can take into account the relationships among protein functions defined in the Gene Ontology (GO). The GO
structure has an influence at two points: i) in the design of training sets for machine-learned classifiers; and ii) in the
global function prediction, since a sequence may belong to multiple classes.
Fig. 1 and Fig. 2 show a simplified TPR analysis on GO with Arabidopsis data. The consensus probability p′
represents the membership of a sample x in each GO node. Positive p′ values are shown in blue. The starting point
for applying TPR is the set of local probabilities p; positive p values are shown in bold.
[Figures 1 and 2: GO graph fragments for Arabidopsis data, with the local probability p and, where computed, the consensus probability p′ shown at each node; the node-by-node values are omitted here.]
Figure 1: Node GO:06139 with p = 0.8 entails x belongs to six structured nodes
Figure 2: Node GO:06139 with p = 0.55 entails x belongs to two structured nodes
Results and Conclusions
Focusing on global function predictions with TPR on the GO taxonomy with Arabidopsis data, each node ni
estimates the probability that a sample x belongs to the class ci. Positive local predictions for a GO node
then propagate from bottom to top (influencing its ancestors), while negative ones propagate to the descendants
(influencing its offspring), to reach the global consensus probability p′. Note that the strength of the local evidence
(p in GO:06139) may change the consensus probability p′ and, therefore, the positive paths (see Fig. 1 and Fig. 2).
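The two propagation steps can be sketched on a toy hierarchy. The sketch assumes one common formulation of the rule, in which the consensus p′ of a node is the average of its local p and the p′ of its children predicted positive (p′ above a threshold). The threshold of 0.5, the node names and the probabilities are assumptions for the example, and the top-down step is simplified to a tree-like reachability check.

```python
def tpr_consensus(children, local_p, threshold=0.5):
    """Bottom-up step: consensus p' for every node from the local probabilities p."""
    consensus = {}

    def visit(node):
        if node in consensus:
            return consensus[node]
        # Only children predicted positive (p' > threshold) vote for the parent.
        positive = [
            p for p in (visit(child) for child in children.get(node, []))
            if p > threshold
        ]
        consensus[node] = (local_p[node] + sum(positive)) / (1 + len(positive))
        return consensus[node]

    for node in local_p:
        visit(node)
    return consensus

def positive_nodes(children, consensus, roots, threshold=0.5):
    """Top-down step: keep nodes reachable from a root through positive nodes only."""
    positives, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node in positives or consensus[node] <= threshold:
            continue  # a negative node blocks its descendants along this path
        positives.add(node)
        stack.extend(children.get(node, []))
    return positives

# Hypothetical toy hierarchy and local probabilities.
children = {"root": ["a", "b"], "a": ["leaf"]}
local_p = {"root": 0.45, "a": 0.45, "b": 0.4, "leaf": 0.8}
consensus = tpr_consensus(children, local_p)
print(positive_nodes(children, consensus, ["root"]))
```

In this toy case the strong leaf (p = 0.8) lifts its two ancestors above the threshold, while a weak leaf would leave the whole path negative, which mirrors the contrast between Fig. 1 and Fig. 2.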
References
1. Valentini G: True Path Rule Hierarchical Ensembles for Genome-Wide Gene Function Prediction.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011, 8:832–847.
Fitting a mathematical expression to the effect of two herbicides (paraquat and glyphosate)
on the population dynamics of Beijerinckia mobilis in soils planted with soybean in
Tucumán, Argentina
Alberto Manlla, Melisa Apud Reinhold and Gladys Contino
Facultad de Agronomía y Zootecnia, Universidad Nacional de Tucumán, 4000 San Miguel de
Tucumán, Argentina
Background
The activity of rhizosphere microorganisms converts atmospheric nitrogen into forms of nitrogen
used by plants (NH2, NH4 or NO3). Among the free-living nitrogen fixers, the least known genus is
Beijerinckia, of which B. mobilis is a native species. Agricultural practices in soybean crops
in the region are characterized by an intense demand for pesticides. The most widely used herbicides
are paraquat and glyphosate.
Materials and methods
The variables `time since the application of herbicides´ and `the number of most frequent
microorganisms´ were recorded in soil samples from plots of 200 cm², over which the herbicides were
distributed at random, in a soybean crop in 2010. Regression analysis was applied to the
variables.
Results
A summary of the experimental data, transformed to a logarithmic scale, is presented in a scatter
diagram (Figure 1), from which the mathematical models that best fit the data can be inferred and
the elements that characterize these relationships determined analytically (Table 1).
Figure 1: Plot of the transformed variables. [Scatter diagram of Log 10 (NMP) against Log 10 (time in hours), with one fitted parabola per herbicide.]
Fitted curves:
   Paraquat:    y = 2.335x² - 7.123x + 8.535   (R² = 0.860)
   Glyphosate:  y = 1.738x² - 5.702x + 6.731   (R² = 0.718)

Table 1: Analytical determination of the main elements
                                          Paraquat    Glyphosate
a) Y-intercept: value of Y when `X = 0´, i.e. where the parabola crosses the `Y axis´, point (0, c)
   c =                                    8.535       6.731
b) Roots or zeros of the function: there are no real roots because the discriminant (b² - 4ac) is negative ( < 0 )
   b² - 4ac =                             -28.98      -14.28
   b² =                                   50.74       32.51
   4ac =                                  79.72       46.79
c) Extremes, or coordinates of the vertex of the parabola
   x = -b / 2a =                          1.525       1.640
   y(x) =                                 3.103       2.054
   quadratic term                         5.43        4.68
   linear term                            10.86       9.35
   independent term                       8.535       6.731
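The quantities in Table 1 follow directly from the coefficients of each fitted parabola y = ax² + bx + c. The short sketch below recomputes them; the function name and dictionary layout are choices made for this example.

```python
def parabola_summary(a, b, c):
    """Intercept, discriminant and vertex of y = a*x**2 + b*x + c."""
    discriminant = b * b - 4 * a * c
    x_vertex = -b / (2 * a)
    y_vertex = a * x_vertex ** 2 + b * x_vertex + c
    return {
        "intercept": c,                    # value of y at x = 0
        "discriminant": discriminant,      # negative means no real roots
        "vertex": (x_vertex, y_vertex),    # minimum of the curve when a > 0
    }

# Coefficients of the two fitted curves (log10-log10 scale).
fits = {
    "paraquat": (2.335, -7.123, 8.535),
    "glyphosate": (1.738, -5.702, 6.731),
}

for name, coeffs in fits.items():
    print(name, parabola_summary(*coeffs))
```

Since both axes are log10-transformed, the vertex gives the time of minimum population as 10 raised to the x coordinate, in hours.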
Conclusions
The herbicides affected B. mobilis similarly but with different intensity. Paraquat caused the least
harmful effect, allowing the population to recover its original levels.
Among other factors, microbial growth depends on the point at which herbicide degradation
starts (the by-products of this process may serve as nutrients or stimulate
recovery of the population), on the weather (720 hours after herbicide application, rain and
low temperatures might have mitigated the harmful effects on the microorganisms) and on the soil
(rain infiltrated the trial plots with difficulty, so surface runoff might have attenuated the herbicide
effect).
Reference
1. Mayz Figueroa J. Fijación Biológica de Nitrógeno. Revista científica UDO Agrícola. Vol 4. 1
- 20 Pág. Universidad de Oriente. Maturín, Estado Monagas 2004.
2. Olivares JP. Fijación Biológica de Nitrógeno. Estación Experimental del Zaidin, CSIC,
Granada, España 2008.
3. Lourival Larini. Toxicología Dos Praguicidas. Editora Manole Ltda. San Pablo. Brasil. 1999.
Alphabetical index
Acosta, M. G., 13, 15
Adur, J. F., 15
Agüero, F., 11
Ahumada, M. A., 13
Alibes, A., 12
Amadío, A., 13
Amerio, N., 11, 16
Andón, N., 15
Andreatta, M., 8
Angel, S., 11
Añón, M. C., 11
Aptekmann, A., 12
Arab Cohen, D., 13
Arce, A. L., 9
Areces, C., 14
Arévalo, I., 15
Arranz Amo, J. A., 16
Astorga, M., 12
Ballarin, V., 9, 13
Balzarini, M. G., 16
Bartó, C., 14
Basanta, B., 12
Belaich, M., 16
Berenstein, A. J., 9
Bessone, V., 15
Bianchi, M., 15
Biset, G., 15
Bondino, H. G., 10
Braunstein, L., 14
Briñón, M. C., 8
Brondino, C. D., 15
Brun, M., 9, 10, 13
Bugnon, L., 15
Bukowski Loináz, M. B., 15
Bustamante, J. P., 11
Bustos, D., 14
Cabral, J. B., 13
Candreva, A., 16
Capella, M., 9
Caramelo, J. J., 9
Carbonetto, B., 13
Carisimo, D., 14
Carmona, S., 11
Carrari, F., 11
Casco, V. H., 10, 13, 15
Castrillo, L., 11, 16
Chan, R. L., 9
Chemes, L. B., 12
Chernomoretz, A., 9
Churio, M., 14
Clavijo, B., 15
Cossi, P., 15
Cossio, M., 11, 12, 16
Costa, J. G., 15
Couto, P. M., 9
Cuadra, N., 15
Dalmasso, M. C., 11
Dalosto, S. D., 15
de Prat-Gay, G., 12
Debat, H., 14
Defelipe, L., 12, 16
Defelipe, L. A., 15
Defeudis, L., 14
Díaz, M., 11
Diaz-Zamboni, J. E., 15
Docena, G., 16
Dodelson de Kremer, R., 16
Ducasse, D., 14
Duffy, T., 11
Dumas, V. G., 12
Edelstein, J., 14
Eguaras, M., 14
Elgoyhen, B., 14
Embnet Node Argentina, 16
Erbes, L., 15
Espada, R., 9
Espinosa, M. B., 10
Estevez, J., 12
Estrin, D., 11
Faccendini, P. L., 15
Farber, M., 15
Farías, M. E., 11
Fazzi, L., 15
Fernández Alberti, S., 12
Fernández Feijóo, M. E., 10
Fernandez, E., 13, 16
Fernandez, P., 15
Ferreiro, D. U., 9
Ferro, S., 8
Ferroni, F. M., 15
Firmenich, V. E., 10
Fonseca, M. I., 11
Fornasari, M. S., 9, 14
Franchini, L., 14
Fresno, C., 13, 16
Fuselli, A., 12
Galetto, C. D., 10, 15
Garavaglia, M., 16
Garay, S., 12
Garay, S. A., 12
García, M. A., 13
Gauto, D., 15
Gende, L., 14
Gerard, M., 8
Ghiringhelli, D., 16
Giménez Pecci, M. D. L. P., 13
Girotti, M. R., 16
Glavina, J., 11, 12
Gómez, G. E., 9
Gómez, M. C., 15
Gonzalez, G., 16
Gonzalez, S., 15
Gonzalo Parra, R., 9
Grabiele, M., 14
Gutson, D., 14–16
Hajos, S., 15
Hasenahuer, M. A., 10
Herrera, F. E., 12
Herrero, F., 16
Ibañez, I., 9
Iserte, J., 16
Izaguirre, M. F., 10, 15
Juárez, L., 13
Juri Ayub, M., 10
Juritz, E., 14
Juritz, E. I., 12
Kelmansky, D. M., 8
Kondrasky, A., 14
Labadie, G., 13
Lagier, C. M., 15
Laguna, I. G., 13
Lamberti, P., 11
Landolfo, L., 9
Lanzarotti, E., 16
Lapadula, W., 10
Laróvere, L. E., 16
Lassaga, S. L., 13
Laugero, S. J., 15
Llera, A., 13, 16
López Medus, M., 9
López, J. A., 16
Lufrano, D., 16
Luna, M. C., 15
Lund, O., 8
Macri, P., 14
Mancini, E., 11, 13
Marcipar, I. S., 15
Marino Buslje, C., 9, 14, 16
Martí, D., 14
Martí, M., 12, 15
Marti, M., 9, 11, 15, 16
Martinez, M. J., 11
Martino, D., 12
Maurino, F., 13
Menzaque, F. E., 16
Merino, G., 13, 16
Miele, S., 16
Migueles, M., 14
Milone, D., 8
Mishima, J., 13
Modenutti, C., 15
Molina, M., 12
Monzón, A., 12
Monzon, A., 14
Moscone, E., 14
Nadra, A., 12
Nardo, A. E., 11
Navas, L., 13
Nielsen, M., 8, 16
Ojeda, S., 15
Oliva, P., 16
Pagnuco, I., 9, 13
Pagnuco, I. A., 10
Pallarol, M., 13
Palomba, D., 11
Palopoli, N., 11
Paniego, N., 15
Paravani, E. V., 15
Parisi, G., 9, 11, 12, 14, 16
Perona, G., 11, 12
Perona, M., 12
Petruccelli, S., 16
Petruk, A., 12
Pisciottano, F., 14
Podhajcer, O., 16
Podhajcer, O. L., 16
Ponzoni, I., 11
Porta, E., 13
Prada, F., 16
Prato, L., 13, 16
Pury, P., 16
Quaranta, J. F., 12
Quevedo, M. A., 8
Rabinovich, D., 15, 16
Radusky, L., 15, 16
Radusky, L. G., 15
Ramírez, M. J., 15
Ramos, L., 16
Rascován, N., 11
Rascovan, N., 13
Ré, D., 9
Ré, M., 11
Reinert, M., 13
Remon, L., 13
Revale, S., 11, 13
Revuelta, M. V., 9, 10
Riberi, F., 15
Ribero, G., 13
Rivarola, M., 15
Rizzi, A. C., 15
Robledo, G., 14
Rodrigues, D., 12
Rodrigues, D. E., 12
Saavedra Fresia, C. E., 16
Sales, M. D. L. M., 12
Samoluk, S., 14
Sanchez, I., 12
Sánchez, I. E., 11, 12
Sanchez-Puerta, M. V., 10
Santa Maria, C., 14
Scaldaferro, M., 14
Seijo, G., 14
Semrik, M., 13
Serrano, L., 12
Sferco, S. S., 15
Silvera Ruiz, S. M., 16
Simonetti, F. L., 16
Sioli, G., 11, 12, 16
Soria, M., 14
Soto, A., 11
Stegmayer, G., 8
Taleisnik, S., 13
Tardivo, L., 15
ten Have, A., 9, 10
Trumper, E., 14
Turjanski, A., 12, 16
Turjanski, A. G., 15
Uhart, M., 14
Vairo Cavalli, S., 16
Valacco, M. P., 16
Vazquez, G. E., 11
Vázquez, M., 11
Vazquez, M., 13
Vera, C. H., 13
Villalba, L., 11, 16
Villoria, L., 13
Zandomeni, R., 13
Zapata, P., 11, 16
Zea, D., 9, 14
Zimicz, C., 15
Zingaretti, L., 16