
Commit 79c0f90

committed May 31, 2016
update doc
1 parent 1071c37 commit 79c0f90


3 files changed

+50
-48
lines changed


‎doc/source/references.rst

-7
Original file line number | Diff line number | Diff line change
@@ -30,13 +30,6 @@ The ANOVA report
3030
:members:
3131
:undoc-members:
3232

33-
GDSC report
34-
~~~~~~~~~~~~~~~~~
35-
36-
.. automodule:: gdsctools.gdsc
37-
:members:
38-
:undoc-members:
39-
4033
Statistical Tools
4134
-------------------
4235
.. automodule:: gdsctools.stats

‎gdsctools/gdsc.py

+49-40
@@ -28,7 +28,7 @@ class IC50Cluster(IC50):
2828
account for this feature, the IC50Cluster will rename the columns and
2929
transform the data as follows.
3030
31-
Consider the case of the DRUG 1211. It appears 3 times in the original
31+
Consider the case of the DRUG 1211. It appears 3 times in the original
3232
data::
3333
3434
Drug_1211_0.15625_IC50
@@ -67,8 +67,8 @@ class IC50Cluster(IC50):
6767
several concentrations is large, then they are studied independently.
6868
Otherwise they are merged.
6969
70-
In the final dataframe, the columns names are transformed into unique
71-
identifiers like in the IC50 class by removing the ``Drug_`` prefix and
70+
In the final dataframe, the columns names are transformed into unique
71+
identifiers like in the IC50 class by removing the ``Drug_`` prefix and
7272
````_conc_IC50`` suffix.
7373
7474
The :attr:`mapping` contains the mapping between new and old identifiers.
@@ -81,7 +81,7 @@ def __init__(self, ic50, ratio_threshold=10, verbose=True, cluster=True):
8181
:param ic50:
8282
:param int ratio_threshold:
8383
:param bool verbose:
84-
:param bool cluster: may be useful to not cluster the data for
84+
:param bool cluster: may be useful to not cluster the data for
8585
testing or debugging
8686
8787
"""
@@ -224,54 +224,60 @@ def mkdir(self, name):
224224

225225

226226
class GDSC(GDSCBase):
227-
"""Wrapper of the :class:`~gdcstools.anova.ANOVA` class and reports to
228-
analyse all TCGA Tissues and companies automatically.
227+
"""Wrapper of the :class:`~gdcstools.anova.ANOVA` class and reports to
228+
analyse all TCGA Tissues and companies automatically while creating summary
229+
HTML pages.
229230
230-
First, one need to provide the unique IC50 files. Second, the DRugDecode
231-
file (see :class:``) must be provided to convert identifiers into
232-
drug names within the reports. Third, genomic feature files must be
233-
provided for each tissue.
231+
First, one needs to provide a unique IC50 file. Second, the DrugDecode
232+
file (see :class:`~gdsctools.readers.DrugDecode`) must be provided
233+
with the DRUG identifiers and their corresponding names. Third,
234+
a set of genomic feature files must be provided for each :term:`TCGA`
235+
tissue.
234236
235-
First, create all main analysis that include all drugs::
236237
238+
You then create a GDSC instance::
239+
240+
from gdsctools import GDSC
237241
gg = GDSC('IC50_v18.csv', 'DRUG_DECODE.txt',
238242
genomic_feature_pattern='GF*csv')
239243
240-
Then run the analysis. This will launch an ANOVA analysis for each
241-
tissue as well as a dedicated HTML report for each tissue considered.
244+
At that stage you may want to change the settings, e.g.::
242245
243-
This may take lots of time. On v18, on an i7 core using 1 CPU
244-
this takes about 1 hour.30 minutes
246+
gg.settings.FDR_threshold = 20
245247
246-
You should now have a directory called **tissue_packages** with about
247-
20 directories for each TCGA GF file. Keep that in a safe place or
248-
you will have to restart the analysis
248+
Then run the analysis::
249249
250-
Second, split those data just created for each specific proprietary
251-
compounds. For instance::
250+
gg.analyse()
252251
253-
gg.create_data_packages_for_companies(['AZ'])
252+
This will launch an ANOVA analysis for each TCGA tissue + PANCAN case
253+
if provided. This will also create a data package for each tissue.
254+
The data packages are stored in the ./tissue_packages directory.
254255
255-
or for all in one go::
256+
Since all private and public drugs are stored together, the next step is
257+
to create data packages for each company::
256258
257259
gg.create_data_packages_for_companies()
258260
259-
Third, create some summary pages::
261+
You may select a specific one if you wish::
260262
261-
gg.create_summary_pages()
263+
gg.create_data_packages_for_companies(['AZ'])
262264
263-
The last step is fast (a few seconds) and create index.html in the
264-
tissue_package directory and each proprietary directory.
265+
Finally, create some summary pages::
265266
267+
gg.create_summary_pages()
266268
269+
Your entry point is an HTML file called **index.html**
267270
"""
268271
def __init__(self, ic50, drug_decode,
269272
genomic_feature_pattern="GF_*csv",
270273
main_directory="tissue_packages", verbose=True):
271-
"""
274+
""".. rubric:: Constructor
275+
276+
:param ic50: an :class:`~gdsctools.readers.IC50` file.
277+
:param drug_decode: an :class:`~gdsctools.readers.DrugDecode` file.
278+
:param genomic_feature_pattern: a glob to a set of
279+
:class:`~gdsctools.readers.GenomicFeature` files.
272280
273-
ic50 must be a filename (not IC50 instance) because it will be used for
274-
each genomic features file
275281
"""
276282
super(GDSC, self).__init__(genomic_feature_pattern, verbose=verbose)
277283
assert isinstance(ic50, str)
@@ -294,7 +300,7 @@ def __init__(self, ic50, drug_decode,
294300
# quick test on 15 features
295301
self.test = False
296302

297-
def analyse(self, onweb=False, multicore=None):
303+
def analyse(self, multicore=None):
298304
"""Launch ANOVA analysis and creating data package for each tissue.
299305
300306
:param bool onweb: By default, reports are created
@@ -306,9 +312,9 @@ def analyse(self, onweb=False, multicore=None):
306312
self.mkdir(self.main_directory)
307313
# First analyse all TCGA cases + PANCAN once for all and
308314
# store all the results in a dictionary.
309-
self._analyse_all(onweb=onweb, multicore=multicore)
315+
self._analyse_all(multicore=multicore)
310316

311-
def _analyse_all(self, onweb, multicore=None):
317+
def _analyse_all(self, multicore=None):
312318
for gf_filename in sorted(self.gf_filenames):
313319
tcga = gf_filename.split("_")[1].split('.')[0]
314320
print(purple('======================== Analysing %s data' % tcga))
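
Review note: the loop above derives the tissue label from each genomic feature filename with `gf_filename.split("_")[1].split('.')[0]`. The following is a minimal standalone sketch of that parsing, not the library's code. The helper name `tissue_from_gf_filename` is hypothetical, and the `os.path.basename` call is an added assumption so that underscores in a parent directory do not shift the split.

```python
import os


def tissue_from_gf_filename(gf_filename):
    """Return the tissue label embedded in a name such as GF_BRCA.csv.

    Hypothetical helper: mirrors the split("_")[1].split(".")[0] logic
    shown in the diff, applied to the basename only (an assumption).
    """
    basename = os.path.basename(gf_filename)
    # "GF_BRCA.csv" -> ["GF", "BRCA.csv"] -> "BRCA.csv" -> "BRCA"
    return basename.split("_")[1].split(".")[0]


if __name__ == "__main__":
    # Example filenames are illustrative, not taken from a real data set.
    for name in sorted(["GF_PANCAN.csv", "GF_BRCA.csv"]):
        print(name, "->", tissue_from_gf_filename(name))
```

The parsing relies on the `GF_<tissue>` naming convention implied by the default `GF_*csv` pattern; a filename without an underscore would raise an `IndexError`.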
@@ -340,9 +346,11 @@ def _analyse_all(self, onweb, multicore=None):
340346
self.report = ANOVAReport(an)
341347
self.report.settings.savefig = True
342348

343-
self.report.create_html_pages(onweb=onweb)
349+
self.report.create_html_pages(onweb=False)
344350

345351
def create_data_packages_for_companies(self, companies=None):
352+
"""Creates a data package for each company found in the DrugDecode file
353+
"""
346354
##########################################################
347355
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
348356
# #
@@ -429,10 +437,10 @@ def drug_to_keep(drug):
429437
an.settings.analysis_type = tcga
430438

431439
# Now we create the report
432-
self.report = ANOVAReport(an, results,
440+
self.report = ANOVAReport(an, results,
433441
drug_decode=drug_decode_company,
434442
verbose=self.verbose)
435-
self.report.company = company
443+
self.report.company = company
436444
self.report.settings.analysis_type = tcga
437445
self.report.create_html_main(False)
438446
self.report.create_html_manova(False)
@@ -449,20 +457,21 @@ def _get_companies(self):
449457
companies = property(_get_companies)
450458

451459
def create_summary_pages(self):
452-
"""
460+
"""Create summary pages
453461
454462
Once the main analysis is done (:meth:`analyse`), and the company
455463
packages have been created (:meth:`create_data_packages_for_companies`),
456464
you can run this method that will create a summary HTML page
457465
(index.html) for the tissue, and a similar summary HTML page for the
458-
tissues of each company. Finally, an HTML summary page for the companies
459-
is also created.
466+
tissues of each company. Finally, an HTML summary page for the
467+
companies is also created.
460468
461469
The final directory tree looks like::
462470
463471
464472
|-- index.html
465473
|-- company_packages
474+
| |-- index.html
466475
| |-- Company1
467476
| | |-- Tissue1
468477
| | |-- Tissue2
@@ -472,9 +481,9 @@ def create_summary_pages(self):
472481
| | |-- Tissue2
473482
| | |-- index.html
474483
|-- tissue_packages
484+
| |-- index.html
475485
| |-- Tissue1
476486
| |-- Tissue2
477-
| |-- index.html
478487
479488
480489
"""
@@ -513,7 +522,7 @@ def _create_main_index(self):
513522
html_page.jinja['tissue_directory'] = self.main_directory
514523
html_page.write()
515524

516-
def _create_summary_pages(self, main_directory, verbose=True,
525+
def _create_summary_pages(self, main_directory, verbose=True,
517526
company=None):
518527
# Read all directories in tissue_packages
519528

‎gdsctools/readers.py

+1-1
@@ -17,7 +17,7 @@
1717
"""IO functionalities
1818
1919
20-
Provides readers to read
20+
Provides readers to read the following formats
2121
2222
- Matrix of IC50 data set :class:`IC50`
2323
- Matrix of Genomic features with :class:`GenomicFeatures`
