$ pip install randomly
>>> import randomly
>>> import pandas as pd
>>> df=pd.read_table('example.txt', index_col=0)
>>> df.shape
(1459, 17287)
>>> model = randomly.Rm()
min_tp
transcripts expressed.min_genes_per_cell
genes.min_cells_per_cell
cells.>>> model.preprocess(df, min_tp=0,
min_genes_per_cell=0,
min_cells_per_gene=0,
refined=True)
Cell names are not unique. Cell names are reset
Run the refining function
refined=True
(as we have done before) then, as function indicates, you have to run the function model.refining()
afterwards:>>> model.refining(min_trans_per_gene=7)
1459 cells and 11389 genes
min_trans_per_gene
defines a cut-off on genes. In this example, min_trans_per_gene
equals 7. This parameter should be choosen such that the total number of remaining genes is bigger than at least 8000. In this example, we leave 11389 genes.>>> model.fit()
Preprocessed data is being used for fitting
>>> model.plot_mp()
If you want to save the plots generated by any of the Randomly functions, you can do it by adding the option path=folder/name.extension
, where extension could be: pdf, png, jpg, etc. For example:
>>> model.plot_mp(path='Data/my_marchenko.pdf')
>>> model.plot_statistics()
>>> df2 = model.return_cleaned(fdr=0.0001)
>>> df2.shape
(1459, 2590)
sample_variance
) shown in the y-axis of the previous plot. For instance, this allows you to select the top 1100 most signal-like genes. In this case, by selecting sample_variance
equals 350, we get 1101 genes:>>> df2 = model.return_cleaned(sample_variance=350)
>>> df2.shape
(1459, 1101)
If you want to save the denoised dataset in your computer, you can do it by adding the option path=folder/name.txt
. For example:
>>> df2 = model.return_cleaned(sample_variance=350, path='Data/my_denoised_data.txt')
Once you know the signal-like genes you want to use based on the previous step, Randomly allows you to visualize it using t-SNE:
>>> model.fit_tsne(sample_variance=350)
computing t-SNE, using Multicore t-SNE for 4 jobs
atribute embedding is updated with t-SNE coordinates
>>> model.plot()
>>> model.fit_hierarchical()
thrs
, you can select the clusters:>>> model.visual_hierarchy(thrs=100, value_range=2)
>>> model.plot(labels=model.labels_hierarchy, legend=True)
n_clusters
selects the number of clusters:>>> model.fit_kmeans(n_clusters=16)
>>> model.plot(labels=model.labels_kmeans)
model.get_gene_info()
that allows you to analyze your clusters to get biological insight. model.get_gene_info()
gives the expression profile of a gene or a list of genes across the different clusters. It does this by performing a violin plot and a ridge plot.labels=model.labels_hierarchy
, to visualize the gene 'INS':>>> model.get_gene_info(labels=model.labels_hierarchy, gene=['INS'], size=8)
If several genes are used, for example gene=['INS', 'PPY', 'PDFGRA']
, then the function gives the average of the genes in the list.
>>> model.plot(gene=['PPY'], size=4)
or the average of a set of genes:
>>> model.plot(gene=['DDIT3','HSPA5','HERPUD1'], size=4)
>>> model.plot(gene=('PPY','GCG','SST','INS','GHRL','CPA1','KRT19'
,'RGS5','PDGFRA','VWF','SDS','SOX10'))
The function model.get_cluster_info()
gives information about the clusters produced by the methods described previously:
genes = number
we can increase the list. For example, genes = 20
will give the top 20 genes.>>> model.get_cluster_info(labels=model.labels_hierarchy, cluster=1)
The cluster 1 has 49 cells
The top 10 highly expressed signal-like genes in this cluster are:
PPY
EEF1A1
GNAS
FTL
PCSK1N
CHGB
SST
CLU
SCG2
MALAT1
>>> model.plot(gene=['library'], size=4)
legend=True
. We already did this when we performed the hierarchical clustering.legend=[list of your names]
. For example:>>> my_legend=['Gamma', 'Ductal (CFTR+)', 'Beta', 'Activ.stellate', 'Acinar', 'Stellate (TIMP1+ & CD44+)',
'Quiesc. stellate', 'Ductal (MUC1+ & TFF1+)', 'Macrofage', 'Ductal (CD44+ & CFTR- )',
'Beta stressed', 'Alpha', 'Endothelial', 'Delta']
>>> model.plot(labels=model.labels_hierarchy, legend=my_legend, legendcol=3, size=6, points=7)
legendcol
sets how many columns you want to plot in the legend.size
sets the size of the plot.points
sets the size of the points.You can use the same legend for the get_gene_info()
function and get publication-ready plots. For example:
>>> model.get_gene_info(labels=model.labels_hierarchy, gene=['INS'], size=8, legend=my_legend)
If you are using a MAC OS with retina display, to increase the quality of your plots you should use this option at the beginning of your script:
>>> %config InlineBackend.figure_format = 'retina'
If you want to do custom preprocessing—for example, you don't want to use TPMs (transcripts per million); or you want to select your own genes or your own cells in a different way—you can avoid the preprocessing step and safely jump to The Modeling Section above, using:
>>> model.fit(df = df)