Keynote addresses

Asis Chattopadhyay, University of Calcutta
Dimension Reduction: One Way to Big Data Analysis

Ashish Ghosh, Indian Statistical Institute
Next Generation Internet and Data Science

Ashis SenGupta, Indian Statistical Institute
Directional Statistics for High Volatility Big Data

Goutam Dutta, Indian Institute of Management Ahmedabad
Prescriptive Analytics with Optimization: Challenges for the Future

Anindya Sengupta, Fractal Analytics
To be announced

Sriram Raghavan, IBM Research and CTO, IBM India/South Asia
To be announced

Debashish Banerjee, Deloitte
To be announced

Sudipta Sen, Partner, McKinsey and Co.
To be announced

Directional Statistics for High Volatility Big Data
Ashis SenGupta
Indian Statistical Institute, Kolkata, West Bengal, INDIA and
Augusta University, Augusta, Georgia, USA
In this era of emerging complex problems, both small and big data, linear and nonlinear, exhibit challenging characteristics that need to be carefully modelled. Marked asymmetry, multimodality, high volatility, long and fat tails, nonlinear dependency, etc. are common features of contemporary data, including important applications in reliability analysis. Notwithstanding pitfalls, ideas from several disciplines enrich this research. Directional statistics is one such scientific "key technology" that can be exploited to address these problems elegantly. In this talk, we consider the problem of obtaining probability distributions for modelling high volatility. The work of Mandelbrot has shown the appropriateness of the stable families of distributions for high volatility. In general, however, these families do not possess analytical closed forms for their probability density functions, which complicates inference on their parameters. We overcome this problem of modelling high-volatility data by appealing to probability distributions for directional data. A new family of possibly multimodal, asymmetric and heavy-tailed distributions is presented. The usual fat-tailed distributions, Cauchy and t, are encompassed by this family, and its tails are even comparable to those of the stable family. The problem of estimating the parameters of such distributions is then taken up. We apply our results to several real-life examples, including one from bankruptcy.
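As a standard illustration of a directional distribution (a classical example, not the new family presented in this talk), the wrapped Cauchy distribution shows how a heavy-tailed linear distribution yields a closed-form density on the circle:

```latex
% Wrapped Cauchy density on the circle, with mean direction \mu
% and concentration parameter \rho \in [0, 1):
f(\theta; \mu, \rho) \;=\;
  \frac{1}{2\pi}\,
  \frac{1 - \rho^{2}}{1 + \rho^{2} - 2\rho\cos(\theta - \mu)},
\qquad 0 \le \theta < 2\pi .
```

It is obtained by wrapping the linear Cauchy distribution onto the unit circle; as \(\rho \to 0\) it tends to the uniform distribution on the circle, and as \(\rho \to 1\) it concentrates at \(\mu\).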
Next Generation Internet & Data Science
Ashish Ghosh
Indian Statistical Institute
Kolkata
The Internet used today can be described as a network of computers which connects one user to others around the globe. Most usage and applications of this "Internet of Computers" involve human intervention. In the future Internet, manual intervention for the objects on the network would be minimized, and its functionality would be automatic and smart. This Internet would not only connect computers and smart phones; it would be a network of smart objects, the "Internet of Things". These "things" would be smart enough to sense, process and decide on a corresponding action. Examples include smart appliances (refrigerators, lights, air conditioners), traffic signals, smart body monitors, etc. The individual objects, along with the network, would collect, process and exchange data strategically. This interconnected network, with all the smart objects working in concert, forms a larger "Cyber Physical System" (CPS), such as a smart city or smart hospital. A working CPS generates enormous amounts of data, so efficient processing and effective use of this data are crucial. There will be data from everywhere: climate data, social network data, video data, medical data, scientific data, etc. Storing these data for analytics may not always be feasible, and analyzing them in real time is also very difficult. Traditional analysis tools are not well suited to capture the complete essence of this massive data. The volume, velocity and variety are too large for comprehensive analysis, and the range of potential correlations and relationships among disparate data sources is too great for any analyst to test all hypotheses and derive all the value buried in the data. Some algorithms can already let computers do the heavy thinking for us on smaller data, but we are striving for more in order to deal with large volumes of such data in a short time.
Therefore, we need to revisit old algorithms from statistics, machine learning, data mining and big data analytics, and adapt them to tame such big data. Major innovations in big data analytics are still to take place, but such novel analytics are expected to emerge in the near future from various domains.
Prescriptive Analytics with Optimization: Challenges for the Future
Goutam Dutta
Professor, Production and Quantitative Methods Area
IIM, Ahmedabad
Analytics is the science of interpreting and communicating meaningful patterns in data. Identifying such patterns is especially important in areas rich with recorded information. Analytics relies on the simultaneous application of statistics, computer science and operations research to quantify performance. Business applications of analytics started in the 1950s and continue today in several fields of management, such as pricing and revenue management, manufacturing planning, supply chain, project management, healthcare, the military, and many other public and private systems.
Optimization of business processes offers a distinctive advantage: it not only reveals the patterns and relationships in the data (descriptive), but also shows what needs to be done (prescriptive). A firm needs to be bold to experiment with and implement the change suggested by the model. Most research in optimization has avoided the most difficult part of the modelling process: solving real-world problems and implementing the model in the real world. While research on algorithms has taken about ninety-five percent of the effort, perhaps only five percent of researchers are solving, or trying to solve, real practical problems. Future challenges lie in solving such practical problems and developing meaningful methodology from them.
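As a minimal sketch of the prescriptive step, consider a textbook-style two-product planning problem (the products, coefficients and capacities below are hypothetical, chosen only for illustration): the model does not merely describe the data, it recommends a decision. Because there are only two variables, the linear program can be solved by enumerating the vertices of the feasible region:

```python
# Tiny prescriptive-analytics sketch: maximize profit 3x + 5y subject to
# capacity constraints (all numbers hypothetical). With two decision
# variables, the LP optimum lies at a vertex of the feasible polygon,
# so we enumerate pairwise constraint intersections.
from itertools import combinations

# Each constraint is a*x + b*y <= c (the last two encode x >= 0, y >= 0).
cons = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]

def feasible(x, y, eps=1e-9):
    return all(a * x + b * y <= c + eps for a, b, c in cons)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                                  # parallel pair, no vertex
    # Cramer's rule for the intersection point of the two boundary lines.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        val = 3 * x + 5 * y
        if best is None or val > best[0]:
            best = (val, x, y)

# best now holds (optimal profit, x*, y*): the prescribed production plan.
```

Real applications use a solver rather than vertex enumeration, but the structure is the same: the output is a recommended action, not just a description of the data.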
The talk will summarize the history of optimization for business problem solving since the 1950s and discuss the challenges facing analytics professionals in a business setting.
Dimension Reduction: One Way to Big Data Analysis
Asis Kumar Chattopadhyay
Department of Statistics, Calcutta University
For multivariate analysis with p (> 1) variables, a problem that often arises is the ambiguous nature of the correlation or covariance matrix. When p is moderately or very large, it is generally difficult to identify the true nature of the relationships among the variables, as well as among the observations, from the covariance or correlation matrix. In such situations a very common simplification is to reduce the dimension by considering only those variables (actual or derived) that are truly responsible for the overall variation, and to analyze the data in terms of them.
Principal component analysis (PCA) is a very common dimension reduction procedure. PCA was invented in 1901 by Karl Pearson as an analogue of the principal axis theorem in mechanics, was later independently developed by Harold Hotelling, and has since been discussed by several authors in different forms. But PCA has several limitations. The components are uncorrelated but not independent; only under Gaussianity do they become independent, so PCA works better for Gaussian data. Further, the components, being mixtures of several random variables, are difficult to interpret physically.
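As a minimal sketch of the procedure (the data here are synthetic and purely illustrative), PCA can be computed from the singular value decomposition of the centred data matrix; the leading components carry most of the total variation, so keeping only the first few reduces the dimension:

```python
# PCA via SVD of the centred data matrix (synthetic data for illustration).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # 200 observations, p = 5 variables
X[:, 1] += 2.0 * X[:, 0]               # induce correlation between variables

Xc = X - X.mean(axis=0)                # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_var = s ** 2 / (len(X) - 1)  # variances of the principal components
scores = Xc @ Vt.T                     # projected data (PC scores)

# Proportion of total variation captured by each component; to reduce the
# dimension, retain only the components with the largest ratios.
ratio = explained_var / explained_var.sum()
```

Note that the PC scores are mutually uncorrelated by construction, which illustrates the limitation in the text: uncorrelatedness, not independence, is all PCA guarantees for non-Gaussian data.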
More recently, Independent Component Analysis (ICA) has emerged as a strong competitor to PCA and factor analysis. ICA finds a set of source signals that are mutually independent, not merely uncorrelated as in PCA. ICA was primarily developed for non-Gaussian data in order to find independent components responsible for a large part of the variation; it separates statistically independent original source signals from an observed set of data mixtures.
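As a concrete sketch of this separation (a compact deflationary FastICA-type iteration with the tanh nonlinearity, on synthetic sources; not the implementation used in any of the studies mentioned here), two non-Gaussian signals can be recovered from their linear mixtures:

```python
# FastICA-style blind source separation sketch (synthetic, illustrative).
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sign(np.sin(3 * t)),      # square wave (non-Gaussian source)
          np.sin(5 * t)]               # sine wave   (non-Gaussian source)
A = np.array([[1.0, 0.5], [0.5, 1.0]]) # hypothetical mixing matrix
X = S @ A.T                            # observed mixtures

# Whiten: centre, then rotate/scale so the covariance is the identity.
Xc = X - X.mean(axis=0)
d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ E @ np.diag(d ** -0.5)

# One-unit fixed-point iterations with deflation (Gram-Schmidt).
W = np.zeros((2, 2))
for i in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        g = np.tanh(Z @ w)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w
        w_new = (Z.T @ g) / len(Z) - (1 - g ** 2).mean() * w
        w_new -= W[:i].T @ (W[:i] @ w_new)   # decorrelate from found rows
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-9
        w = w_new
        if converged:
            break
    W[i] = w

S_hat = Z @ W.T   # recovered components: independent up to sign/permutation
```

The recovered components match the original sources up to sign and ordering, which is the inherent indeterminacy of ICA; PCA applied to the same mixtures would only return uncorrelated rotations of them.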
ICA has been used for data analysis in different areas such as signal processing, pattern recognition, econometrics, astrophysics, etc. Andrew Back (RIKEN, Japan) and Andreas Weigend (L N Stern School of Business, New York) used ICA to explore whether it can give some indication of the underlying structure of the stock market. The aim was to find interpretable factors of instantaneous stock returns; such factors could include news, responses to very large trades, and unexplained noise.