Thursday, January 28, 2016

Create MongoDB service in Windows OS

1) Open Command Prompt

2) Create directory for Database and Logs
                   mkdir c:\data\db 
                   mkdir c:\data\log

3) Create a file at C:\mongodb\mongod.cfg and  specifies both systemLog Path and Storage database 
     path 

 Add following in mongod.cfg file :

systemLog: destination: file path: c:\data\log\mongod.log storage: dbPath: c:\data\db

4) Create service named as "MongoDB" using following command

sc.exe create MongoDB binPath= "C:\mongodb\bin\mongod.exe --service --config=\"C:\mongodb\mongod.cfg\"" DisplayName= "MongoDB" start= "auto"

5) To remove MongoDB service use following command

sc.exe delete MongoDB


 

Monday, September 14, 2015

utf8_general_ci VS utf8_unicode_ci what should we use?

These two collations are both for the UTF-8 character encoding. The differences are in how text is sorted and compared.

Note: in new versions of MySQL use utf8mb4, rather than utf8, which is the same UTF-8 data format with same performance but previously only accepted the first 65,536 Unicode characters.
  • Accuracy
    utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.
    utf8mb4_general_ci fails to implement all of the Unicode sorting rules, which will result in undesirable sorting in some situations, such as when using particular languages or characters.
  • Performance
    utf8mb4_general_ci is faster at comparisons and sorting, because it takes a bunch of performance-related shortcuts.
    On modern servers, this performance boost will be all but negligible. It was devised in a time when servers had a tiny fraction of the CPU performance of today's computers.
    utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.
As far as Latin (ie "European") languages go, there is not much difference between the Unicode sorting and the simplified utf8mb4_general_ci sorting in MySQL, but there are still a few differences:
  • For examples, the Unicode collation sorts "ß" like "ss", and "Œ" like "OE" as people using those characters would normally want, whereas utf8mb4_general_ci sorts them as single characters (presumably like "s" and "e" respectively).
  • Some Unicode characters are defined as ignorable, which means they shouldn't count toward the sort order and the comparison should move on to the next character instead. utf8mb4_unicode_ci handles these properly.
In non-latin languages, such as Asian languages or languages with different alphabets, there may be a lot more differences between Unicode sorting and the simplified utf8mb4_general_ci sorting. The suitability of utf8mb4_general_ci will depend heavily on the language used. For some languages, it'll be quite inadequate.

What should you use?
There is almost certainly no reason to use utf8mb4_general_ci anymore, as we have left behind the point where CPU speed is low enough that the performance difference would be important. Your database will almost certainly be limited by other bottlenecks than this.
The difference in performance is only going to be measurable in extremely specialised situations, and if that's you, you probably already know about it. If you're experiencing slow sorting, in almost all cases it'll be an issue with your indexes/query plan. Changing your collation function should not be high on the list of things to troubleshoot.
In the past, some people recommended to use utf8mb4_general_ci except when accurate sorting was going to be important enough to justify the performance cost. Today, that performance cost has all but disappeared, and developers are treating internationalization more seriously.
One other thing I'll add is that even if you know your application only supports the English language, it may still need to deal with people's names, which can often contain characters used in other languages in which it is just as important to sort correctly. Using the Unicode rules for everything helps add peace of mind that the very smart Unicode people have worked very hard to make sorting work properly.

Query to show all tables and their collation of a Schema

SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME,
 COLUMN_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
    WHERE `TABLE_SCHEMA` = 'Schema_Name'


How to alter collation of columns of a table :-

alter table `dbname`.`tablename` convert to character 
 set utf8 collate utf8_unicode_ci;


Ref : 
http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

Friday, February 6, 2015

How to show records in same order where IDs are put into the IN() clause?

 
 
How to show records in same order where IDs are put into the IN() clause? 
 
 
1) SELECT id FROM table1 WHERE ... ORDER BY display_order, name

2) SELECT name, description, ... WHERE id IN ([id's from first])
 
The issue is that the second query does not return the results in the same order 
that the IDs are put into the IN() clause. 
 
Solution :- 

SELECT name, description, ...
FROM ...
WHERE id IN([ids, any order])
ORDER BY FIELD(id, [ids in order])

Monday, February 2, 2015

Is a BLOB converted using the current/default charset in MySQL?


Blob data used MySQL charset which is in my.ini

default-character-set=latin1

Suppose
  1. I have a table with a BLOB field.
  2. The charset of the table is UTF-8.
  3. I connect to the DB and "SET CHARACTER SET utf8".
  4. Then I save binary data into the field.
  5. Then I retrieve the data, and it's not what I saved (corrupt).
Because while retrieving data blob data used mysql default charset set.

Ref :
http://stackoverflow.com/questions/14734812/is-a-blob-converted-using-the-current-default-charset-in-mysql

Tuesday, January 6, 2015

Pentaho/R Integration Plugin


http://dekarlab.de/wp/?p=5



Pentaho R Integration Plugin is used to execute R code direct from Pentaho Data Integration (PDI).
R Script Plugin can be downloaded here: Downloads

Installation
R script plugin can be installed by performing the following steps:
1) Copy the plugin folder (RScriptPlugin) into the folder of your PDI installation:
data-integration\plugins\steps
2) Install R project from the site http://www.r-project.org/
3) Install rJava package in R by executing: install.packages(“rJava”)
4) Copy JRI library rJava/jri/jri.dll (windows) or rJava/jri/libjri.so (linux) to: data-integration/libswt/{your operation system}
5) Specify location of R using R_HOME environment variable. (like: C:\R\R-3.0.2).
6) For windows, it is also needed to put in PATH variable the path to R.dll, for example: C:\R\R-3.0.2\bin\{your operation system}
7) The plugin can be found under Sripting Folder in Design Tab:

Note: to check the installation press Help button in R plugin in Spoon.

Usage
We demonstrate the usage of “Execute R Script” plugin by solving the following task. In CSV file we have the values for two variables a and b. We should calculate the sum of these variables for every row in CSV file.
For this we create transformation, which contains three steps:
  • CSV Step to load input values
  • Execute R Script Step
  • Select Values Step
The transformation is demonstrated on the following picture:

CSV file has the following values:

R script contains two sections. The first one is used to define mapping between input fields in PDI (Pentaho Data Integration) and variables in R script. The second one is used to define mapping between output variables from R script and output fields in PDI.

We have two fields in PDI with names a and b. In R file we have two variables with names a and b. In R script we define the third variable c as a sum of a and b.
In R script there is one service variable OUTPUT of type list. Using this variable we can define output values. In our case the output is only variable c. For this variable we can define a new field in PDI. In our case this is c_res field. In Select Values step we show the values for three fields a, b, and c_res.

The result is demonstrated on the picture below.

For more complicated tasks it is recommended to define functions and use direct database connection in R script. For example, we use Run.R in PDI and call function MainScript, which is defined in MainScript.R.
/code/r/Run.R

library(RMySQL)
library(R.utils)

sourceDirectory(“/code/r/func/”);
ctx.date_from=’01.01.2000′;#this value can be transferred from PDI
ctx.date_to=’01.12.2010′;#this value can be transferred from PDI
OUTPUT<-list(“err_train”=0, “err_gen”=0);
OUTPUT<-MainScript(ctx);
/code/r/func/MainScript.R
MainScript<-function(ctx) {
con <-dbConnect(MySQL(), user=”root”, password=””, dbname=”lab”, host=”localhost”);
on.exit(dbDisconnect(con));
printf(“\nTime Period: %s-%s”, ctx.date_from, ctx.date_to)
flush.console()
#Select data from database
query = paste(“”,”select * from table where column between “);
query = paste(query, “str_to_date(‘”, ctx.date_from, “‘, ‘%d.%m.%Y’) and str_to_date(‘”, ctx.date_to, “‘, ‘%d.%m.%Y’)”, sep=””);
rs <- dbSendQuery(con, query);
result <- fetch(rs, n = -1)
printf(“\nSize: %g”, nrow(result))
dbDisconnect(con)
#Here the model can be evaluated and results can be saved in variables err_train and err_gen.
#results to PDI
OUTPUT<-list(“err_train”=err_train, “err_gen”=err_gen);
return(OUTPUT)
}

Wednesday, December 10, 2014

Create MongoDB service in Windows OS

1) Open Command Prompt 2) Create directory for Database and Logs                    mkdir c:\data\db                     mkdir c:\...