I discover interesting things: 2013

Sonntag, 29. Dezember 2013

Some intros to NoSQL databases

In general, I am a bit late for the whole NoSQL party, but nevertheless here are some interesting articles:

Samstag, 12. Oktober 2013

A bunch of questions about founding a start up comany

The whole internet seems to be overflowing with advice for start ups. Experts and industry veterans share a wide range of information on twitter and blogs, at meet-ups and conferences. They tend to focus on marketing, product development, metrics and financing. Much of this information is contradictory (and much of the information regarding finding investors seems sketchily focused on “luring” them in, rather than rightfully gaining their trust.)

For all the abundance — and diversity — of advice for start ups, I have not yet been able to find anything on more practical, immediate problems, particularly facing start ups in Berlin or Germany more broadly.Assuming that I am already running a landing page to catch some early adopters, and giving them access to a functional prototype/alpha for special conditions to test out and develop my product, I still have questions that I’d organize into two main groups: 1. Pre-incorporation logistics and 2. Legal and tax details of the founding process.

1. Pre-incorporation logistics:

At what point should I legally found my company? Only once I am certain that my product and business model will work out? Or should I incorporate before I handle any data from businesses, or from private customers?
If I were to collect data before legally founding my company, how can I legally transfer the prototype and the data to the company?
In Germany, having an imprint on your website and following data protection laws is very important. Failure to correctly comply can easily lead to notices from lawyers paid by competitors - without a limited liability company, how can I protect myself?
How do I handle expenses I incur before officially founding my company? What should I do about them if my plans do not materialize before legally founding? Are there any tax breaks that I can claim from pre-founding investments or expenditures?
Can I "hire" any help in advertising/marketing before founding a company?

Assuming I decide to start a German UG company (basically a limited, with some more drawbacks), this leads to another group of questions.

2. Legal and tax details:

How much collateral beyond the first necessary Euro do I need in reality? Does that have any practical impact on how my company can interact with other companies?
What other costs can I avoid? I have to pre-pay taxes, but how should I go about making good estimate of what to pay? Should I try to over or underestimate? I can avoid the IHK membership fee at the beginning?
How should I handle expenses I privately financed before founding, e.g. hosting a landing page, some marketing costs, money spent to research the market and buy publications?
How much money should I put into the company to keep it solvent, but not have more unused liquidity in it than my personal bank account can't handle? How about adding more money later on, especially when there are multiple founders? How to balance stake vs money put in?
In case I suspect that my company will be bought at some point, should I have a holding to hold the actual company, e.g. to avoid the early payment of taxes?
If I plan to have investors, do I need to take any preparations to easily enable parts of the company owned by third parties?
Should I use the "Musterprotokoll" when founding my company or should I have a custom one made? Should I use a service provider like go-ahead or firma.de to found the company, or do everything on my own?
Should I found have a German company, or an English, American, etc. one? Especially in software and SaaS, being some other place may be easier tax- and data protection-wise.

I am aware that a lot of these questions depend on what my specific enterprise will do and I will need to talk to someone professional - but who should I talk to?

Dienstag, 27. August 2013

A quick overview of simple encryption for end-users

Recent revelations regarding PRISM and other US (as well as UK and EU) government agencies systematically collecting and processessing private information has sparked many conversations about encryption across socialnetworks. So, here is a quick overview of a few free, (mostly) open source programs that enable everyone to use high quality encryption. The encryption schemes used by the programs listed below are considered uncrackable and require, using the currently available computing power, billions of years to crack.

To encrypt your hard drive and/or private data files, I highly recommend to use TrueCrypt. It can encrypt your entire hard drive, including your operating system. Or, it can create a virtual disk that when you enable and look at it in your explorer will look and function just like a USB drive -- only, everthing you put onto the disk will be still on your harddrive, but now stored in an encypted file. Option #1, The Encrypt Everything Solution requires you to provide a password when you boot up your computer. The TrueCrypt manual provides more information on how to set up encrypted devices or fine tune some of the parameters available.

It is also possible to use truecrypt on android phones with EDS, and if the phone is rooted, it is possible to share encrypted files via Dropbox between smartphones and other devices.

To encrypt emails, as well as single files, public key cryptography is the go-to choice. This works by having every users generate two keys, a public one that anyone can know, and a private one that must stay secret. The public key can be used to create encrypted messages for the owner of the key (think of this like putting a lock onto the message) -- it can only be decrypted using the private key (the key to the lock that you put on the message). The private key can be used to create signatures of documents. This enables third parties to reliably check whether a text was authored, and is unmodified, by the owner of the private key.

Popular software that integrates this form of encryption into your email program in the openPGP standard are

gpgtools for mac, obtainable at https://gpgtools.org/
gpg4win for windows, obtainable at http://www.gpg4win.org
k-9 mail for android, obtainable at https://play.google.com/store/apps/details?id=com.fsck.k9

These programs come with an extensive, 190 page manual, although using the program is really simple and easy due to the plugins for Thunderbird and Apple's mail program.

To encrypt chat sessions, I highly recommend using OTR. It is a plugin for pidgin (windows, linux) and adium (mac), an opensource chat client that supports a wide range of chat protocols, e.g. google talk, msn, facebook, xmpp, icq, and many more. It is also available in gibberbot for Android. Using the skype4pidgin plugin in pidgin or adium, it is possible to encrypt the text chat of skype, too. Unfortunately, it does not help to secure voice communication. There are some guides, including video tutorials for set up.

(Another interesting feature: the plugin not only offers encryption, but also enables plausibly deniablity. If used correctly, by just looking at the encrypted stream of exchanged messages, it is not possible to attribute authorship to the messages.)

For push notification based instant messaging on smartphones, I recommend Threema. The system I described above (OTR) requires users to be online to send messages - but whatsapp and others do not require both parties to be online at the same time. Threema avoids this issue by using a scheme similar to the public key encryption in emails.

Unfortunately, it costs a couple of bucks (~1.8 €), but has no further charges. It basically is an encrypted version of whatsapp. The only drawback is, of course, no intercommunication with other messengers. It is also closed source, meaning it is hard to check what the app actually does, but the cryptography is provided by an open source library NaCl.

Surfing anonymously is a bit harder, and there is a choice of VPN providers. They all are centralized, i.e. there is one party that knows everything you do. There are a few peer to peer solutions, and the best and probably most famous on is the TOR project.
The Tor project offers a a lot and is actually used in China, Egypt and other Arabic countries to enable and protect members of the opposition. There is an amazing talk "How governments have tried to block Tor" by one of the creators, describing the arms race with China and other countries on en-/disabling access to the TOR network - once a user is in, it is almost impossible to track his activities.

Dienstag, 30. Juli 2013

The Multi-linear Jigsaw of "Candidate Indistinguishability Obfuscation" for functional software encryption (possibly part 1)

This evening I noticed a few retweets about a code obfuscation technique that was reported as being new and groundbreaking. I was intrigued and read a good article article, but was unsatisfied with the level of detail and the number of unanswered questions I was left with: "What is the concept of obfuscation they use; isn't there a proof of impossibility?" "Does it work for all programs?" "How secure, and how provable is this technique?" and "What does 'multilinear jigsaw puzzle' in detail mean?" as well es my curiosity regarding the 'how' of their construction. To answer these questions, I took a look at the paper itself and took some more detailed notes which you can find below. In short, they have a precise notion of security for their understanding of obfuscation as well as a progression of generalizing and securing their core idea.

The concept of obfuscating and its security:

The issue the paper contributes to is as follows: can one make a program, while preserving its functionality, make it unintelligible (so that looking at it doesn't give you a whole lot of information)? There is a result that roughly states that it is almost impossible to have an "encrypted simulation" of a program to obfuscate it, since it can be shown that one can construct efficient unobfuscater that extracts the information used to make the program unintelligible.
A solution to this dilemma is given by a slightly different formulation of this problem, called the indistinguishability obfuscation. Instead of resorting to simple models of computations such as Turing machines, it uses circuits. Given a particular class of circuits $\mathcal{C}$, then an indistinguishability obfuscator $i\mathcal{O}$ should guarantee that for two equivalent ciruits $C_1$ and $C_2$ the two distributions $i\mathcal{O}(C_1)$ and $i\mathcal{O}(C_2)$ are computationally indistinguishable. It has been an open question, whether such an obfuscator exsists for all polynomial sized circuits.
The guarantee of the obfuscator $i \mathcal{O}$ is formalized by security parameters,; the cirucuit class of the obfuscator is actually $\{ C_\lambda \}$. For encoded circuit equipped with a security parameter, e.g. $i\mathcal{O}(\lambda, C)$, there is for any distinguisher $D$ a negligible function $\alpha$ and it holds: for all obfuscations $C_1, C_2$ of the same circuit $C \in \mathcal{C}_\lambda$, the probability of $D$ distinguishing it is bounded by $\alpha(\lambda)$.

The computation model:

The paper proposes an indistinguishability obfuscator for circuits of the class $NC^1$, that is the class of all circuits that are polynomial in size, and have a logarithmic (specifically: $log^1 n$) depth; one can think of a polynomial number of parallel processors with each a logarithmic runtime. It is not known whether this class equals P, but it can be shown that $NC \subset P$ by sequentializing the computations. For another idea about the class: $AC = NC$, where $AC^i$ contains all boolean circuits with depth $O(log^i n)$ and polynomial number of AND/OR gates that admit an arbitrary number of inputs.
The neat feature of $NC^1$ used for the later construction is giving by Barrington's theorem: Every circuit $C$ of $NC^1$ can be transformed into a collection of $k$ square matrices $M^0_1, \ldots M^0_k$ and $M^1_1, \ldots M^1_k$ and model the computation of the circuit, given input $x$ of length $\ell$ by a matrix product $C(x) = 0 \Longleftrightarrow \prod_{i=1}^k M_i^{x f(i)} = I$ with a pre-defined $f: [k] \rightarrow [\ell ]$.

The core construction:

Besides the need for additional security, the matrix computation will be encoded into $k$ groups $G_i$: if an entry of a matrix $M^0_i$ is $\alpha$, e.g. at $(1,1)$, it is encoded as $g_i^\alpha$. Using a multilinear map, the product can be represented in a single group $G_T$.

The ominous multi-linear jigsaw puzzle is described as follows: given a mutilinear map system over groups of prime order p $e: G_1 \times \ldots \times G_k \rightarrow G_T$, a valid multinear form is given by any expression involving operations in $G_T$, the multilinear form and the operations within each group for the parameters of the multilinear map. The papers gives a sample expression for the case of $k = 3$ as $w_1 \cdot e(x_, y_3, z_1 z_2)^2 \cdot e(x_2^3 x_4, y_1^2, z_2 z_5^3) \cdot w^2_3)$, where $x_i \in G_1$, $y_i \in G_2$ and so an, $w_i \in G_T$.

Group elements $g$ then can be viewed as puzzle pieces for the puzzle given by such an expression. A solution is given when the expression yields a unit $g_T \in G_$. The puzzle thus has a generator outputting system parameters $prms$ and some nonempty sets of elements $S_i = \{ x_1^{(i)}, \ldots \} \subset G_i$ and the validator inputs $(prms, S_1, \ldots, S_k, S_T, \prod)$ where the last tuple element is a valid multilinear form. It gives a positiv answer if the form $\prod$ with the given elements gives a unit element of $S_T$.

In order to prove security, the paper makes hardness assumptions of the form: for $g_1, \ldots, g_T$ generators for $G_I$ such that for a form $e$ with $e(g_1, \ldots) = g_T$ it is impossible to distinguish tuples of expotential products of the generators.
The obfuscation scheme:
This construction then is used in a Witness encryption like scheme:

We observe here an analogy to witness indistinguishable proofs: if a statement being proven only has a unique witness, then a witness-indistinguishable proof does not need to hide the witness. The way witness indistinguishability can be used is by explicitly constructing statements that can have multiple witnesses. Similarly, we will use indistinguishability obfuscation by constructing circuits that inherently have multiple equivalent forms. We use this analogy to build our main application of functional encryption.

This is used as

Given an $NP$ Language $L$, a witness encryption scheme for $L$ is an encryption scheme that takes as input an instance $x$ and a message bit $b$, and outputs a ciphertext $c$. If $x \in L$ and $w$ is a valid witness for $x$, then a decryptor can use $w$ to decrypt $c$ and recover $b$. However, if $x \not \in L$, then an encryption of 0 should be computationally indistinguishable from an encryption of 1.

The Indistinguishablity Obfuscation for $NC^1$ implies Witness encryption for an $NP$-Complete language. The takes proceeds to argue for $L=SATISFIABILITY$, using the function $F_{x,b}(w)$ defined by:

$F_{x,b}(w) = b$ if w is a valid witness for x
$F_{x,b}(w) = \perp$

yields a witness scheme because for $F_{x,b}(w)$ and $SATISFIABILITY$, $F_{x,b}(w)$ is in $NC^1$.
The remainder:
This finishes up this post, being possibly part 1, since the paper continues:

to solve the open problem by lifting the $NC^1$ construction to obtain a construction for all polynomial sized circuits
provide functional encryption schemes for those circuits
obtains a "meaningful" simulation based security for function encryption
provides security proofs against several classes of attacks

But I figure that if someone wants to know these details too, they'd have already read the paper before arriving here -- while my questions along the lines of "What are they doing?", "Does it work for all programs?", "How secure, and how provable is this?" are answered sufficiently for now.
The actual paper is at http://eprint.iacr.org/2013/451

Montag, 29. Juli 2013

Categorial constructions (Part 1)

Reading up on a paper, I discovered that I will need Day's convolution to construct tensors in a bicategory of monoidal supported profunctors. The hom-categories of these to $Set$ then capture supported pre-categories as a partial monoid.

So slowly working toward it, I flesh out some of my notes here:

Usually, one encounters convolution in electrical engineering and in image processing; the convolution of two functions $f$ and $g$, denoted by the convolution operator $*$ as $(f * g)$ is defined as

\[ (f*g)(t) = \int_{-\infty}^{\infty} f(\tau) f(t - \tau) d \tau \]

where $f, g: M \rightarrow \mathbb{C}$ are maps from a group to complex numbers. It can be thought of as a moving window, giving by one of the function, that inspects the values of the other function. This definition can be extended to convolutions on functors $F$ on a monoidal category $M$ to the category of sets $Set$, $F: M \rightarrow Set$.

In the standard setting, convolution yields a commutative algebra without identity on the linear space of (suitably) measureable functions. For the extension, the functor category $Set^M$ of functors $F: M \rightarrow Set$ yields a monoidal category; Day's convolution is its tensor product.

A simple example to see how it works is given by graded sets in a categorical setting: given the discrete category $\mathbb{N}$ with natural numbers as objects and only identify mappings as morphisms, where addition plays the role of the monoidal product, the graded sets then can be represented by functors $F: \mathbb{N} \rightarrow Set$. The convolution in the category then is

\[ (F*G)(n) = \sum_{i+j=n} F(i) \times G(j) = \sum_{i,j} F(i) \times G(j) \times hom_{\mathbb{N}(n, i \otimes j) \]

Where $\otimes$ is the addition of natural numbers. Moving on to a more general setting, one wants to replace $\mathbb{N}$ by a general monoidal category $C$. The formula for the convolution, given two presheaves $F, G: C^{op} \rightarrow Set$ is defined as

$ (F*G)(e) = \int^{c,d \in Obj(C)} F(c) \times G(d) \times hom_C(e, c \otimes d)$

In that expression, the convolution product is given by a coend $\int$ of the functor under the integral symbol. The coend of an functor $F$ in $Set$ is an object $e$ with a universal dinatural transformation $\zeta: F \stackrel{..}{\rightarrow} e$. In this case, the notion of dinatural transformation relaxes the requirements of a natural transformation: usually, for a natural transformation $\alpha: F \rightarrow G$ between functors F and G, $\alpha$ depends on some variable $x$ co-/contravariantly. A dinatural, or in this case the slightly more restrictive notion of an extranatural transformation requires either F or G to depend on some variable both co- and contravariantly. It is constituted by a collection of morphisms $\alpha_c: F(c,c) \rightarrow G(c,c)$ such that for every morphism $f: c \rightarrow c'$ in $C$ the hexagon identity holds:

$G(c,f)\alpha_cF(f,c) = G(f,c')\alpha_{c'} F(c',f):F(c',c) \rightarrow G(c,c')$

In this setting, the coend object is an object of $Set$. The Day's convolution has some nice properties regarding the preservation of colimits, as well as some interesting relations to the Yoneda embedding. The next posts will contain more on the properties on the convolution and possibly the Yoneda lemma, but also mainly focus on recovering supported precategories in a more abstract setting.

Samstag, 27. Juli 2013

Mahout

A nice intro slideset to Mahout, a mapreduce enabled hadoop machine learning library in java:

Introduction to Mahout and Machine Learning from Varad Meru

Montag, 29. April 2013

Writing Systems; Miss Korea and facial analysis

In a blog post, the relationship between the complexity of writing systems and its associated culture is discussed. While I do not agree with some arguments, it opens an interesting perspective.

On another note, there is an interesting (mathematical) analysis of the top 20 Miss Korea contestants and the similarity of their faces.

Montag, 22. April 2013

Auctioneers and High Performance

There is an interesting article/blog post on the architecture of the LMAX exchange. They prefer a single thread model with event sourcing over an actor driven model. Betfair, a company related to the LMAX owners, at least for a while, used an actor, in memory model for sports betting. These issues are only relevant when handling very high volumes of trades, but certainly might become increasingly interesting for exchanges handling crypto currencies like bitcoin, litecoin or ppcoin.

Samstag, 6. April 2013

Quick permutations

itertools for python are fun:

import itertools

partb= list(itertools.imap(lambda x: "".join(x), itertools.combinations('aApe',3)))
parta = list(itertools.imap(lambda x: "".join(x), itertools.permutations('gRrAet!1',5)))

print '\n'.join(list(itertools.imap(lambda x: "".join(x), (itertools.product(parta, partb)))))

But in the end, this post is just to test out the syntax highlighting feature in javascript.