Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share



machine learning - How does Apple find dates, times and addresses in emails?

In the iOS email client, when an email contains a date, time or location, the text becomes a hyperlink, and it is possible to create an appointment or look at a map simply by tapping the link. This works not only for emails in English but also for other languages. I love this feature and would like to understand how they do it.

The naive way to do this would be to write many regular expressions and run them all. However, this is not going to scale very well, and it will work only for a specific language or date format. I think Apple must be using some form of machine learning to extract entities (8:00PM, 8PM, 8:00, 0800, 20:00, 20h, 20h00, 2000, etc.).
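
For illustration, here is a minimal Python sketch of that naive approach: one hand-written regular expression per time format from the list above. The patterns and the helper name `find_times` are assumptions made for this sketch, not Apple's rules.

```python
import re

# One hand-written pattern per time format from the question.
# Order matters: earlier (more specific) patterns win when matches overlap.
TIME_PATTERNS = [
    r"\b\d{1,2}:\d{2}\s?[AaPp][Mm]\b",  # 8:00PM, 8:00 pm
    r"\b\d{1,2}\s?[AaPp][Mm]\b",         # 8PM
    r"\b\d{1,2}h\d{2}\b",                # 20h00
    r"\b\d{1,2}h\b",                     # 20h
    r"\b\d{1,2}:\d{2}\b",                # 8:00, 20:00
    r"\b\d{4}\b",                        # 0800, 2000 (but also any year!)
]


def find_times(text):
    """Return sorted (start, end, matched_text) spans for every time-like match."""
    spans = []
    for pattern in TIME_PATTERNS:
        for m in re.finditer(pattern, text):
            # Skip a match that overlaps a span found by an earlier pattern.
            if any(m.start() < end and start < m.end() for start, end, _ in spans):
                continue
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)


print(find_times("Dinner at 8:00PM or 20h00, not in 2000."))
```

On that sentence it returns spans for "8:00PM", "20h00" and "2000"; the last one is a year, a false positive that shows why a pile of regular expressions gets brittle as formats and languages multiply.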

Any idea how Apple is able to extract entities so quickly in its email client? What machine learning algorithm would you apply to accomplish such a task?



1 Answer


They likely use Information Extraction techniques for this.

Here is a demo of Stanford's SUTime tool:

http://nlp.stanford.edu:8080/sutime/process

You would extract attributes (features) of the n-grams (sequences of consecutive words) in a document:

  • numberOfLetters
  • numberOfSymbols
  • length
  • previousWord
  • nextWord
  • nextWordNumberOfSymbols
    ...
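
A minimal Python sketch of extracting those attributes for an n-gram and its neighbouring words. The feature names follow the list; the helper names `all_ngrams` and `ngram_features`, the empty-string padding at document edges, and counting characters in `string.punctuation` as "symbols" are assumptions for illustration.

```python
import string


def all_ngrams(tokens, max_n=3):
    """Yield (start, n) for every n-gram of 1..max_n consecutive tokens."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield start, n


def ngram_features(tokens, start, n=1):
    """Attributes of the n-gram tokens[start:start + n] and its neighbouring words.

    "Symbols" are characters in string.punctuation (an assumption), and the
    empty string stands in for a missing previous/next word at document edges.
    """
    gram = " ".join(tokens[start:start + n])
    end = start + n
    prev_word = tokens[start - 1] if start > 0 else ""
    next_word = tokens[end] if end < len(tokens) else ""
    return {
        "numberOfLetters": sum(c.isalpha() for c in gram),
        "numberOfSymbols": sum(c in string.punctuation for c in gram),
        "length": len(gram),
        "previousWord": prev_word,
        "nextWord": next_word,
        "nextWordNumberOfSymbols": sum(c in string.punctuation for c in next_word),
    }


tokens = "We meet Wed Feb. 29th at noon".split()
for start, n in all_ngrams(tokens, max_n=2):
    print(tokens[start:start + n], ngram_features(tokens, start, n))
```

For the unigram "Feb." in that sentence this yields 3 letters, 1 symbol and length 4, with "Wed" and "29th" as neighbours, which matches the first row of the table below.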

Then use a classification algorithm and train it on labeled positive and negative examples:

Observation  nLetters  nSymbols  length  prevWord   nextWord  isPartOfDate
"Feb."       3         1         4       "Wed"      "29th"    TRUE
"DEC"        3         0         3       "company"  "went"    FALSE
...

You might get away with 50 examples of each, but the more the merrier. The algorithm learns from those examples and can then classify new examples it hasn't seen before.
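
The answer does not name a specific classifier. As one possible choice, here is a sketch of a plain perceptron in Python, trained on (previous word, word, next word) windows. The feature templates and the helper names `featurize`, `train_perceptron` and `predict` are assumptions. The four training rows beyond the two from the table are hypothetical toy data, and a real system would need far more labeled examples.

```python
from collections import defaultdict


def featurize(prev_word, word, next_word):
    """Map a (previous, current, next) window to a set of sparse binary features.

    The feature templates are illustrative assumptions modelled on the table.
    """
    letters = sum(c.isalpha() for c in word)
    symbols = sum(not c.isalnum() for c in word)  # e.g. the "." in "Feb."
    return {
        "bias",
        f"nLetters={letters}",
        f"nSymbols={symbols}",
        f"length={len(word)}",
        f"word={word.lower()}",
        f"prev={prev_word.lower()}",
        f"next={next_word.lower()}",
        f"next_has_digit={any(c.isdigit() for c in next_word)}",
    }


def train_perceptron(examples, epochs=10):
    """Mistake-driven perceptron; examples are ((prev, word, next), is_date) pairs."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for window, label in examples:
            feats = featurize(*window)
            predicted = sum(weights[f] for f in feats) > 0
            if predicted != label:
                # On every mistake, move the weights toward the correct label.
                delta = 1.0 if label else -1.0
                for f in feats:
                    weights[f] += delta
    return weights


def predict(weights, window):
    """Return True if the window's middle word is classified as part of a date."""
    return sum(weights.get(f, 0.0) for f in featurize(*window)) > 0


# Toy labeled data: the first row of each class comes from the table,
# the remaining rows are hypothetical examples invented for this sketch.
examples = [
    (("Wed", "Feb.", "29th"), True),
    (("Mon", "Mar.", "3rd"), True),
    (("Fri", "Jan.", "12th"), True),
    (("company", "DEC", "went"), False),
    (("the", "IBM", "board"), False),
    (("our", "CEO", "said"), False),
]

weights = train_perceptron(examples)
print(predict(weights, ("Tue", "Apr.", "5th")))   # True
print(predict(weights, ("firm", "DEC", "said")))  # False
```

On these six toy rows the perceptron stops making mistakes after its first pass. It labels the unseen window ("Tue", "Apr.", "5th") as a date because that window shares the learned features "one symbol", "length 4" and "next word contains a digit", even though none of its words appeared in training.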

It might learn rules such as:

  • if the previous word consists only of letters, maybe with periods...
  • and the current word is in "february", "mar.", "the" ...
  • and the next word is in "twelfth", any_number ...
  • then it is a date
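
Written out as code, a rule like this could look as follows. This is a hypothetical hand transcription, not the output of a real learner. The word sets contain only the examples listed (the answer's lists are open-ended), and reading "any_number" as digits with an optional ordinal suffix such as "29th" is an assumption.

```python
import re

# Only the words listed in the rule; the real learned lists would be longer.
CURRENT_WORDS = {"february", "mar.", "the"}
NEXT_WORDS = {"twelfth"}


def rule_says_date(prev_word, word, next_word):
    """Hypothetical learned rule: True if the window looks like part of a date."""
    # Previous word: letters only, possibly with periods (e.g. "Wed.").
    prev_ok = re.fullmatch(r"[A-Za-z.]+", prev_word) is not None
    word_ok = word.lower() in CURRENT_WORDS
    # "any_number" is assumed to allow an ordinal suffix, e.g. "12" or "29th".
    next_ok = (next_word.lower() in NEXT_WORDS
               or re.fullmatch(r"\d+(st|nd|rd|th)?", next_word.lower()) is not None)
    return prev_ok and word_ok and next_ok


print(rule_says_date("on", "February", "12"))
print(rule_says_date("on", "the", "twelfth"))
print(rule_says_date("in", "February", "went"))
```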

Here is a decent video by a Google engineer on the subject

